Re: Easiest way to export the entire index

2020-01-31 Thread Amanda Shuman
Thanks all!

I wasn't familiar with using curl at the command line at all, but I did try
a basic curl yesterday based on this thread, admin console attribute
syntax, and the tutorial in solr documentation (
https://lucene.apache.org/solr/guide/8_4/solr-tutorial.html) and was able
to produce the file. I basically did what Steve Ge suggested, the command
looks kind of like this for anyone else who needs it in the future:

curl "http://servername.com:8983/solr/collection1/select?indent=on&q=*:*&rows=5000&wt=json" > collection1_index.json

I just set the rows to the number in our index, which I got from the admin
console.
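
For an index too large to fetch in one request, the cursor-based paging that Emir recommends further down this thread can be scripted instead. Here is a minimal Python sketch, assuming the collection's uniqueKey field is named `id` (an assumption; check schema.xml). The names `export_all` and `fetch_json` are illustrative and not part of Solr:

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def export_all(base_url, page_size=500, fetch=fetch_json):
    """Yield every document from a Solr select handler via cursorMark paging."""
    cursor = "*"  # starting cursor value for a new cursorMark walk
    while True:
        params = urllib.parse.urlencode({
            "q": "*:*",
            "wt": "json",
            "rows": page_size,
            "sort": "id asc",  # assumes the uniqueKey field is called "id"
            "cursorMark": cursor,
        })
        data = fetch(base_url + "?" + params)
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means nothing is left
            break
        cursor = next_cursor
```

Calling `list(export_all("http://servername.com:8983/solr/collection1/select"))` and passing the result to `json.dump` would give a single JSON file much like the curl output, while each individual request stays at `page_size` rows.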

Amanda

------
Dr. Amanda Shuman
Researcher and Lecturer, Institute of Chinese Studies, University of
Freiburg
Coordinator for the MA program in Modern China Studies
Database Administrator, The Maoist Legacy <https://maoistlegacy.de/>
PhD, University of California, Santa Cruz
http://www.amandashuman.net/
http://www.prchistoryresources.org/
Office: +49 (0) 761 203 96748



On Wed, Jan 29, 2020 at 4:21 PM Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Amanda,
> I assume that you have all the fields stored so you will be able to export
> full document.
>
> Several thousand records should not be too much to use regular start+rows
> to paginate results, but the proper way of doing that would be to use
> cursors. Adjust page size to avoid creating huge responses and you can use
> curl or some similar tool to avoid using admin console. I did a quick
> search and there are several blog posts with scripts that do what you
> need.
>
> HTH,
> Emir
>
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 29 Jan 2020, at 15:43, Amanda Shuman  wrote:
> >
> > Dear all:
> >
> > I've been asked to produce a JSON file of our index so it can be combined
> > and indexed with other records. (We run solr 5.3.1 on this project; we're
> > not going to upgrade, in part because funding has ended.) The index has
> > several thousand rows, but nothing too drastic. Unfortunately, this is too
> > much to handle for a simple query dump from the admin console. I tried to
> > follow instructions related to running /export directly but I guess the
> > export handler isn't installed. I tried to divide the query into rows, but
> > after a certain amount it freezes, and it also freezes when I try to limit
> > rows (e.g., rows 501-551 freezes the console). Is there any other way to
> > export the index short of having to install the export handler considering
> > we're not working on this project anymore?
> >
> > Thanks,
> > Amanda
> >
>
>


Easiest way to export the entire index

2020-01-29 Thread Amanda Shuman
Dear all:

I've been asked to produce a JSON file of our index so it can be combined
and indexed with other records. (We run solr 5.3.1 on this project; we're
not going to upgrade, in part because funding has ended.) The index has
several thousand rows, but nothing too drastic. Unfortunately, this is too
much to handle for a simple query dump from the admin console. I tried to
follow instructions related to running /export directly but I guess the
export handler isn't installed. I tried to divide the query into rows, but
after a certain amount it freezes, and it also freezes when I try to limit
rows (e.g., rows 501-551 freezes the console). Is there any other way to
export the index short of having to install the export handler considering
we're not working on this project anymore?

Thanks,
Amanda



Re: Securying ONLY the web interface console

2018-10-25 Thread Amanda Shuman
Thanks - but I think I'm past those steps now. I set up an nginx reverse
proxy through the plesk panel initially, so that is fine. Binding it to
port 8983 seems to be the issue. Anyways, I think I'll try out the
instructions listed here and cross my fingers..:

https://talk.plesk.com/threads/unable-to-forward-requests-from-nginx-apache.347141/

Amanda
--
Dr. Amanda Shuman
Post-doc researcher, University of Freiburg, The Maoist Legacy Project
<http://www.maoistlegacy.uni-freiburg.de/>
PhD, University of California, Santa Cruz
http://www.amandashuman.net/
http://www.prchistoryresources.org/
Office: +49 (0) 761 203 4925



On Mon, Oct 22, 2018 at 5:35 PM Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> I think that it is not really Solr's job to solve this.   I'm sure that
> there are many Java ways to solve this  with Jetty configuration of JAAS,
> but the *safest* ways involve ports and rights.   In other words, port 8983
> and zookeeper ports are then for Solr nodes to communicate with each
> other.   But a web proxy on some other port (443 with https suggested)
> forwards /solr to port 8983.
>
> You can use many, many servers as the proxy server - Apache httpd and
> NGINX probably being the biggest contenders.   Because my systems team
> understands Apache httpd better, I use the following Apache httpd
> configuration file (this is actually the template version so I don't share
> more):
>
> CASLoginURL  https://{{httpd.cas.server}}/cas/login
> CASValidateURL   https://{{httpd.cas.server}}/cas/serviceValidate
> CASRootProxiedAs https://{{httpd.local.name}}
> CASCookiePath    /var/cache/mod_auth_cas/
>
> RewriteEngine On
> RewriteLogLevel 0
> RewriteRule ^/$ https://%{HTTP_HOST}/solr/ [R=301,L]
>
> <Location /solr>
>   ProxyPass http://127.0.0.1:8983/solr retry=0
>   ProxyPassReverse http://127.0.0.1:8983/solr
>   AuthName "NLM Login"
>   AuthType CAS
>   CASScope /
>   CASAuthNHeader REMOTE_USER
>
>   Require user {{solr.admin.users}}
> </Location>
> Now the Apache httpd directives for CAS are all part of the mod_auth_cas
> module, https://github.com/apereo/mod_auth_cas
>
> Other folks are using OAuth, SAML, or just basic htpasswd protection.
>
> Since you are a PhD candidate, I want to point you towards something like
> Apache: The Definitive Guide, rather than towards Google, which will help
> you from here anyway if you search for "Apache httpd web proxy tutorial" or
> "NGINX web proxy tutorial".   Anyway, here are the full docs for Apache httpd
> and links to the book I mention:
>
> * http://httpd.apache.org/docs/2.4/
> *
> https://www.amazon.com/Apache-Definitive-Guide-Ben-Laurie/dp/0596002033/ref=sr_1_1
> *
> https://www.safaribooksonline.com/library/view/apache-the-definitive/0596002033/
>
> > -Original Message-
> > From: Amanda Shuman 
> > Sent: Monday, October 22, 2018 9:55 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Securying ONLY the web interface console
> >
> > Just a follow-up to say that I never have resolved this issue
> > satisfactorily.
> >
> >
> >
> >
> > On Mon, Jun 18, 2018 at 6:00 PM Amanda Shuman
> > 
> > wrote:
> >
> > > Hi Shawn et al,
> > >
> > > As a follow-up to this - then how would you solve the issue? I tried to
> > > use the instructions to set up basic authentication in solr (per a
> Stack
> > > Overflow post) and it worked to secure things, but the web app couldn't
> > > access solr. Tampering with the app code - which is the solr plug-in
> used
> > > for Omeka (https://github.com/scholarslab/SolrSearch) - would require
> a
> > > lot of extra work, so I'm wondering if there's a simpler solution. One
> of
> > > the developers on that told me to do a reverse proxy like the second
> > poster
> > > on this chain more or less suggests. But from what I understand of what
> > you
> > > wrote, this is not ideal because it only protects the admin UI panel
> and
> > > not everything else. So how then should I secure everything with the
> > > exception of calls coming from this web app?
> > >
> > > Best,
> > > Amanda
> > >
> > >

Re: Securying ONLY the web interface console

2018-10-22 Thread Amanda Shuman
Just a follow-up to say that I never have resolved this issue
satisfactorily.




On Mon, Jun 18, 2018 at 6:00 PM Amanda Shuman 
wrote:

> Hi Shawn et al,
>
> As a follow-up to this - then how would you solve the issue? I tried to
> use the instructions to set up basic authentication in solr (per a Stack
> Overflow post) and it worked to secure things, but the web app couldn't
> access solr. Tampering with the app code - which is the solr plug-in used
> for Omeka (https://github.com/scholarslab/SolrSearch) - would require a
> lot of extra work, so I'm wondering if there's a simpler solution. One of
> the developers on that told me to do a reverse proxy like the second poster
> on this chain more or less suggests. But from what I understand of what you
> wrote, this is not ideal because it only protects the admin UI panel and
> not everything else. So how then should I secure everything with the
> exception of calls coming from this web app?
>
> Best,
> Amanda
>
>
>
>
> On Mon, Mar 19, 2018 at 11:03 PM, Shawn Heisey 
> wrote:
>
>> On 3/19/2018 11:19 AM, Jesus Olivan wrote:
>> > i'm trying to password protect only Solr web interface (not queries
>> > launched from my app). I'm currently using SolrCloud 6.6.0 with external
>> > zookeepers. I've read tons of Docs about it, but i couldn't find a
>> proper
>> > way to secure ONLY the web admin console. Can anybody give me some light
>> > about it, please? =)
>>
>> When you add authentication, it's not actually the admin UI that needs
>> authentication.  It's all the API requests (queries and the like) that
>> the admin UI makes which require authentication.
>>
>> The admin UI itself is completely static HTML, CSS, Javascript, and
>> images -- it doesn't have ANY information about your installation.
>> Requiring authentication for that doesn't make any sense at all --
>> there's nothing sensitive in those files.
>>
>> When you access the admin UI, the UI pieces are downloaded to your
>> browser, and then the UI actually runs in your browser, accessing the
>> API endpoints.  When the UI running in your browser first accesses one
>> of those endpoints, you get the authentication prompt.
>>
>> If we only secured the admin UI and not the API, then somebody who has
>> direct access to your Solr server could do whatever they wanted.  The
>> admin UI is just a convenience.  Everything it does can be done directly.
>>
>> Thanks,
>> Shawn
>>
>>
>


Re: Question regarding searching Chinese characters

2018-07-24 Thread Amanda Shuman
Hi Tomoko,

Thanks so much for this explanation - I did not even know this was
possible! I will try it out but I have one question: do I just need to
change the settings from smartChinese to the ones you posted here:


  
  
  


Or do I need to still do something with the SmartChineseAnalyzer? I did not
quite understand this in your first message:

" I think you need two steps if you want to use HMMChineseTokenizer
correctly.

1. transform all traditional characters to simplified ones and save to
temporary files.
I do not have clear idea for doing this, but you can create a Java
program that calls Lucene's ICUTransformFilter
2. then, index to Solr using SmartChineseAnalyzer."

My understanding is that with the new settings you posted, I don't need to
do these steps. Is that correct? Otherwise, I don't really know how to do
step 1 with the Java program.

Thanks!
Amanda




On Fri, Jul 20, 2018 at 8:03 PM, Tomoko Uchida  wrote:

> Yes, while traditional - simplified transformation would be out of the
> scope of Unicode normalization,
> you would like to add ICUNormalizer2CharFilterFactory anyway :)
>
> Let me refine my example settings:
>
> 
>   
>   
>id="Traditional-Simplified"/>
> 
>
> Regards,
> Tomoko
>
>
> 2018年7月21日(土) 2:54 Alexandre Rafalovitch :
>
> > Would  ICUNormalizer2CharFilterFactory do? Or at least serve as a
> > template of what needs to be done.
> >
> > Regards,
> >Alex.
> >
> > On 20 July 2018 at 12:40, Walter Underwood 
> wrote:
> > > Looks like we need a charfilter version of the ICU transforms. That
> > could run before the tokenizer.
> > >
> > > I’ve never built a charfilter, but it seems like this would be a good
> > first project for someone who wants to contribute.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> > >> On Jul 20, 2018, at 8:24 AM, Tomoko Uchida <
> > tomoko.uchida.1...@gmail.com> wrote:
> > >>
> > >> Exactly. More concretely, the starting point is: replacing your
> analyzer
> > >>
> > >>  > class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>
> > >>
> > >> to
> > >>
> > >> 
> > >>  
> > >>   > >> id="Traditional-Simplified"/>
> > >> 
> > >>
> > >> and see if the results are as expected. Then research other filters if
> > >> your requirements are not met.
> > >>
> > >> Just a reminder: HMMChineseTokenizerFactory does not handle traditional
> > >> characters, as I noted in my previous post, so ICUTransformFilterFactory
> > >> is an incomplete workaround.
> > >>
> > >> 2018年7月21日(土) 0:05 Walter Underwood :
> > >>
> > >>> I expect that this is the line that does the transformation:
> > >>>
> > >>>> >>> id="Traditional-Simplified"/>
> > >>>
> > >>> This mapping is a standard feature of ICU. More info on ICU
> transforms
> > is
> > >>> in this doc, though not much detail on this particular transform.
> > >>>
> > >>> http://userguide.icu-project.org/transforms/general
> > >>>
> > >>> wunder
> > >>> Walter Underwood
> > >>> wun...@wunderwood.org
> > >>> http://observer.wunderwood.org/  (my blog)
> > >>>
> > >>>> On Jul 20, 2018, at 7:43 AM, Susheel Kumar 
> > >>> wrote:
> > >>>>
> > >>>> I think so.  I used the exact as in github
> > >>>>
> > >>>>  > >>>> positionIncrementGap="1" autoGeneratePhraseQueries="false">
> > >>>> 
> > >>>>   
> > >>>>   
> > >>>>> class="edu.stanford.lucene.analysis.CJKFoldingFilterFactory"/>
> > >>>>> >>> id="Traditional-Simplified"/>
> > >>>>> >>> id="Katakana-Hiragana"/>
> > >>>>   
> > >>>>> >>>>

Re: Question regarding searching Chinese characters

2018-07-20 Thread Amanda Shuman
Thanks! That does indeed look promising... This can be added on top of
Smart Chinese, right? Or is it an alternative?




On Fri, Jul 20, 2018 at 3:11 PM, Susheel Kumar 
wrote:

> I think CJKFoldingFilter will work for you.  I put 舊小說 in the index and then
> each of A, B, C, and D in the query, and they all seem to match; CJKFF is
> transforming the 舊 to 旧
>
> On Fri, Jul 20, 2018 at 9:08 AM, Susheel Kumar 
> wrote:
>
> > I lack Chinese language knowledge, but if you want, I can do a quick test
> > for you in the Analysis tab if you give me what to put in the index and
> > query windows...
> >
> > On Fri, Jul 20, 2018 at 8:59 AM, Susheel Kumar 
> > wrote:
> >
> >> Have you tried to use CJKFoldingFilter
> >> (https://github.com/sul-dlss/CJKFoldingFilter)? I am not sure if this would
> >> cover your use case but I am using this filter and so far no issues.
> >>
> >> Thnx
> >>
> >> On Fri, Jul 20, 2018 at 8:44 AM, Amanda Shuman  >
> >> wrote:
> >>
> >>> Thanks, Alex - I have seen a few of those links but never considered
> >>> transliteration! We use lucene's Smart Chinese analyzer. The issue is
> >>> basically what is laid out in the old blogspot post, namely this point:
> >>>
> >>>
> >>> "Why approach CJK resource discovery differently?
> >>>
> >>> 2.  Search results must be as script agnostic as possible.
> >>>
> >>> There is more than one way to write each word. "Simplified" characters
> >>> were
> >>> emphasized for printed materials in mainland China starting in the
> 1950s;
> >>> "Traditional" characters were used in printed materials prior to the
> >>> 1950s,
> >>> and are still used in Taiwan, Hong Kong and Macau today.
> >>> Since the characters are distinct, it's as if Chinese materials are
> >>> written
> >>> in two scripts.
> >>> Another way to think about it:  every written Chinese word has at least
> >>> two
> >>> completely different spellings.  And it can be mix-n-match:  a word can
> >>> be
> >>> written with one traditional  and one simplified character.
> >>> Example:   Given a user query 舊小說  (traditional for old fiction), the
> >>> results should include matches for 舊小說 (traditional) and 旧小说
> (simplified
> >>> characters for old fiction)"
> >>>
> >>> So, using the example provided above, we are dealing with materials
> >>> produced in the 1950s-1970s that do even weirder things like:
> >>>
> >>> A. 舊小說
> >>>
> >>> can also be
> >>>
> >>> B. 旧小说 (all simplified)
> >>> or
> >>> C. 旧小說 (first character simplified, last character traditional)
> >>> or
> >>> D. 舊小说 (first character traditional, last character simplified)
> >>>
> >>> Thankfully the middle character was never simplified in recent times.
> >>>
> >>> From a historical standpoint, the mixed nature of the characters in the
> >>> same word/phrase is because not all simplified characters were adopted
> at
> >>> the same time by everyone uniformly (good times...).
> >>>
> >>> The problem seems to be that Solr can easily handle A or B above, but
> >>> NOT C
> >>> or D using the Smart Chinese analyzer. I'm not really sure how to
> change
> >>> that at this point... maybe I should figure out how to contact the
> >>> creators
> >>> of the analyzer and ask them?
> >>>
> >>> Amanda
> >>>
> >>>
> >>>

Re: Question regarding searching Chinese characters

2018-07-20 Thread Amanda Shuman
Thanks, Alex - I have seen a few of those links but never considered
transliteration! We use lucene's Smart Chinese analyzer. The issue is
basically what is laid out in the old blogspot post, namely this point:


"Why approach CJK resource discovery differently?

2.  Search results must be as script agnostic as possible.

There is more than one way to write each word. "Simplified" characters were
emphasized for printed materials in mainland China starting in the 1950s;
"Traditional" characters were used in printed materials prior to the 1950s,
and are still used in Taiwan, Hong Kong and Macau today.
Since the characters are distinct, it's as if Chinese materials are written
in two scripts.
Another way to think about it:  every written Chinese word has at least two
completely different spellings.  And it can be mix-n-match:  a word can be
written with one traditional  and one simplified character.
Example:   Given a user query 舊小說  (traditional for old fiction), the
results should include matches for 舊小說 (traditional) and 旧小说 (simplified
characters for old fiction)"

So, using the example provided above, we are dealing with materials
produced in the 1950s-1970s that do even weirder things like:

A. 舊小說

can also be

B. 旧小说 (all simplified)
or
C. 旧小說 (first character simplified, last character traditional)
or
D. 舊小说 (first character traditional, last character simplified)

Thankfully the middle character was never simplified in recent times.

From a historical standpoint, the mixed nature of the characters in the
same word/phrase is because not all simplified characters were adopted at
the same time by everyone uniformly (good times...).

The problem seems to be that Solr can easily handle A or B above, but NOT C
or D using the Smart Chinese analyzer. I'm not really sure how to change
that at this point... maybe I should figure out how to contact the creators
of the analyzer and ask them?
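
The matching problem can be illustrated outside Solr. The sketch below is only a toy: `TRAD_TO_SIMP` is a hand-written table covering just the two characters from the example, whereas a real setup would rely on an analysis-chain filter rather than a manual mapping. It shows why folding both the indexed text and the query to a single script would make variants A through D equivalent:

```python
# Toy table for the two characters in the example above (an assumption made
# for illustration; a real mapping would cover thousands of characters).
TRAD_TO_SIMP = str.maketrans({"舊": "旧", "說": "说"})


def fold(text):
    """Replace each traditional character in the table with its simplified form."""
    return text.translate(TRAD_TO_SIMP)


# Variants A, B, C, and D of "old fiction" from the message above.
variants = ["舊小說", "旧小说", "旧小說", "舊小说"]
folded = {fold(v) for v in variants}

# If folding works, all four spellings collapse to a single indexed form.
print(len(folded))
```

The same folding would have to be applied at both index time and query time; applying it on only one side would still leave the mixed forms unmatched.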

Amanda



On Fri, Jul 20, 2018 at 1:40 PM, Alexandre Rafalovitch 
wrote:

> This is probably your start, if not read already:
> https://lucene.apache.org/solr/guide/7_4/language-analysis.html
>
> Otherwise, I think your answer would be somewhere around using ICU4J,
> IBM's library for dealing with Unicode: http://site.icu-project.org/
> (mentioned on the same page above)
> Specifically, transformations:
> http://userguide.icu-project.org/transforms/general
>
> With that, maybe you map both alphabets into Latin. I did that once
> for Thai for a demo:
> https://github.com/arafalov/solr-thai-test/blob/master/collection1/conf/schema.xml#L34
>
> The challenge is to figure out all the magic rules for that. You'd
> have to dig through the ICU documentation and other web pages. I found
> this one for example:
> http://avajava.com/tutorials/lessons/what-are-the-system-transliterators-available-with-icu4j.html;jsessionid=BEAB0AF05A588B97B8A2393054D908C0
>
> There is also a 12-part series on Solr and Asian text processing, though
> it is a bit old now: http://discovery-grindstone.blogspot.com/
>
> Hope one of these things helps.
>
> Regards,
>Alex.
>
>
> On 20 July 2018 at 03:54, Amanda Shuman  wrote:
> > Hi all,
> >
> > We have a problem. Some of our historical documents have mixed together
> > simplified and traditional Chinese characters. There seems to be no problem
> > when searching either traditional or simplified separately - that is, if a
> > particular string/phrase is all in traditional or simplified, it finds it -
> > but it does not find the string/phrase if the two different characters (one
> > traditional, one simplified) are mixed together in the SAME string/phrase.
> >
> > Has anyone ever handled this problem before? I know some libraries seem to
> > have implemented something that seems to be able to handle this, but I'm
> > not sure how they did so!
> >
> > Amanda
>


Question regarding searching Chinese characters

2018-07-20 Thread Amanda Shuman
Hi all,

We have a problem. Some of our historical documents have mixed together
simplified and traditional Chinese characters. There seems to be no problem
when searching either traditional or simplified separately - that is, if a
particular string/phrase is all in traditional or simplified, it finds it -
but it does not find the string/phrase if the two different characters (one
traditional, one simplified) are mixed together in the SAME string/phrase.

Has anyone ever handled this problem before? I know some libraries seem to
have implemented something that seems to be able to handle this, but I'm
not sure how they did so!

Amanda


Re: Securying ONLY the web interface console

2018-06-18 Thread Amanda Shuman
Hi Shawn et al,

As a follow-up to this - then how would you solve the issue? I tried to use
the instructions to set up basic authentication in solr (per a Stack
Overflow post) and it worked to secure things, but the web app couldn't
access solr. Tampering with the app code - which is the solr plug-in used
for Omeka (https://github.com/scholarslab/SolrSearch) - would require a lot
of extra work, so I'm wondering if there's a simpler solution. One of the
developers on that told me to do a reverse proxy like the second poster on
this chain more or less suggests. But from what I understand of what you
wrote, this is not ideal because it only protects the admin UI panel and
not everything else. So how then should I secure everything with the
exception of calls coming from this web app?
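
For context on why the web app lost access: once Basic Authentication is enabled, every client request has to carry credentials, not only requests from the browser. A rough Python sketch of what that means at the HTTP level follows. The user name, password, and URL are placeholders, and the SolrSearch plug-in itself is PHP, so this only illustrates the header a client would need to send:

```python
import base64
import urllib.request


def solr_request(url, user, password):
    """Build a request that carries HTTP Basic credentials for Solr."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", "Basic " + token)
    return req


# Placeholder values; nothing is sent until urllib.request.urlopen(req) is called.
req = solr_request("http://127.0.0.1:8983/solr/collection1/select?q=*:*",
                   "solr", "secret")
```

Any client that cannot be configured to send such a header will be rejected, which matches the behaviour described above.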

Best,
Amanda




On Mon, Mar 19, 2018 at 11:03 PM, Shawn Heisey  wrote:

> On 3/19/2018 11:19 AM, Jesus Olivan wrote:
> > i'm trying to password protect only Solr web interface (not queries
> > launched from my app). I'm currently using SolrCloud 6.6.0 with external
> > zookeepers. I've read tons of Docs about it, but i couldn't find a proper
> > way to secure ONLY the web admin console. Can anybody give me some light
> > about it, please? =)
>
> When you add authentication, it's not actually the admin UI that needs
> authentication.  It's all the API requests (queries and the like) that
> the admin UI makes which require authentication.
>
> The admin UI itself is completely static HTML, CSS, Javascript, and
> images -- it doesn't have ANY information about your installation.
> Requiring authentication for that doesn't make any sense at all --
> there's nothing sensitive in those files.
>
> When you access the admin UI, the UI pieces are downloaded to your
> browser, and then the UI actually runs in your browser, accessing the
> API endpoints.  When the UI running in your browser first accesses one
> of those endpoints, you get the authentication prompt.
>
> If we only secured the admin UI and not the API, then somebody who has
> direct access to your Solr server could do whatever they wanted.  The
> admin UI is just a convenience.  Everything it does can be done directly.
>
> Thanks,
> Shawn
>
>


Re: Delete then re-add a core

2018-06-11 Thread Amanda Shuman
Erick - thank you, the issue was the second one you mentioned -- I
completely forgot about changes that were made in conf files I never copied
over (including schema.xml). Once I overwrote and reloaded I had no
problems reindexing. I guess I had forgotten to check those since I was
copying various files back and forth... the errors in the solr log clearly
showed what was happening.
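
One quick way to spot this kind of drift is to compare the field names declared in the two schema files. Here is a minimal sketch using only the Python standard library. The function names are made up for illustration, and it looks only at plain `<field>` elements, not at dynamic fields or field types:

```python
import xml.etree.ElementTree as ET


def field_names(schema_xml):
    """Return the set of names declared by <field> elements in a schema."""
    root = ET.fromstring(schema_xml)
    return {field.get("name") for field in root.iter("field")}


def missing_fields(old_schema_xml, new_schema_xml):
    """List fields the old schema declares but the new one does not."""
    return sorted(field_names(old_schema_xml) - field_names(new_schema_xml))
```

Reading both schema.xml files into strings and passing them to `missing_fields` would list candidates for the "undefined field" errors reported in the Solr log.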

Best,
Amanda




On Thu, Jun 7, 2018 at 5:39 PM, Erick Erickson 
wrote:

> Amanda:
>
> Your Solr log will record each update that comes through. It's a
> little opaque, by default it'll show you the first 10 IDs of each
> batch it receives.
>
> Guesses:
> - you're somehow having the same ID (<uniqueKey>) assigned to multiple
> documents
> - your schemas are a bit different and the docs can't be indexed
> (undefined field for instance).
>
>
> Best,
> Erick
>
>
> On Thu, Jun 7, 2018 at 7:49 AM, Amanda Shuman 
> wrote:
> > Thanks, Shawn, that is a remarkably clear description.
> >
> > I am able to create the core and all appears fine, but when I go to index I
> > am unfortunately running into a new problem. I am indexing from the same
> > site content as before (it's just an Omeka install with a solr plug-in that
> > reindexes the site), but now it only indexes 3 (!) records out of 3000+ and
> > then stops. I have no idea why. The old core - with a different name -
> > still works, even if I choose to reindex it. Now I have to figure out which
> > error logs to check -- Solr or Omeka.
> >
> > Amanda
> >
> >
> >
> > On Thu, Jun 7, 2018 at 3:08 PM, Shawn Heisey 
> wrote:
> >
> >> On 6/7/2018 4:12 AM, Amanda Shuman wrote:
> >>
> >>> Definitely not a permissions problem - everything is run by the solr
> user,
> >>> which owns everything in the directories. I just can't figure out why
> the
> >>> default working directory is in opt rather than var (which is where it
> >>> should be according to a previous chain I was in).
> >>>
> >>> But at this point I'm at a total loss, so maybe a fresh install
> wouldn't
> >>> hurt.
> >>>
> >>
> >> The "bin/solr" script, which is ultimately how Solr is started even when
> >> it is installed as a service, initially sets the current working
> directory
> >> to a directory that it knows as SOLR_TIP.  This is the directory
> containing
> >> bin, server, and others.  It defaults to /opt/solr when Solr is
> installed
> >> as a service.
> >>
> >> Then just before Solr is started, the script will change the current
> >> working directory to the server directory, which is a subdirectory of
> >> SOLR_TIP.
> >>
> >> So when Solr starts, the current working directory is $SOLR_TIP/server.
> >>
> >> The service installer sets the owner of everything in SOLR_TIP to root.
> >> The solr user has absolutely no reason to write to that directory at
> all.
> >> Everything that Solr writes will be to an absolute path under the "var
> dir"
> >> given during service install, which defaults to /var/solr.  THAT
> directory
> >> and all its contents will be owned by the user specified during install,
> >> which defaults to solr.
> >>
> >> The current working directory is where the developers want it, and will
> >> not be in the "var dir".  Its location is critical for correct Jetty
> >> operation.  When Solr is configured in the expected way for a service
> >> install, it does not use the current working directory.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


Re: Delete then re-add a core

2018-06-07 Thread Amanda Shuman
Thanks, Shawn, that is a remarkably clear description.

I am able to create the core and all appears fine, but when I go to index I
am unfortunately running into a new problem. I am indexing from the same
site content as before (it's just an Omeka install with a solr plug-in that
reindexes the site), but now it only indexes 3 (!) records out of 3000+ and
then stops. I have no idea why. The old core - with a different name -
still works, even if I choose to reindex it. Now I have to figure out which
error logs to check -- Solr or Omeka.

Amanda

--
Dr. Amanda Shuman
Post-doc researcher, University of Freiburg, The Maoist Legacy Project
<http://www.maoistlegacy.uni-freiburg.de/>
PhD, University of California, Santa Cruz
http://www.amandashuman.net/
http://www.prchistoryresources.org/
Office: +49 (0) 761 203 4925


On Thu, Jun 7, 2018 at 3:08 PM, Shawn Heisey  wrote:

> On 6/7/2018 4:12 AM, Amanda Shuman wrote:
>
>> Definitely not a permissions problem - everything is run by the solr user,
>> which owns everything in the directories. I just can't figure out why the
>> default working directory is in opt rather than var (which is where it
>> should be according to a previous chain I was in).
>>
>> But at this point I'm at a total loss, so maybe a fresh install wouldn't
>> hurt.
>>
>
> The "bin/solr" script, which is ultimately how Solr is started even when
> it is installed as a service, initially sets the current working directory
> to a directory that it knows as SOLR_TIP.  This is the directory containing
> bin, server, and others.  It defaults to /opt/solr when Solr is installed
> as a service.
>
> Then just before Solr is started, the script will change the current
> working directory to the server directory, which is a subdirectory of
> SOLR_TIP.
>
> So when Solr starts, the current working directory is $SOLR_TIP/server.
>
> The service installer sets the owner of everything in SOLR_TIP to root.
> The solr user has absolutely no reason to write to that directory at all.
> Everything that Solr writes will be to an absolute path under the "var dir"
> given during service install, which defaults to /var/solr.  THAT directory
> and all its contents will be owned by the user specified during install,
> which defaults to solr.
>
> The current working directory is where the developers want it, and will
> not be in the "var dir".  Its location is critical for correct Jetty
> operation.  When Solr is configured in the expected way for a service
> install, it does not use the current working directory.
>
> Thanks,
> Shawn
>
>
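
[Editor's sketch] The directory layout Shawn describes can be written down as
a small path calculation. This is only an illustration and not part of
bin/solr: the function name service_layout and the dictionary keys are made
up here, and the defaults /opt/solr and /var/solr are the ones named in the
message above.

```python
from pathlib import PurePosixPath


def service_layout(solr_tip="/opt/solr", var_dir="/var/solr"):
    """Return the two directories discussed above for a service install.

    solr_tip: the directory containing bin, server, and others (SOLR_TIP)
    var_dir:  the "var dir" given during service install
    The key names "cwd" and "writable" are hypothetical labels.
    """
    tip = PurePosixPath(solr_tip)
    return {
        # bin/solr changes into $SOLR_TIP/server just before starting Solr
        "cwd": tip / "server",
        # everything Solr writes is expected to land under the var dir
        "writable": PurePosixPath(var_dir),
    }


layout = service_layout()
print(layout["cwd"])       # /opt/solr/server
print(layout["writable"])  # /var/solr
```

The point of the sketch is that the two paths are unrelated: a CWD under /opt
is expected on a service install and does not by itself indicate a problem.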


Re: Delete then re-add a core

2018-06-07 Thread Amanda Shuman
Definitely not a permissions problem - everything is run by the solr user,
which owns everything in the directories. I just can't figure out why the
default working directory is in opt rather than var (which is where it
should be according to a previous chain I was in).

But at this point I'm at a total loss, so maybe a fresh install wouldn't
hurt.


--
Dr. Amanda Shuman
Post-doc researcher, University of Freiburg, The Maoist Legacy Project
<http://www.maoistlegacy.uni-freiburg.de/>
PhD, University of California, Santa Cruz
http://www.amandashuman.net/
http://www.prchistoryresources.org/
Office: +49 (0) 761 203 4925


On Wed, Jun 6, 2018 at 11:09 PM, BlackIce  wrote:

> One of the issues with the install script is that when it's run by any user
> other than "solr" and installed into default directories,
> one might get ownership/permission problems.
>
> The easiest way to avoid these is by creating the "solr" user BEFORE
> installing Solr as a regular "Login-User",
> and then install Solr while being logged into this account (Or sudo, etc..)
> and then install Solr with NON default values for directories,
> have everything installed within the "Solr" users home directory space,
> that way everything belongs to the solr user, it is then easily modified,
> by just logging into the solr account and one doesn't have to worry
> about ownership/permissions.. and if one makes a mistake it only affects
> the "solr" user...
>
> Anyway, just my 2 cents
>
> On Wed, Jun 6, 2018 at 9:41 PM, Amanda Shuman 
> wrote:
>
> > Thanks, I was able to do most of that but didn't reinstall... Still running
> > into an issue I think is related to current working directory. I guess
> > reinstalling might fix that?
> > Amanda
> >
> > On Wed, Jun 6, 2018, 17:27 Erick Erickson wrote:
> >
> > > Assuming this is stand-alone:
> > > > find the data dir for the core (parent of the index dir)
> > > > find the config dir for the core
> > > > shut down Solr
> > > > "rm -rf data"
> > > > make any changes to the configs you want
> > > > start Solr
> > >
> > > As BlackIce said, reinstalling works too.
> > >
> > > If it's SolrCloud delete and recreate the collection, your configs
> > > will be in ZooKeeper. Of course update your configs with your changes
> > > before creating the new collection.
> > >
> > > Best,
> > > Erick
> > >
> > >
> > > On Wed, Jun 6, 2018 at 7:09 AM, BlackIce wrote:
> > > > I'm not a Solr guru
> > > >
> > > > I take it that you installed Solr with the install script
> > > > then it installs into a dir where normal users have no right to access the
> > > > necessary files...
> > > >
> > > > One way to circumvent this is to un-install Solr and then re-install
> > > > without using the default and have it install into a directory where the
> > > > solr and login user have access to.
> > > >
> > > > Deleting a Core is as simple as deleting its directory...
> > > >
> > > > Hope this helps - good luck
> > > >
> > > > On Wed, Jun 6, 2018 at 3:59 PM, Amanda Shuman <amanda.shu...@gmail.com>
> > > > wrote:
> > > >
> > > >> Oh, and I also have a related question - how can I change my CWD (current
> > > >> working directory)? It is set for the /opt/ folder and not /var/ and I
> > > >> think that's screwing things up...
> > > >> Thanks!
> > > >> Amanda
> > > >>
> > > >> --
> > > >> Dr. Amanda Shuman
> > > >> Post-doc researcher, University of Freiburg, The Maoist Legacy Project
> > > >> <http://www.maoistlegacy.uni-freiburg.de/>
> > > >> PhD, University of California, Santa Cruz
> > > >> http://www.amandashuman.net/
> > > >> http://www.prchistoryresources.org/
> > > >> Office: +49 (0) 761 203 4925
> > > >>
> > > >>
> > > >> On Wed, Jun 6, 2018 at 3:35 PM, Amanda Shuman <amanda.shu...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Hi all, I'm a bit of a newbie still but have clearly screwed something
> > > >> > up... so I think what I need to do now is to delete a core (saving current

Re: Delete then re-add a core

2018-06-06 Thread Amanda Shuman
Thanks, I was able to do most of that but didn't reinstall... Still running
into an issue I think is related to current working directory. I guess
reinstalling might fix that?
Amanda

On Wed, Jun 6, 2018, 17:27 Erick Erickson  wrote:

> Assuming this is stand-alone:
> > find the data dir for the core (parent of the index dir)
> > find the config dir for the core
> > shut down Solr
> > "rm -rf data"
> > make any changes to the configs you want
> > start Solr
>
> As BlackIce said, reinstalling works too.
>
> If it's SolrCloud delete and recreate the collection, your configs
> will be in ZooKeeper. Of course update your configs with your changes
> before creating the new collection.
>
> Best,
> Erick
>
>
> On Wed, Jun 6, 2018 at 7:09 AM, BlackIce  wrote:
> > I'm not a Solr guru
> >
> > I take it that you installed Solr with the install script
> > then it installs into a dir where normal users have no right to access the
> > necessary files...
> >
> > One way to circumvent this is to un-install Solr and then re-install
> > without using the default and have it install into a directory where the
> > solr and login user have access to.
> >
> > Deleting a Core is as simple as deleting its directory...
> >
> > Hope this helps - good luck
> >
> > On Wed, Jun 6, 2018 at 3:59 PM, Amanda Shuman 
> > wrote:
> >
> >> Oh, and I also have a related question - how can I change my CWD (current
> >> working directory)? It is set for the /opt/ folder and not /var/ and I
> >> think that's screwing things up...
> >> Thanks!
> >> Amanda
> >>
> >> --
> >> Dr. Amanda Shuman
> >> Post-doc researcher, University of Freiburg, The Maoist Legacy Project
> >> <http://www.maoistlegacy.uni-freiburg.de/>
> >> PhD, University of California, Santa Cruz
> >> http://www.amandashuman.net/
> >> http://www.prchistoryresources.org/
> >> Office: +49 (0) 761 203 4925
> >>
> >>
> >> On Wed, Jun 6, 2018 at 3:35 PM, Amanda Shuman 
> >> wrote:
> >>
> >> > Hi all, I'm a bit of a newbie still but have clearly screwed something
> >> > up... so I think what I need to do now is to delete a core (saving current
> >> > conf files as-is) then re-add/re-create the core and re-index. (It's not a
> >> > big site and it's not public yet, so I'm not concerned about taking
> >> > anything down during this process.)
> >> >
> >> > So what's the quickest way to do this:
> >> >
> >> > 1. Create a new core at command line with different name, move all conf
> >> > files into that (?)
> >> > 2. Delete the current core at command line, but what's the script for
> >> > doing that to make sure it's totally gone? I see different responses
> >> > online... not sure what's the "best practice" for this...
> >> >
> >> > Thanks!
> >> > Amanda
> >> >
> >> >
> >> >
> >> > --
> >> > Dr. Amanda Shuman
> >> > Post-doc researcher, University of Freiburg, The Maoist Legacy Project
> >> > <http://www.maoistlegacy.uni-freiburg.de/>
> >> > PhD, University of California, Santa Cruz
> >> > http://www.amandashuman.net/
> >> > http://www.prchistoryresources.org/
> >> > Office: +49 (0) 761 203 4925
> >> >
> >> >
> >>
>
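
[Editor's sketch] Erick's stand-alone steps boil down to removing the core's
data directory while leaving its conf directory untouched. The snippet below
is a minimal illustration of only that filesystem step, run against a
throwaway temp directory. The helper name wipe_core_data is made up, and it
assumes the data directory sits at <core dir>/data (the default layout shown
elsewhere in this thread). Stopping and starting Solr is not handled here
and still has to be done around it.

```python
import shutil
import tempfile
from pathlib import Path


def wipe_core_data(core_dir):
    """Delete <core_dir>/data and leave <core_dir>/conf alone.

    Assumes Solr has already been shut down; this function does not check.
    Returns True if the conf directory is still present afterwards.
    """
    core_dir = Path(core_dir)
    data_dir = core_dir / "data"
    if data_dir.is_dir():
        shutil.rmtree(data_dir)  # same effect as "rm -rf data"
    return (core_dir / "conf").is_dir()


# Demo on a temporary directory, not a real Solr install.
demo = Path(tempfile.mkdtemp()) / "core1"
(demo / "conf").mkdir(parents=True)
(demo / "data" / "index").mkdir(parents=True)

conf_kept = wipe_core_data(demo)
print(conf_kept, (demo / "data").exists())  # True False
```

After the data directory is gone and Solr is started again, the core comes up
empty and has to be re-indexed.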


Re: Delete then re-add a core

2018-06-06 Thread Amanda Shuman
Oh, and I also have a related question - how can I change my CWD (current
working directory)? It is set for the /opt/ folder and not /var/ and I
think that's screwing things up...
Thanks!
Amanda

--
Dr. Amanda Shuman
Post-doc researcher, University of Freiburg, The Maoist Legacy Project
<http://www.maoistlegacy.uni-freiburg.de/>
PhD, University of California, Santa Cruz
http://www.amandashuman.net/
http://www.prchistoryresources.org/
Office: +49 (0) 761 203 4925


On Wed, Jun 6, 2018 at 3:35 PM, Amanda Shuman 
wrote:

> Hi all, I'm a bit of a newbie still but have clearly screwed something
> up... so I think what I need to do now is to delete a core (saving current
> conf files as-is) then re-add/re-create the core and re-index. (It's not a
> big site and it's not public yet, so I'm not concerned about taking
> anything down during this process.)
>
> So what's the quickest way to do this:
>
> 1. Create a new core at command line with different name, move all conf
> files into that (?)
> 2. Delete the current core at command line, but what's the script for
> doing that to make sure it's totally gone? I see different responses
> online... not sure what's the "best practice" for this...
>
> Thanks!
> Amanda
>
>
>
> --
> Dr. Amanda Shuman
> Post-doc researcher, University of Freiburg, The Maoist Legacy Project
> <http://www.maoistlegacy.uni-freiburg.de/>
> PhD, University of California, Santa Cruz
> http://www.amandashuman.net/
> http://www.prchistoryresources.org/
> Office: +49 (0) 761 203 4925
>
>


Delete then re-add a core

2018-06-06 Thread Amanda Shuman
Hi all, I'm a bit of a newbie still but have clearly screwed something
up... so I think what I need to do now is to delete a core (saving current
conf files as-is) then re-add/re-create the core and re-index. (It's not a
big site and it's not public yet, so I'm not concerned about taking
anything down during this process.)

So what's the quickest way to do this:

1. Create a new core at command line with different name, move all conf
files into that (?)
2. Delete the current core at command line, but what's the script for doing
that to make sure it's totally gone? I see different responses online...
not sure what's the "best practice" for this...

Thanks!
Amanda



--
Dr. Amanda Shuman
Post-doc researcher, University of Freiburg, The Maoist Legacy Project
<http://www.maoistlegacy.uni-freiburg.de/>
PhD, University of California, Santa Cruz
http://www.amandashuman.net/
http://www.prchistoryresources.org/
Office: +49 (0) 761 203 4925


Re: How to get a solr core to persist

2017-11-20 Thread Amanda Shuman
Hi Shawn,

I did as you suggested and created the core by hand - I copied the files
from the existing core, including the index files (data directory) and
changed the core.properties file to the new core name (core_new) and
restarted. Now I'm having a different issue - it says it is Optimized but
that Current is not (the console shows the red prohibited sign, which I
guess means false or something?). So basically there's no content at all in
there. Re-reading your instructions here: " If you want to relocate the
data, you can add a dataDir property to core.properties.  If it has a
relative path, it is relative to the core.properties location." - Did I
miss a step to get the existing index to load?

Thanks!
Amanda

--
Dr. Amanda Shuman
Post-doc researcher, University of Freiburg, The Maoist Legacy Project
<http://www.maoistlegacy.uni-freiburg.de/>
PhD, University of California, Santa Cruz
http://www.amandashuman.net/
http://www.prchistoryresources.org/
Office: +49 (0) 761 203 4925


On Wed, Nov 15, 2017 at 1:32 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/15/2017 2:28 AM, Amanda Shuman wrote:
>
>> 1) so does this mean that on the back-end I should first create my new
>> core, e.g., core1 and then within that place a conf folder with all the
>> files? Same for the data folder? If so, is it fine to just use the existing
>> config files that I've previously worked on (i.e. the config for search
>> that I already modified)? I presume this won't be an issue.
>>
>> 2) does it matter if I create this core through the admin console or at
>> command line?
>>
>
> You can create your cores however you like.  I actually create all my
> cores completely by hand, including the core.properties file, and let Solr
> discover them on startup.  Mostly I just copy an existing core, change
> core.properties to correct values, make any config changes I need, and
> restart Solr.
>
> If you want to use the admin UI (or the CoreAdmin API directly, which is
> what the admin UI calls), then the instanceDir must have a conf directory
> with all the config files you require for the core, and NOT have a
> core.properties file.  If you're adding a core that already has an index,
> then you would also include the data directory in the core's instanceDir.
> If you want to relocate the data, you can add a dataDir property to
> core.properties.  If it has a relative path, it is relative to the
> core.properties location.
>
> The commandline creation works pretty well.  The way it works is by
> copying a configset (which may be in server/solr/configsets or in a custom
> location) to the "conf" directory in the core, then calling the CoreAdmin
> API to actually add the core to Solr (and create core.properties so it'll
> get picked up on restart).
>
> Thanks,
> Shawn
>
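
[Editor's sketch] The rule quoted above (a relative dataDir is relative to the
core.properties location) can be illustrated with a few lines of path
handling. This is not Solr's own code: parse_properties is a deliberately
simplified key=value reader, resolve_data_dir is a made-up helper, and the
fallback to "data" when no dataDir is set is an assumption based on the
default layout shown in the admin console output in this thread. The example
paths and the directory name index_copy are hypothetical.

```python
from pathlib import PurePosixPath


def parse_properties(text):
    """Parse simple key=value lines; blank lines and '#' comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve_data_dir(core_properties_path, text):
    """Resolve dataDir against the directory that holds core.properties."""
    instance_dir = PurePosixPath(core_properties_path).parent
    data_dir = PurePosixPath(parse_properties(text).get("dataDir", "data"))
    # Joining with an absolute path returns that absolute path unchanged.
    return instance_dir / data_dir


props_path = "/var/solr/data/core_new/core.properties"
relative = resolve_data_dir(props_path, "name=core_new\ndataDir=index_copy\n")
absolute = resolve_data_dir(props_path, "name=core_new\ndataDir=/srv/solr/core_new\n")
default = resolve_data_dir(props_path, "name=core_new\n")
print(relative)  # /var/solr/data/core_new/index_copy
print(absolute)  # /srv/solr/core_new
print(default)   # /var/solr/data/core_new/data
```

If a copied index does not show up, comparing the resolved path with the
"Data:" line in the admin console is a quick way to see whether Solr is
looking where the files actually are.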


Re: How to get a solr core to persist

2017-11-15 Thread Amanda Shuman
Ah, also, this is what the admin console says for location of core docs
when I created the core at command line:

CWD: /opt/solr-5.3.1/server
Instance: /var/solr/data/[corename]
Data: /var/solr/data/[corename]/data
Index: /var/solr/data/[corename]/data/index


--
Dr. Amanda Shuman
Post-doc researcher, University of Freiburg, The Maoist Legacy Project
<http://www.maoistlegacy.uni-freiburg.de/>
PhD, University of California, Santa Cruz
http://www.amandashuman.net/
http://www.prchistoryresources.org/
Office: +49 (0) 761 203 4925


On Wed, Nov 15, 2017 at 10:28 AM, Amanda Shuman <amanda.shu...@gmail.com>
wrote:

> Hi Shawn,
>
> Thanks a million for your response! I really do appreciate it... this
> helps clarify how things should be set up.
>
> As for why things are set up the way they are and the webapps location...
> honestly I think my predecessor did not really understand solr at all...
> I'm trying to pick up the pieces now on the back-end. (On the bright side,
> I did figure out how to modify the search relevance criteria in the config
> files for our core, but I'm more of a front-end developer and that seemed a
> lot more intuitive to me.)
>
> It does seem that the solr home is currently in /var/solr/data (not
> server/solr) because when I created a new core at command line, that's
> where it went. We start/restart solr using /etc/init.d/ rather than
> bin/solr.
>
> If I can ask a few very small follow-up questions to this:
>
> "If you do a manual core creation, the core.properties file must NOT
> exist, but the conf directory *must* exist with the proper contents."
>
> 1) so does this mean that on the back-end I should first create my new
> core, e.g., core1 and then within that place a conf folder with all the
> files? Same for the data folder? If so, is it fine to just use the existing
> config files that I've previously worked on (i.e. the config for search
> that I already modified)? I presume this won't be an issue.
>
> 2) does it matter if I create this core through the admin console or at
> command line?
>
> Thanks again!
> Amanda
>
> --
> Dr. Amanda Shuman
> Post-doc researcher, University of Freiburg, The Maoist Legacy Project
> <http://www.maoistlegacy.uni-freiburg.de/>
> PhD, University of California, Santa Cruz
> http://www.amandashuman.net/
> http://www.prchistoryresources.org/
> Office: +49 (0) 761 203 4925
>
>
> On Tue, Nov 14, 2017 at 3:15 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 11/14/2017 2:14 AM, Amanda Shuman wrote:
>>
>>> We have just one solr core, which resides in the webapp folder (in solr
>>> 5.3.1 this is at /opt/solr-5.3.1/server/webapps/[corename]/  -- the data
>>> folder is in the same place at /data).
>>>
>>
>> Why is your core there?  That is not a typical location, and does explain
>> the restart behavior you're seeing.
>>
>> Usually core directories go in the solr home.  If you start solr using
>> "bin/solr start" directly without any options, the solr home will be in
>> server/solr, not server/webapp.  How are you starting Solr?
>>
>>> "Could not create a new core in
>>> /opt/solr-5.3.1/server/webapps/[corename]/as another core is already
>>> defined there"
>>>
>>
>> When Solr starts, it begins searching the coreRootDirectory (which
>> defaults to the solr home) for cores.  When it locates a core.properties
>> file, that location becomes the instanceDir for a core.
>>
>> If you do a manual core creation, the core.properties file must NOT
>> exist, but the conf directory *must* exist with the proper contents. The
>> core creation will create that file.  If it already exists, then Solr will
>> refuse to create the core, just as you have seen.
>>
>> The program directory location you have mentioned (/opt/solr-5.3.1)
>> sounds like somebody did a service installation.  The default solr home
>> when you install the service (and start Solr with /etc/init.d/ rather
>> than bin/solr) is /var/solr/data.  This location can be overridden, but
>> that's the default.
>>
>> Instead of having your core in webapp, move it to the solr home, wherever
>> that is.  Then when you start Solr, it will find the core.
>>
>> If a service installation has been done, then you should not start Solr
>> with "bin/solr" -- you should start the installed service.
>>
>> Thanks,
>> Shawn
>>
>
>
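
[Editor's sketch] Shawn's two conditions for a manual core creation (conf must
exist, core.properties must not) can be checked before calling the admin UI
or the CoreAdmin API. The helper create_precheck and its message strings are
made up for illustration; the demo uses a temporary directory rather than a
real solr home.

```python
import tempfile
from pathlib import Path


def create_precheck(instance_dir):
    """Return a list of problems that would block a manual core creation."""
    instance_dir = Path(instance_dir)
    problems = []
    if not (instance_dir / "conf").is_dir():
        problems.append("conf directory is missing")
    if (instance_dir / "core.properties").exists():
        problems.append("core.properties already exists")
    return problems


# Demo on a temporary directory standing in for a core's instanceDir.
core = Path(tempfile.mkdtemp()) / "core1"
(core / "conf").mkdir(parents=True)
before = create_precheck(core)
print(before)  # []

# Once core.properties is present, a second create attempt would be refused.
(core / "core.properties").write_text("name=core1\n")
after = create_precheck(core)
print(after)  # ['core.properties already exists']
```

The second result corresponds to the "another core is already defined there"
message reported at the start of this thread.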


Re: How to get a solr core to persist

2017-11-15 Thread Amanda Shuman
Hi Shawn,

Thanks a million for your response! I really do appreciate it... this helps
clarify how things should be set up.

As for why things are set up the way they are and the webapps location...
honestly I think my predecessor did not really understand solr at all...
I'm trying to pick up the pieces now on the back-end. (On the bright side,
I did figure out how to modify the search relevance criteria in the config
files for our core, but I'm more of a front-end developer and that seemed a
lot more intuitive to me.)

It does seem that the solr home is currently in /var/solr/data (not
server/solr) because when I created a new core at command line, that's
where it went. We start/restart solr using /etc/init.d/ rather than
bin/solr.

If I can ask a few very small follow-up questions to this:

"If you do a manual core creation, the core.properties file must NOT exist,
but the conf directory *must* exist with the proper contents."

1) so does this mean that on the back-end I should first create my new
core, e.g., core1 and then within that place a conf folder with all the
files? Same for the data folder? If so, is it fine to just use the existing
config files that I've previously worked on (i.e. the config for search
that I already modified)? I presume this won't be an issue.

2) does it matter if I create this core through the admin console or at
command line?

Thanks again!
Amanda

--
Dr. Amanda Shuman
Post-doc researcher, University of Freiburg, The Maoist Legacy Project
<http://www.maoistlegacy.uni-freiburg.de/>
PhD, University of California, Santa Cruz
http://www.amandashuman.net/
http://www.prchistoryresources.org/
Office: +49 (0) 761 203 4925


On Tue, Nov 14, 2017 at 3:15 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/14/2017 2:14 AM, Amanda Shuman wrote:
>
>> We have just one solr core, which resides in the webapp folder (in solr
>> 5.3.1 this is at /opt/solr-5.3.1/server/webapps/[corename]/  -- the data
>> folder is in the same place at /data).
>>
>
> Why is your core there?  That is not a typical location, and does explain
> the restart behavior you're seeing.
>
> Usually core directories go in the solr home.  If you start solr using
> "bin/solr start" directly without any options, the solr home will be in
> server/solr, not server/webapp.  How are you starting Solr?
>
>> "Could not create a new core in
>> /opt/solr-5.3.1/server/webapps/[corename]/as another core is already
>> defined there"
>>
>
> When Solr starts, it begins searching the coreRootDirectory (which
> defaults to the solr home) for cores.  When it locates a core.properties
> file, that location becomes the instanceDir for a core.
>
> If you do a manual core creation, the core.properties file must NOT exist,
> but the conf directory *must* exist with the proper contents. The core
> creation will create that file.  If it already exists, then Solr will
> refuse to create the core, just as you have seen.
>
> The program directory location you have mentioned (/opt/solr-5.3.1) sounds
> like somebody did a service installation.  The default solr home when you
> install the service (and start Solr with /etc/init.d/ rather than
> bin/solr) is /var/solr/data.  This location can be overridden, but that's
> the default.
>
> Instead of having your core in webapp, move it to the solr home, wherever
> that is.  Then when you start Solr, it will find the core.
>
> If a service installation has been done, then you should not start Solr
> with "bin/solr" -- you should start the installed service.
>
> Thanks,
> Shawn
>
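
[Editor's sketch] The startup behaviour Shawn describes (walk the
coreRootDirectory, treat every directory holding a core.properties file as a
core's instanceDir) can be approximated in a few lines. This is an
illustration, not Solr's implementation: discover_cores is a made-up name,
and not descending below a directory that already contains core.properties
is an assumption of this sketch. The demo tree lives in a temp directory.

```python
import os
import tempfile
from pathlib import Path


def discover_cores(core_root):
    """Return directories under core_root that contain a core.properties file."""
    found = []
    for dirpath, dirnames, filenames in os.walk(core_root):
        if "core.properties" in filenames:
            found.append(Path(dirpath))
            dirnames[:] = []  # assumption: do not search inside a core directory
    return sorted(found)


# Demo tree: coreA is discoverable, coreB has configs but no core.properties.
root = Path(tempfile.mkdtemp())
(root / "coreA" / "conf").mkdir(parents=True)
(root / "coreA" / "core.properties").write_text("name=coreA\n")
(root / "coreA" / "nested").mkdir()
(root / "coreA" / "nested" / "core.properties").write_text("name=nested\n")
(root / "coreB" / "conf").mkdir(parents=True)

cores = discover_cores(root)
print([p.name for p in cores])  # ['coreA']
```

A core directory that sits outside the tree being walked, such as one under
server/webapps when the solr home is /var/solr/data, is never visited, which
matches the "disappears on restart" symptom described in this thread.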


How to get a solr core to persist

2017-11-14 Thread Amanda Shuman
Hi all, I consider myself relatively new to server admin (it's something I
have to do on the side for the project I'm on and I inherited the current
solr setup from my predecessor), so please be kind.

The issue:

We have just one solr core, which resides in the webapp folder (in solr
5.3.1 this is at /opt/solr-5.3.1/server/webapps/[corename]/  -- the data
folder is in the same place at /data). The core is used to index and search
content from just one webapp. The core "disappears" every time the server
restarts, making the webapp search break every time. By disappears I mean
that the folder and content is still in the webapps folder on the server,
but it needs to be remapped/re-linked via the solr admin console. The core
was and is added via the solr admin console, not at command line. Whenever
I add it back I get this message in solr or in the logs:

"Could not create a new core in
/opt/solr-5.3.1/server/webapps/[corename]/as another core is already
defined there"

Then I refresh the page and it's there again and everything works fine.

So my predecessor knew about this problem and never fixed it; he told me to
just re-add the core via admin console every time the server starts (not a
good solution).

I've done a lot of research on getting a core to persist and it probably
has to do with permissions, but where? All the folders in webapps are owned
by the solr user. If the issue is with the solr command at startup using
root, how can I change this?

I also added a new core at command line, which works in that it persists,
but the core is by default in var/solr folder. How can I create the core in
the webapps folder instead? Also, must I first delete everything in that
folder, including all the conf and data files, before creating the new
core? Will I need to re-build everything from scratch if I do this or can I
use the same conf files as before (I presume I can re-add them)?

Also, there's nothing wrong with the content or permissions of the core
properties files, it is owned by the solr user and contains one line with
the correct core name.

Thanks for your help,

Amanda