Re: TIKA OCR not working

2015-04-28 Thread trung.ht
Hi Uwe,

Today, I downloaded Solr 5.1 and it worked fine. It seems that the fix for
SOLR-7139 is only included in 5.1, not 5.0.

Thanks, everyone, for your support.
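
For reference, this is roughly how I'm posting the image (a SolrJ sketch; the
core URL and the literal.id value are placeholders for my real ones):

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class OcrUploadTest {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // Send the PNG to the extracting request handler (Solr Cell / Tika).
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("test_tesseract.png"), "image/png");
        req.setParam("literal.id", "test_tesseract"); // unique key for the new document
        req.setParam("commit", "true");               // commit so it is searchable right away
        server.request(req);
        server.shutdown();
    }
}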

Trung.

On Tue, Apr 28, 2015 at 10:21 AM, trung.ht  wrote:

> Hi Uwe,
>
> Thanks for the answer, but it looks like it does not work on my machine.
>
> I use Mac OS 10.10.3; tesseract is installed through Homebrew and tested
> with the same file I post to Solr.
> I think tesseract is on the path, since this command runs successfully:
> "tesseract test_tesseract.png output"
>
> On the command line, I got the correct result (the output is the correct
> content of the image), but when I upload to Solr, the content is only some
> newline characters. (I used
>
> About the log files: I did not see anything abnormal in the Solr log file
> (nothing abnormal after my POST request); am I missing another log file?
>
> With best regards,
> Trung.
>
>
> On Mon, Apr 27, 2015 at 9:34 PM, Uwe Schindler  wrote:
>
>> Hi,
>> TIKA OCR is definitely working automatically with Solr 5.x.
>>
>> It is just important to install Tesseract OCR on the path (it is a native
>> tool that does the actual work). On Ubuntu Linux, this should be quite
>> simple ("apt-get install tesseract-ocr" or the like). You may also need to
>> install additional language packs for better results.
>>
>> Provided the native tools are installed, it should work out of the box with
>> no configuration needed, unless you are on a Turkish-localized machine
>> (which triggers a bug in the JDK when spawning external processes). Please
>> also check the log files.
>>
>> Uwe
>>
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>
>> > -Original Message-
>> > From: Allison, Timothy B. [mailto:talli...@mitre.org]
>> > Sent: Monday, April 27, 2015 4:27 PM
>> > To: u...@tika.apache.org
>> > Cc: trung...@anlab.vn; solr-user@lucene.apache.org
>> > Subject: FW: TIKA OCR not working
>> >
>> > Trung,
>> >
>> > I haven't experimented with our OCR parser yet, but this should give a
>> good
>> > start: https://wiki.apache.org/tika/TikaOCR .
>> >
>> > Have you installed tesseract?
>> >
>> > Tika colleagues,
>> >   Any other tips?  What else has to be configured and how?
>> >
>> > -Original Message-
>> > From: trung.ht [mailto:trung...@anlab.vn]
>> > Sent: Friday, April 24, 2015 11:22 PM
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: TIKA OCR not working
>> >
>> > HI everyone,
>> >
>> > Does anyone have the answer for this problem :)?
>> >
>> >
>> > > I saw the documentation of Tika. Tika 1.7 supports OCR and Solr 5.0 uses
>> > > Tika 1.7, but it looks like it does not work. Does anyone know whether
>> > > TIKA OCR works automatically with Solr, or do I have to change some
>> > > settings?
>> > >
>> > >>
>> > Trung.
>> >
>> >
>> > > It's not clear if OCR would happen automatically in Solr Cell, or if
>> > >> changes to Solr would be needed.
>> > >>
>> > >> For Tika OCR info, see:
>> > >>
>> > >> https://issues.apache.org/jira/browse/TIKA-93
>> > >> https://wiki.apache.org/tika/TikaOCR
>> > >>
>> > >>
>> > >>
>> > >> -- Jack Krupansky
>> > >>
>> > >> On Thu, Apr 23, 2015 at 9:14 AM, Alexandre Rafalovitch <
>> > >> arafa...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > I think OCR is in Tika 1.8, so it might be in Solr 5.?. But I
>> > >> > haven't seen it in use yet.
>> > >> >
>> > >> > Regards,
>> > >> > Alex
>> > >> > On 23 Apr 2015 10:24 pm, "Ahmet Arslan" > >
>> > >> wrote:
>> > >> >
>> > >> > > Hi Trung,
>> > >> > >
>> > >> > > I didn't know about OCR capabilities of tika.
>> > >> > > Someone who is familiar with solr-cell can inform us whether this
>> > >> > > functionality is added to solr or not.
>> > >> > >
>> > >> > > Ahmet
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > On Thursday, April 23, 2015 2:06 PM, trung.ht > >
>> > >> wrote:
>> > >> > > Hi Ahmet,
>> > >> > >
>> > >> > > I used a png file, not a pdf file. From the document, I understand
>> > >> > > that solr will post the file to tika, and since tika 1.7, OCR is
>> > >> > > included. Is there something I misunderstood?
>> > >> > >
>> > >> > > Trung.
>> > >> > >
>> > >> > >
>> > >> > > On Thu, Apr 23, 2015 at 5:59 PM, Ahmet Arslan
>> > >> > > >> > >
>> > >> > > wrote:
>> > >> > >
>> > >> > > > Hi Trung,
>> > >> > > >
>> > >> > > > solr-cell (tika) does not do OCR. It cannot extract text from
>> > >> > > > image-based pdfs.
>> > >> > > >
>> > >> > > > Ahmet
>> > >> > > >
>> > >> > > >
>> > >> > > >
>> > >> > > > On Thursday, April 23, 2015 7:33 AM, trung.ht
>> > >> > > > 
>> > >> > wrote:
>> > >> > > >
>> > >> > > >
>> > >> > > >
>> > >> > > > Hi,
>> > >> > > >
>> > >> > > > I want to use solr to index some scanned documents. After setting
>> > >> > > > up a solr document with two fields, "content" and "filename", I
>> > >> > > > tried to upload the attached file, but it seems that the content
>> > >> > > > of the file is only "\n

Re: Support of solr in Spark

2015-04-28 Thread Chris Hostetter

: I am thinking of indexing these company names in Solr, since all the
: functionality is already there.
: 
: Do we have support for Spark?

https://github.com/LucidWorks/spark-solr


Also of possible interest...

http://lucidworks.com/blog/solr-yarn/
https://github.com/LucidWorks/yarn-proto
https://issues.apache.org/jira/browse/SOLR-6743


-Hoss
http://www.lucidworks.com/


Re: Odp.: solr issue with pdf forms

2015-04-28 Thread Erick Erickson
There better be.

1> go to the admin UI
2> select a core
3> select "schema browser"
4> select a field from the drop-down

Until you do step 4 the window will be pretty blank.

Here's the info for TermsComponent, what have you tried?

https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
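
If it helps, a minimal SolrJ sketch of a terms request (it assumes the default
/terms handler is registered; the core URL and the "content" field are
placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class TermsCheck {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery();
        q.setRequestHandler("/terms");  // route to the TermsComponent handler
        q.set("terms", "true");
        q.set("terms.fl", "content");   // field whose indexed tokens we want to see
        q.set("terms.limit", "20");
        QueryResponse rsp = server.query(q);
        TermsResponse terms = rsp.getTermsResponse();
        for (TermsResponse.Term t : terms.getTerms("content")) {
            System.out.println(t.getTerm() + " (freq " + t.getFrequency() + ")");
        }
        server.shutdown();
    }
}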

Best,
Erick

On Tue, Apr 28, 2015 at 1:04 PM,   wrote:
> Thanks a lot for being patient with me. Unfortunately there is no button 
> "load term info". :-(
> Can you maybe help me using the TermsComponent instead? I read it is
> configured by default.
>
> Thanks a lot
> Best
> Steve
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, 27 April 2015 17:23
> To: solr-user@lucene.apache.org
> Subject: Re: Odp.: solr issue with pdf forms
>
> We're still not quite there. There should be a "load term info" button on 
> that page. Clicking that button will show you the terms in your index (as 
> opposed to the raw stored input which is what you get when you look at 
> results in the browser). My bet is that you'll see perfectly normal tokens in 
> the index that will NOT have the wonky characters you see in the display.
>
> If that's the case, then you have a browser issue, Solr is working perfectly 
> fine. On the other hand, if the individual terms are weird, then you have 
> something more fundamental going on.
>
> Which is why I mentioned the TermsComponent. That will return indexed tokens, 
> and allows you a bit more flexibility than the admin page in terms of what 
> tokens you see, but it's essentially the same information.
>
> Best,
> Erick
>
> On Sun, Apr 26, 2015 at 11:18 PM,   wrote:
>> Erick,
>>
>> thanks a lot for helping me here. In my case it is the "content" field which
>> is not displayed correctly. So I went to the schema browser like you pointed
>> out. Here is the information I found:
>> Field: content
>> Field Type: text
>> Properties:  Indexed, Tokenized, Stored, TermVector Stored
>> Schema:  Indexed, Tokenized, Stored, TermVector Stored
>> Index:  Indexed, Tokenized, Stored, TermVector Stored
>> Copied Into:  spell teaser
>> Position Increment Gap:  100
>> Index Analyzer:  org.apache.solr.analysis.TokenizerChain
>> Tokenizer Class:  org.apache.solr.analysis.WhitespaceTokenizerFactory
>> Filters:
>> org.apache.solr.analysis.WordDelimiterFilterFactory
>> args:{preserveOriginal: 1 splitOnCaseChange: 0 generateNumberParts: 1
>> catenateWords: 1 luceneMatchVersion: LUCENE_36 generateWordParts: 1
>> catenateAll: 0 catenateNumbers: 1 }
>> org.apache.solr.analysis.LowerCaseFilterFactory
>> args:{luceneMatchVersion: LUCENE_36 }
>> org.apache.solr.analysis.SynonymFilterFactory args:{synonyms:
>> german/synonyms.txt expand: true ignoreCase: true luceneMatchVersion:
>> LUCENE_36 }
>> org.apache.solr.analysis.DictionaryCompoundWordTokenFilterFactory
>> args:{maxSubwordSize: 15 onlyLongestMatch: false minSubwordSize: 4
>> minWordSize: 5 dictionary: german/german-common-nouns.txt
>> luceneMatchVersion: LUCENE_36 }
>> org.apache.solr.analysis.StopFilterFactory args:{words:
>> german/stopwords.txt ignoreCase: true enablePositionIncrements: true
>> luceneMatchVersion: LUCENE_36 }
>> org.apache.solr.analysis.GermanNormalizationFilterFactory
>> args:{luceneMatchVersion: LUCENE_36 }
>> org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected:
>> german/protwords.txt language: German2 luceneMatchVersion: LUCENE_36 }
>> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory
>> args:{luceneMatchVersion: LUCENE_36 }
>> Query Analyzer:  org.apache.solr.analysis.TokenizerChain
>> Tokenizer Class:  org.apache.solr.analysis.WhitespaceTokenizerFactory
>> Filters:
>> org.apache.solr.analysis.WordDelimiterFilterFactory
>> args:{preserveOriginal: 1 splitOnCaseChange: 0 generateNumberParts: 1
>> catenateWords: 0 luceneMatchVersion: LUCENE_36 generateWordParts: 1
>> catenateAll: 0 catenateNumbers: 0 }
>> org.apache.solr.analysis.LowerCaseFilterFactory
>> args:{luceneMatchVersion: LUCENE_36 }
>> org.apache.solr.analysis.StopFilterFactory args:{words:
>> german/stopwords.txt ignoreCase: true enablePositionIncrements: true
>> luceneMatchVersion: LUCENE_36 }
>> org.apache.solr.analysis.GermanNormalizationFilterFactory
>> args:{luceneMatchVersion: LUCENE_36 }
>> org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected:
>> german/protwords.txt language: German2 luceneMatchVersion: LUCENE_36 }
>> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory
>> args:{luceneMatchVersion: LUCENE_36 }
>> Distinct:  160403
>>
>> Does this somehow help to figure out the issue?
>> Thanks
>> Best
>> Steve
>>
>>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Friday, 24 April 2015 20:15
>> To: solr-user@lucene.apache.org
>> Subject: Re: Odp.: solr issue with pdf forms
>>
>> Steve:
>>
>> Right, it's not exactly obvious. Bring up the admin UI, something like 
>> http://localhost:8983/so

On the fly reloading of solr core properties

2015-04-28 Thread KNitin
Hi

 In SolrCloud (4.6.1), every time a property/value is changed in the
solrcore.properties file, a core/collection reload is needed to pick up the
new values.

Core/collection reloads for large collections (for example, 100 shards) are
very expensive (performance-wise) and can pose a threat to collection
stability (sometimes the reload fails since the timeout is only 60 seconds).
For an RT serving infrastructure, this has the risk of potentially bringing
down the collection itself (or causing it to be unstable).

Would adding a real-time config API (map) inside SolrCloud help? Every Solr
core could pick up its core-specific configs from this shared map (which can
be changed on the fly). This would help with dynamically changing properties
without core reloads; see the sketch below.
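
Roughly, each core would set a watch on a shared znode and re-read it when it
changes, with no reload. A sketch using the plain ZooKeeper API (the znode
path and the idea itself are hypothetical):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class RuntimePropertyWatcher {
    public static void main(String[] args) throws Exception {
        final ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, null);
        final String path = "/configs/mycollection/runtime.properties"; // hypothetical znode

        Watcher watcher = new Watcher() {
            public void process(WatchedEvent event) {
                // ZK watches are one-shot: re-read the znode, apply the new
                // values, and re-register the watch -- no core reload needed.
                System.out.println("Runtime properties changed: " + event);
            }
        };

        byte[] data = zk.getData(path, watcher, null); // read + set the watch
        System.out.println(new String(data, "UTF-8"));
        Thread.sleep(Long.MAX_VALUE); // keep the process alive for the demo
    }
}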

Is this a common use case and can be achieved without core reloads?

Kindly advise,

Nitin


Re: Solr VS Google Mini Search Appliance

2015-04-28 Thread Alexandre Rafalovitch
Facets! I believe Google Search Appliance does not support facets, which
means it supports search, but not post-search results tuning. In general,
custom metadata was problematic with GSA.

Cost. I know at least one (very large, international) company moving to
Solr from Google Search Appliance due to the cost, especially the cost of
indexing multiple large distributed collections. I can't name them publicly,
but if you are desperate, contact me in private.

However, I would probably recommend they move to (or at least evaluate)
LucidWorks' Fusion rather than pure Solr. They may actually find the
additional commercial features worth their (expensive, often slow, corporate)
time.

Regards,
   Alex.



Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/

On 29 April 2015 at 01:07, Branko Simic 
wrote:

> Hi,
>
>
>
> We have a client that will have a website in Ektron CMS. As you may know
> Ektron has good integration with Solr, and that is the primary reason for us
> developers to use it.
>
> But the client does not want to switch from their Google Mini Search
> appliance even though Google stopped support for it 3 years ago.
>
> We need to make a good case for using Solr over Google Mini. I am
> collecting information all over the internet but would also like to hear what
> you have to say as well.
>
> Thanks for taking time to read and hopefully answer my email.
>
>
>
> Regards,
>
>
>
> *Branko Simić*
>
> Lead Application Engineer
>
>
>
>
>
>
> +44 (0)203 637 0128 | www.green-river-media.com | One St. Peter’s Road,
> Maidenhead, SL6 7QU
>
>
>
> 
>
>
>


Re: How to improve the performance of query with expand query

2015-04-28 Thread Joel Bernstein
Could you provide a few more details?

1) Version of Lucene/Solr
2) A sample slow query
3) Number of unique values in the collapse field
4) Number of search results before the collapse
5) Number of results fetched in the page
6) Performance numbers for the query


Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Apr 28, 2015 at 3:58 PM, yliu  wrote:

> Hi,
>
> I am using Solr to do some complex querying.  Some queries require me to
> return one main document and some expanded documents at the same time.
> When
> I run the query without extracting any expanded document, the performance
> is
> good.  But once I added the expanded conditions, the performance becomes
> really bad.  What is the best way to get a document and a few related
> documents (child documents) back at the same time?  Is there anything I can
> config or change in schema.xml file and solrconfig.xml file to help improve
> the performance?  Thanks,
>
> yliu
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-improve-the-performance-of-query-with-expand-query-tp4202895.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Load balancer for indexing?

2015-04-28 Thread Chris Hostetter

: I would still use ConcurrentUpdateSolrServer as it is good for catching up
: when my indexing has fallen behind.  I know it swallows exceptions.

I feel like you are missing the point of when/why to use
ConcurrentUpdateSolrServer, compared to your goal of "load balancing"
updates.

The *only* feature ConcurrentUpdateSolrServer gives you over any other 
type of SolrServer is that it, internally, has a background thread which 
collects up and sends big blocks of documents to the server it points at 
behind the scenes.  Sending big blocks of documents is the antithesis of
your stated goal to "load balance" those updates to multiple servers.

If the reason you like using ConcurrentUpdateSolrServer is because of the
background thread, but you don't want the "big batches", you could just
wrap a regular HttpSolrServer (pointed at your load balancer) inside of a
helper method that uses an ExecutorService (or something else like it) to
handle the background thread yourself.

But again, because of how SolrCloud works, the *best* way to speed up 
performance of indexing is to minimize the amount of data going over the 
wire -- when you send documents to an *arbitrary* SolrCloud node, it will 
do the right thing, and forward those documents to the correct "leader" 
for the appropriate shard of that document -- but if you use 
CloudSolrServer you can help eliminate one complete "hop" of that 
document in the network by letting your client talk directly to the 
*right* leader.

So even better than using HttpSolrServer with a load balancer (behind a 
multi-threaded executor if that's what you want) is using a 
CloudSolrServer -- it's like using a smart software load balancer that 
knows *exactly* which node to send the HTTP request to -- not because of 
CPU load, or current net connections, but because of where the data *must* 
ultimately be sent anyway.
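
To make that concrete, a bare-bones sketch (the ZK host string, collection
name, and field names are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexer {
    public static void main(String[] args) throws Exception {
        // Point at the ZK ensemble, not at any one Solr node; the client reads
        // the cluster state and routes each doc straight to its shard leader.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "hello world");
        server.add(doc);
        server.commit();
        server.shutdown();
    }
}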


-Hoss
http://www.lucidworks.com/


Re: Async deleteshard commands?

2015-04-28 Thread Anshum Gupta
Yes, that's because DELETEREPLICA doesn't support async at this time. It's
expected and documented.
The reason it's not supported is that when ASYNC mode was
introduced, it was only added for tasks that could end up running longer
than the http timeout.
It might be a good thing to have for all calls, including DELETEREPLICA.


On Tue, Apr 28, 2015 at 12:51 PM, Ian Rose  wrote:

> Sure.  Here is an example of ADDREPLICA in synchronous mode:
>
>
> http://localhost:8983/solr/admin/collections?action=addreplica&collection=293&shard=shard1_1
>
> response:
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">1168</int>
>   </lst>
>   <lst name="success">
>     <lst>
>       <lst name="responseHeader">
>         <int name="status">0</int>
>         <int name="QTime">1158</int>
>       </lst>
>       <str name="core">293_shard1_1_replica2</str>
>     </lst>
>   </lst>
> </response>
>
> And here is the same in asynchronous mode:
>
>
> http://localhost:8983/solr/admin/collections?action=addreplica&collection=293&shard=shard1_1&async=foo99
>
> response:
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">2</int>
>   </lst>
>   <str name="requestid">foo99</str>
> </response>
>
> Note that the format of this response does NOT match the response format
> that I got from the attempt at an async DELETESHARD in my earlier email.
>
> Also note that I am now able to query for the status of this request:
>
>
> http://localhost:8983/solr/admin/collections?action=requeststatus&requestid=foo99
>
> response:
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">0</int>
>   </lst>
>   <lst name="status">
>     <str name="state">completed</str>
>     <str name="msg">found foo99 in completed tasks</str>
>   </lst>
> </response>
>
>
>
> On Tue, Apr 28, 2015 at 2:06 PM, Anshum Gupta 
> wrote:
>
> > Hi Ian,
> >
> > What do you mean by "*my testing shows*" ? Can you elaborate on the steps
> > and how did you confirm that the call was indeed *async* ?
> > I may be wrong but I think what you're seeing is a normal DELETEREPLICA
> > call succeeding behind the scenes. It is not treated or processed as an
> > async call.
> >
> > Also, that page is the official reference guide and might need fixing if
> > it's out of sync.
> >
> >
> > On Tue, Apr 28, 2015 at 10:47 AM, Ian Rose 
> wrote:
> >
> > > Hi Anshum,
> > >
> > > FWIW I find that page is not entirely accurate with regard to async
> > > params.  For example, my testing shows that DELETEREPLICA *does*
> support
> > > the async param, although that is not listed here:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api9
> > >
> > > Cheers,
> > > Ian
> > >
> > >
> > > On Tue, Apr 28, 2015 at 12:47 PM, Anshum Gupta  >
> > > wrote:
> > >
> > > > Hi Ian,
> > > >
> > > > DELETESHARD doesn't support ASYNC calls officially. We could
> certainly
> > do
> > > > with a better response but I believe with most of the Collections API
> > > calls
> > > > at this time in Solr, you could send random params which would get
> > > ignored.
> > > > Therefore, in this case, I believe that the async param gets ignored.
> > > >
> > > > The go-to reference point to check what's supported is the official
> > > > reference guide:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api7
> > > >
> > > > This doesn't mention support for async DELETESHARD calls.
> > > >
> > > > On Tue, Apr 28, 2015 at 8:05 AM, Ian Rose 
> > wrote:
> > > >
> > > > > Is it possible to run DELETESHARD commands in async mode?  Google
> > > > searches
> > > > > seem to indicate yes, but not definitively.
> > > > >
> > > > > My local experience indicates otherwise.  If I start with an async
> > > > > SPLITSHARD like so:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> http://localhost:8983/solr/admin/collections?action=splitshard&collection=2Gp&shard=shard1_0_0&async=12-foo-1
> > > > >
> > > > > Then I get back the expected response format, with
> > > > > <str name="requestid">12-foo-1</str>
> > > > >
> > > > > And I can later query for the result via REQUESTSTATUS.
> > > > >
> > > > > However if I try an async DELETESHARD like so:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> http://localhost:8983/solr/admin/collections?action=deleteshard&collection=2Gp&shard=shard1_0_0&async=12-foo-4
> > > > >
> > > > > The response includes the command result, indicating that the
> command
> > > was
> > > > > not run async:
> > > > >
> > > > > <response>
> > > > >   <lst name="responseHeader">
> > > > >     <int name="status">0</int>
> > > > >     <int name="QTime">16</int>
> > > > >   </lst>
> > > > > </response>
> > > > >
> > > > > And in addition REQUESTSTATUS calls for that requestId fail with "Did
> > > > > not find taskid [12-foo-4] in any tasks queue".
> > > > >
> > > > > Synchronous deletes are causing problems for me in production as
> they
> > > are
> > > > > timing out in some cases.
> > > > >
> > > > > Thanks,
> > > > > Ian
> > > > >
> > > > >
> > > > > p.s. I'm on version 5.0.0
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Anshum Gupta
> > > >
> > >
> >
> >
> >
> > --
> > Anshum Gupta
> >
>



-- 
Anshum Gupta


AW: Odp.: solr issue with pdf forms

2015-04-28 Thread Steve.Scholl
Thanks a lot for being patient with me. Unfortunately there is no button "load 
term info". :-(
Can you maybe help me using the TermsComponent instead? I read it is
configured by default.

Thanks a lot
Best
Steve

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Monday, 27 April 2015 17:23
To: solr-user@lucene.apache.org
Subject: Re: Odp.: solr issue with pdf forms

We're still not quite there. There should be a "load term info" button on that 
page. Clicking that button will show you the terms in your index (as opposed to 
the raw stored input which is what you get when you look at results in the 
browser). My bet is that you'll see perfectly normal tokens in the index that 
will NOT have the wonky characters you see in the display.

If that's the case, then you have a browser issue, Solr is working perfectly 
fine. On the other hand, if the individual terms are weird, then you have 
something more fundamental going on.

Which is why I mentioned the TermsComponent. That will return indexed tokens, 
and allows you a bit more flexibility than the admin page in terms of what 
tokens you see, but it's essentially the same information.

Best,
Erick

On Sun, Apr 26, 2015 at 11:18 PM,   wrote:
> Erick,
>
> thanks a lot for helping me here. In my case it is the "content" field which
> is not displayed correctly. So I went to the schema browser like you pointed
> out. Here is the information I found:
> Field: content
> Field Type: text
> Properties:  Indexed, Tokenized, Stored, TermVector Stored
> Schema:  Indexed, Tokenized, Stored, TermVector Stored
> Index:  Indexed, Tokenized, Stored, TermVector Stored
> Copied Into:  spell teaser
> Position Increment Gap:  100
> Index Analyzer:  org.apache.solr.analysis.TokenizerChain
> Tokenizer Class:  org.apache.solr.analysis.WhitespaceTokenizerFactory
> Filters:
> org.apache.solr.analysis.WordDelimiterFilterFactory 
> args:{preserveOriginal: 1 splitOnCaseChange: 0 generateNumberParts: 1 
> catenateWords: 1 luceneMatchVersion: LUCENE_36 generateWordParts: 1 
> catenateAll: 0 catenateNumbers: 1 } 
> org.apache.solr.analysis.LowerCaseFilterFactory 
> args:{luceneMatchVersion: LUCENE_36 } 
> org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: 
> german/synonyms.txt expand: true ignoreCase: true luceneMatchVersion: 
> LUCENE_36 } 
> org.apache.solr.analysis.DictionaryCompoundWordTokenFilterFactory 
> args:{maxSubwordSize: 15 onlyLongestMatch: false minSubwordSize: 4 
> minWordSize: 5 dictionary: german/german-common-nouns.txt 
> luceneMatchVersion: LUCENE_36 } 
> org.apache.solr.analysis.StopFilterFactory args:{words: 
> german/stopwords.txt ignoreCase: true enablePositionIncrements: true 
> luceneMatchVersion: LUCENE_36 } 
> org.apache.solr.analysis.GermanNormalizationFilterFactory 
> args:{luceneMatchVersion: LUCENE_36 } 
> org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected: 
> german/protwords.txt language: German2 luceneMatchVersion: LUCENE_36 } 
> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory
> args:{luceneMatchVersion: LUCENE_36 }
> Query Analyzer:  org.apache.solr.analysis.TokenizerChain
> Tokenizer Class:  org.apache.solr.analysis.WhitespaceTokenizerFactory
> Filters:
> org.apache.solr.analysis.WordDelimiterFilterFactory 
> args:{preserveOriginal: 1 splitOnCaseChange: 0 generateNumberParts: 1 
> catenateWords: 0 luceneMatchVersion: LUCENE_36 generateWordParts: 1 
> catenateAll: 0 catenateNumbers: 0 } 
> org.apache.solr.analysis.LowerCaseFilterFactory 
> args:{luceneMatchVersion: LUCENE_36 } 
> org.apache.solr.analysis.StopFilterFactory args:{words: 
> german/stopwords.txt ignoreCase: true enablePositionIncrements: true 
> luceneMatchVersion: LUCENE_36 } 
> org.apache.solr.analysis.GermanNormalizationFilterFactory 
> args:{luceneMatchVersion: LUCENE_36 } 
> org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected: 
> german/protwords.txt language: German2 luceneMatchVersion: LUCENE_36 } 
> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory 
> args:{luceneMatchVersion: LUCENE_36 }
> Distinct:  160403
>
> Does this somehow help to figure out the issue?
> Thanks
> Best
> Steve
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, 24 April 2015 20:15
> To: solr-user@lucene.apache.org
> Subject: Re: Odp.: solr issue with pdf forms
>
> Steve:
>
> Right, it's not exactly obvious. Bring up the admin UI, something like 
> http://localhost:8983/solr. From there you have to select a core in the 'core 
> selector' drop-down on the left side. If you're using SolrCloud, this will 
> have a rather strange name, but it should be easy to identify what collection 
> it belongs to.
>
> At that point you'll see a bunch of new options, among them "schema browser". 
> From there, select your field from the drop-down that will appear, then a 
> button should pop up "load term info".
>
> NOTE: you c

How to improve the performance of query with expand query

2015-04-28 Thread yliu
Hi,

I am using Solr to do some complex querying.  Some queries require me to
return one main document and some expanded documents at the same time.  When
I run the query without extracting any expanded document, the performance is
good.  But once I added the expanded conditions, the performance becomes
really bad.  What is the best way to get a document and a few related
documents (child documents) back at the same time?  Is there anything I can
config or change in the schema.xml and solrconfig.xml files to help improve
the performance? (The query shape I'm using is sketched below.)  Thanks,
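
A sketch of the query shape in SolrJ (the collapse field "groupId" is a
placeholder for my real grouping field):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class CollapseExpandQuery {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("some complex query");
        q.addFilterQuery("{!collapse field=groupId}"); // one main doc per group
        q.set("expand", "true");                       // also return the collapsed docs
        q.set("expand.rows", "3");                     // expanded (child) docs per group
        QueryResponse rsp = server.query(q);
        // The expanded groups come back in a separate "expanded" section.
        NamedList<Object> raw = rsp.getResponse();
        System.out.println(rsp.getResults().getNumFound()
                + " main docs; expanded section: " + raw.get("expanded"));
        server.shutdown();
    }
}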

yliu



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-improve-the-performance-of-query-with-expand-query-tp4202895.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Async deleteshard commands?

2015-04-28 Thread Ian Rose
Sure.  Here is an example of ADDREPLICA in synchronous mode:

http://localhost:8983/solr/admin/collections?action=addreplica&collection=293&shard=shard1_1

response:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1168</int>
  </lst>
  <lst name="success">
    <lst>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1158</int>
      </lst>
      <str name="core">293_shard1_1_replica2</str>
    </lst>
  </lst>
</response>

And here is the same in asynchronous mode:

http://localhost:8983/solr/admin/collections?action=addreplica&collection=293&shard=shard1_1&async=foo99

response:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
  </lst>
  <str name="requestid">foo99</str>
</response>

Note that the format of this response does NOT match the response format
that I got from the attempt at an async DELETESHARD in my earlier email.

Also note that I am now able to query for the status of this request:

http://localhost:8983/solr/admin/collections?action=requeststatus&requestid=foo99

response:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="status">
    <str name="state">completed</str>
    <str name="msg">found foo99 in completed tasks</str>
  </lst>
</response>



On Tue, Apr 28, 2015 at 2:06 PM, Anshum Gupta 
wrote:

> Hi Ian,
>
> What do you mean by "*my testing shows*" ? Can you elaborate on the steps
> and how did you confirm that the call was indeed *async* ?
> I may be wrong but I think what you're seeing is a normal DELETEREPLICA
> call succeeding behind the scenes. It is not treated or processed as an
> async call.
>
> Also, that page is the official reference guide and might need fixing if
> it's out of sync.
>
>
> On Tue, Apr 28, 2015 at 10:47 AM, Ian Rose  wrote:
>
> > Hi Anshum,
> >
> > FWIW I find that page is not entirely accurate with regard to async
> > params.  For example, my testing shows that DELETEREPLICA *does* support
> > the async param, although that is not listed here:
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api9
> >
> > Cheers,
> > Ian
> >
> >
> > On Tue, Apr 28, 2015 at 12:47 PM, Anshum Gupta 
> > wrote:
> >
> > > Hi Ian,
> > >
> > > DELETESHARD doesn't support ASYNC calls officially. We could certainly
> do
> > > with a better response but I believe with most of the Collections API
> > calls
> > > at this time in Solr, you could send random params which would get
> > ignored.
> > > Therefore, in this case, I believe that the async param gets ignored.
> > >
> > > The go-to reference point to check what's supported is the official
> > > reference guide:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api7
> > >
> > > This doesn't mention support for async DELETESHARD calls.
> > >
> > > On Tue, Apr 28, 2015 at 8:05 AM, Ian Rose 
> wrote:
> > >
> > > > Is it possible to run DELETESHARD commands in async mode?  Google
> > > searches
> > > > seem to indicate yes, but not definitively.
> > > >
> > > > My local experience indicates otherwise.  If I start with an async
> > > > SPLITSHARD like so:
> > > >
> > > >
> > > >
> > >
> >
> http://localhost:8983/solr/admin/collections?action=splitshard&collection=2Gp&shard=shard1_0_0&async=12-foo-1
> > > >
> > > > Then I get back the expected response format, with
> > > > <str name="requestid">12-foo-1</str>
> > > >
> > > > And I can later query for the result via REQUESTSTATUS.
> > > >
> > > > However if I try an async DELETESHARD like so:
> > > >
> > > >
> > > >
> > >
> >
> http://localhost:8983/solr/admin/collections?action=deleteshard&collection=2Gp&shard=shard1_0_0&async=12-foo-4
> > > >
> > > > The response includes the command result, indicating that the command
> > was
> > > > not run async:
> > > >
> > > > <response>
> > > >   <lst name="responseHeader">
> > > >     <int name="status">0</int>
> > > >     <int name="QTime">16</int>
> > > >   </lst>
> > > > </response>
> > > >
> > > > And in addition REQUESTSTATUS calls for that requestId fail with "Did
> > > > not find taskid [12-foo-4] in any tasks queue".
> > > >
> > > > Synchronous deletes are causing problems for me in production as they
> > are
> > > > timing out in some cases.
> > > >
> > > > Thanks,
> > > > Ian
> > > >
> > > >
> > > > p.s. I'm on version 5.0.0
> > > >
> > >
> > >
> > >
> > > --
> > > Anshum Gupta
> > >
> >
>
>
>
> --
> Anshum Gupta
>


Re: Load balancer for indexing?

2015-04-28 Thread Shawn Heisey
On 4/28/2015 1:14 PM, spillane wrote:
> I see that CloudSolrServer appeared in SolrJ 4.5, will that work with a 4.2
> SolrCloud?  If so I'll upgrade my client and point the CloudSolrServer
> constructor at my 5 ZK hosts.
>
> I would still use ConcurrentUpdateSolrServer as it is good for catching up
> when my indexing has fallen behind.  I know it swallows exceptions.

CloudSolrServer first appeared in Solr 4.0.0.  Here's the 4.0.0 javadoc
for that class:

http://lucene.apache.org/solr/4_0_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.html

A 4.5 SolrJ MIGHT work with a 4.2 SolrCloud, but I'm not sure I'm brave
enough to try it in production.  SolrCloud has evolved *very* quickly in
every single release since 4.0, when it became available, and I would
not be surprised to learn that the version combination you've mentioned
won't work.

The capability for CloudSolrServer to send updates to shard leaders did
not become available until SolrJ 4.5 ... but that's one of the big
reasons that I think it's a bad idea to try and use it with a 4.2
cluster.  I would not be surprised to learn that the capability requires
at least 4.5 on the *server* side as well.

As an FYI -- Solr 4.2.0 was released two years ago.  The number of
bugs fixed in that time is HUGE.  One particularly annoying bug ...
core/collection reloads in SolrCloud don't work right until version
4.4.  Before that, you have to completely restart Solr.

Thanks,
Shawn



Re: Load balancer for indexing?

2015-04-28 Thread spillane
Shawn / Hoss,

I see that CloudSolrServer appeared in SolrJ 4.5, will that work with a 4.2
SolrCloud?  If so I'll upgrade my client and point the CloudSolrServer
constructor at my 5 ZK hosts.

I would still use ConcurrentUpdateSolrServer as it is good for catching up
when my indexing has fallen behind.  I know it swallows exceptions.

S 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Load-balancer-for-indexing-tp4202707p4202882.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Async deleteshard commands?

2015-04-28 Thread Anshum Gupta
Hi Ian,

What do you mean by "*my testing shows*"? Can you elaborate on the steps
and how you confirmed that the call was indeed *async*?
I may be wrong but I think what you're seeing is a normal DELETEREPLICA
call succeeding behind the scenes. It is not treated or processed as an
async call.

Also, that page is the official reference guide and might need fixing if
it's out of sync.


On Tue, Apr 28, 2015 at 10:47 AM, Ian Rose  wrote:

> Hi Anshum,
>
> FWIW I find that page is not entirely accurate with regard to async
> params.  For example, my testing shows that DELETEREPLICA *does* support
> the async param, although that is not listed here:
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api9
>
> Cheers,
> Ian
>
>
> On Tue, Apr 28, 2015 at 12:47 PM, Anshum Gupta 
> wrote:
>
> > Hi Ian,
> >
> > DELETESHARD doesn't support ASYNC calls officially. We could certainly do
> > with a better response but I believe with most of the Collections API
> calls
> > at this time in Solr, you could send random params which would get
> ignored.
> > Therefore, in this case, I believe that the async param gets ignored.
> >
> > The go-to reference point to check what's supported is the official
> > reference guide:
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api7
> >
> > This doesn't mention support for async DELETESHARD calls.
> >
> > On Tue, Apr 28, 2015 at 8:05 AM, Ian Rose  wrote:
> >
> > > Is it possible to run DELETESHARD commands in async mode?  Google
> > searches
> > > seem to indicate yes, but not definitively.
> > >
> > > My local experience indicates otherwise.  If I start with an async
> > > SPLITSHARD like so:
> > >
> > >
> > >
> >
> http://localhost:8983/solr/admin/collections?action=splitshard&collection=2Gp&shard=shard1_0_0&async=12-foo-1
> > >
> > > Then I get back the expected response format, with
> > > <str name="requestid">12-foo-1</str>
> > >
> > > And I can later query for the result via REQUESTSTATUS.
> > >
> > > However if I try an async DELETESHARD like so:
> > >
> > >
> > >
> >
> http://localhost:8983/solr/admin/collections?action=deleteshard&collection=2Gp&shard=shard1_0_0&async=12-foo-4
> > >
> > > The response includes the command result, indicating that the command
> was
> > > not run async:
> > >
> > > <response>
> > >   <lst name="responseHeader">
> > >     <int name="status">0</int>
> > >     <int name="QTime">16</int>
> > >   </lst>
> > > </response>
> > >
> > > And in addition REQUESTSTATUS calls for that requestId fail with "Did
> > > not find taskid [12-foo-4] in any tasks queue".
> > >
> > > Synchronous deletes are causing problems for me in production as they
> are
> > > timing out in some cases.
> > >
> > > Thanks,
> > > Ian
> > >
> > >
> > > p.s. I'm on version 5.0.0
> > >
> >
> >
> >
> > --
> > Anshum Gupta
> >
>



-- 
Anshum Gupta


Re: New to SolrCloud

2015-04-28 Thread shacky
2015-04-28 19:45 GMT+02:00 Erick Erickson :

> I think you're over-thinking the problem though. How often does a
> machine fail? If it's more
> often than once in a blue moon, you have _other_ problems.

My needs are not only high availability (for which 2 nodes would be
enough), but also load balancing.
So I wish to have one shard replicated on all three nodes and to
configure my load balancer (I already have one) to balance requests
between the three nodes. It can recognise when the service goes down and,
in that case, removes the failed node from the load balancing pool.


Re: Async deleteshard commands?

2015-04-28 Thread Ian Rose
Hi Anshum,

FWIW I find that page is not entirely accurate with regard to async
params.  For example, my testing shows that DELETEREPLICA *does* support
the async param, although that is not listed here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api9

Cheers,
Ian


On Tue, Apr 28, 2015 at 12:47 PM, Anshum Gupta 
wrote:

> Hi Ian,
>
> DELETESHARD doesn't support ASYNC calls officially. We could certainly do
> with a better response but I believe with most of the Collections API calls
> at this time in Solr, you could send random params which would get ignored.
> Therefore, in this case, I believe that the async param gets ignored.
>
> The go-to reference point to check what's supported is the official
> reference guide:
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api7
>
> This doesn't mention support for async DELETESHARD calls.
>
> On Tue, Apr 28, 2015 at 8:05 AM, Ian Rose  wrote:
>
> > Is it possible to run DELETESHARD commands in async mode?  Google
> searches
> > seem to indicate yes, but not definitively.
> >
> > My local experience indicates otherwise.  If I start with an async
> > SPLITSHARD like so:
> >
> >
> >
> http://localhost:8983/solr/admin/collections?action=splitshard&collection=2Gp&shard=shard1_0_0&async=12-foo-1
> >
> > Then I get back the expected response format, with
> > <str name="requestid">12-foo-1</str>
> >
> > And I can later query for the result via REQUESTSTATUS.
> >
> > However if I try an async DELETESHARD like so:
> >
> >
> >
> http://localhost:8983/solr/admin/collections?action=deleteshard&collection=2Gp&shard=shard1_0_0&async=12-foo-4
> >
> > The response includes the command result, indicating that the command was
> > not run async:
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">16</int>
> >   </lst>
> > </response>
> >
> > And in addition REQUESTSTATUS calls for that requestId fail with "Did not
> > find taskid [12-foo-4] in any tasks queue".
> >
> > Synchronous deletes are causing problems for me in production as they are
> > timing out in some cases.
> >
> > Thanks,
> > Ian
> >
> >
> > p.s. I'm on version 5.0.0
> >
>
>
>
> --
> Anshum Gupta
>


Re: Overseer role in solrCloud

2015-04-28 Thread Gopal Jee
Thanks a ton, Shalin. Now I have a very clear view of the state change
mechanism. This will certainly help me stabilize my cluster issues.
Thanks a lot.

Gopal

On Tue, Apr 28, 2015 at 8:16 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Comments inline:
>
> On Tue, Apr 28, 2015 at 3:00 PM, Gopal Jee  wrote:
>
> > I am trying to understand the role of the overseer and the SolrCloud
> > state-change mechanism. I tried finding resources on the web, but with not
> > much luck. Can someone point me to some relevant doc or explain? A few
> > doubts I have:
> > 1. The doc says the overseer updates clusterstate.json when a new node
> > joins. How does the overseer node know when a new node joins? The overseer
> > is itself just one independent node.
> >
>
> When a new node is loaded, it 'publishes' a 'state' message, for each local
> core that it loads, to the overseer queue. This message contains the node's
> base_url, state, core_name etc. This is how the overseer knows about a
> node.
>
>
> > 2. There is an overseer queue znode in ZooKeeper. Do all Solr servers
> > update their state in the overseer queue? What type of events are
> > published to the queue? Is this queue maintained inside ZooKeeper?
> >
>
> This queue is maintained inside ZooKeeper. See
> http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Queues
>
> All Solr servers publish a message to this queue when they change state.
> See the Overseer.processMessage method for the kind of messages supported
> at
>
> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/cloud/Overseer.java#L330
>
>
>
> > 3. When a node goes down and loses its connection with ZooKeeper, does
> > ZooKeeper update its state in clusterstate.json, or does it let the
> > overseer know about the lost connection and let it update clusterstate.json?
> >
>
> If a node shuts down gracefully then yes, it publishes a 'down' state
> message to the overseer queue and the overseer updates the cluster state.
> If a node is killed or crashes or loses its connection with ZK for whatever
> reasons, then the ZooKeeper server waits until the ZK session expiry
> timeout to remove the node's corresponding entry from /live_nodes
> automatically.
>
>
> > 4. The docs say that when a node is in the down state, it cannot cater to
> > read or write requests. I have tried issuing a get request on one node
> > which is showing down in the Solr Cloud panel, and I did get a response
> > (with all relevant documents). How is this happening? When a node goes to
> > the down state, does it not block its request handlers, or get notified
> > not to cater to any get requests?
> >
>
> When a replica goes into 'down' state then the other SolrCloud nodes as
> well as SolrJ clients will not route requests to that replica. Also, if the
> 'down' replica gets a request then it will forward the request to an
> 'active' replica automatically.
>
> No, it doesn't actively block requests if in 'down' state (because it
> doesn't need to).
>
>
> >
> > Thanks in advance for helping me understand solrCloud intern state change
> > mechanism.
> >
> > --
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



--


Re: New to SolrCloud

2015-04-28 Thread Erick Erickson
Your last comment really answered it. A ZK quorum is explicitly ((num zk
instances)/2) + 1 (integer division), so with three ZK nodes the quorum is
two and you can afford to lose one.

So no, you don't need 6 nodes at all. It's perfectly reasonable to run
a Solr instance
on each node and a ZK instance (not embedded) on the same three nodes.

I think you're over-thinking the problem though. How often does a
machine fail? If it's more often than once in a blue moon, you have
_other_ problems.

The sole caution is that when running _embedded_ zookeeper with Solr
(as opposed to stand-alone, even if on the same node), you'll be bringing
the Solr instance up and down repeatedly when developing your app. Trust
me on this ;). Having Zookeeper embedded just makes it more likely that
ZK will fall beneath quorum. Not to mention that you'll want to upgrade
sometime or... That said, I know of production environments where all the
Zookeepers are embedded in Solr, no independent ZKs at all.

And in large installations, as Shawn mentions, you probably don't want
the additional load
on the Solr nodes.

Finally, the ZK nodes don't have much in the way of CPU load. So one
popular option is to put
the ZK instances on lightweight nodes that you happen to have laying
around anyway
and reserve the bigger iron for Solr.

Best,
Erick

On Tue, Apr 28, 2015 at 10:20 AM, shacky  wrote:
>> Yeah, it took me a few tries to get it all straight in my head.
>
> Thanks Erick for your fast answer!
>
>> The only "problem" with running ZK on the same node as Solr is that if the
>> node goes down, it takes _both_ zookeeper and Solr with it. If running
>> the "embedded zookeeper", then you can't even bounce the Solr server without
>> taking down the ZK node. Solr will run fine even with embedded ZK,
>> you just have to be very careful when you take the node up or down.
>
> Yes, but what happens when a Zookeeper node goes down if I have three nodes?
> Just as a Solr node could go down, a Zookeeper one could go down too, so
> this _needs_ to be treated as an expected issue in a highly available
> infrastructure, doesn't it?
>
>> Bottom line: It's just easier, from an administrative standpoint, to
>> run Zookeeper
>> as an external process. That way, you can freely bounce your Solr nodes
>> without falling below quorum. Whether or not it shares the same machine as a
>> running instance of Solr is up to you.
>
>> You absolutely _do_ want to
>> 1> have at least one replica for each and every shard on a different box
>> 2> have each Zookeeper running on a separate box.
>
> But doing so I need 6 nodes, am I wrong?
>
>> That way, if any single box dies you have a complete collection available and
>> a quorum of ZK nodes present. How many more machines you have and
>> how you distribute your collections amongst them is up to you.
>
> If I have three ZK nodes I will have the quorum even with two
> available nodes, right?


Re: New to SolrCloud

2015-04-28 Thread shacky
> Yeah, it took me a few tries to get it all straight in my head.

Thanks Erick for your fast answer!

> The only "problem" with running ZK on the same node as Solr is that if the
> node goes down, it takes _both_ zookeeper and Solr with it. If running
> the "embedded zookeeper", then you can't even bounce the Solr server without
> taking down the ZK node. Solr will run fine even with embedded ZK,
> you just have to be very careful when you take the node up or down.

Yes, but what happens when a Zookeeper node goes down if I have three nodes?
Just as a Solr node could go down, a Zookeeper one could go down too, so
this _needs_ to be treated as an expected issue in a highly available
infrastructure, doesn't it?

> Bottom line: It's just easier, from an administrative standpoint, to
> run Zookeeper
> as an external process. That way, you can freely bounce your Solr nodes
> without falling below quorum. Whether or not it shares the same machine as a
> running instance of Solr is up to you.

> You absolutely _do_ want to
> 1> have at least one replica for each and every shard on a different box
> 2> have each Zookeeper running on a separate box.

But doing so I need 6 nodes, am I wrong?

> That way, if any single box dies you have a complete collection available and
> a quorum of ZK nodes present. How many more machines you have and
> how you distribute your collections amongst them is up to you.

If I have three ZK nodes I will have the quorum even with two
available nodes, right?


Re: Choosing order of fields in response with fl=field_1, field_2

2015-04-28 Thread Chris Hostetter

because of the nature of the CSV format, the order of the fields *has* to 
be deterministic and consistent for all documents, so the response writer 
sorts them into the appropriate columns.

for JSON & XML formats this consistency isn't required, so instead Solr 
writes out the fields of each document in the order they were found in the 
index, because that's the fastest & most efficient way for Solr to return 
the data -- no extra sorting required.

Many decisions about what features live in Solr follow the principle of 
"what can we do more efficiently than the client" ... simple things like 
result set sorting & pagination, faceting & highlighting are much more 
efficient to do server side using the underlying lucene index than if 
Solr just shipped all the data across the wire and left it up to the 
client, but something like "make the order of fields consistent for each 
doc in the response" is just as fast/efficient for the client to do as if 
it was done on the server side.
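
(for whatever it's worth, re-imposing the fl order on the client side in
SolrJ is only a few lines -- a sketch, with placeholder field names:)

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.solr.common.SolrDocument;

public class FieldOrder {
    // Copy a doc's fields into a map that preserves the requested fl order.
    static Map<String, Object> inFlOrder(SolrDocument doc, List<String> fl) {
        Map<String, Object> ordered = new LinkedHashMap<String, Object>();
        for (String f : fl) {
            Object v = doc.getFieldValue(f);
            if (v != null) {
                ordered.put(f, v);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        SolrDocument doc = new SolrDocument();
        doc.setField("field_2", "value Y"); // index order
        doc.setField("field_1", "value X");
        System.out.println(inFlOrder(doc, Arrays.asList("field_1", "field_2")));
    }
}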

(which is not to say that such a feature would not be possible to 
implement server side, or that anyone would have a philosophical 
objection to committing a patch that added an option like this ... I'm just 
trying to explain why something like this has never been a priority)




: Date: Tue, 28 Apr 2015 17:06:02 +0200
: From: Raphaël Tournoy 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Choosing order of fields in response with fl=field_1, field_2
: 
: Hi Everyone,
: 
: if I add fl=field_1,field_2 to a query, the order of fields in the response
: is good for my needs:
: 
: {
:   field_1 : value X,
:   field_2 : value Y
: }
: 
: *however* it works only with the CSV response format :-(
: 
: How do I get the same functionality with, for instance, the XML and JSON
: response formats?
: 
: with wt=json or wt=xml I get fields in the order they are indexed
: 
: Why does it only work with the CSV response format?
: 
: 
: thank you,
: 
: -- 
: Raphaël
: 
: 

-Hoss
http://www.lucidworks.com/

Re: Async deleteshard commands?

2015-04-28 Thread Anshum Gupta
Hi Ian,

DELETESHARD doesn't support ASYNC calls officially. We could certainly do
with a better response but I believe with most of the Collections API calls
at this time in Solr, you could send random params which would get ignored.
Therefore, in this case, I believe that the async param gets ignored.

The go-to reference point to check what's supported is the official
reference guide:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api7

This doesn't mention support for async DELETESHARD calls.

On Tue, Apr 28, 2015 at 8:05 AM, Ian Rose  wrote:

> Is it possible to run DELETESHARD commands in async mode?  Google searches
> seem to indicate yes, but not definitively.
>
> My local experience indicates otherwise.  If I start with an async
> SPLITSHARD like so:
>
>
> http://localhost:8983/solr/admin/collections?action=splitshard&collection=2Gp&shard=shard1_0_0&async=12-foo-1
>
> Then I get back the expected response format, with
> <str name="requestid">12-foo-1</str>
>
> And I can later query for the result via REQUESTSTATUS.
>
> However if I try an async DELETESHARD like so:
>
>
> http://localhost:8983/solr/admin/collections?action=deleteshard&collection=2Gp&shard=shard1_0_0&async=12-foo-4
>
> The response includes the command result, indicating that the command was
> not run async:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">16</int>
>   </lst>
> </response>
>
> And in addition REQUESTSTATUS calls for that requestId fail with "Did not
> find taskid [12-foo-4] in any tasks queue".
>
> Synchronous deletes are causing problems for me in production as they are
> timing out in some cases.
>
> Thanks,
> Ian
>
>
> p.s. I'm on version 5.0.0
>



-- 
Anshum Gupta


Solr Highlighting

2015-04-28 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi,

When I perform a query, the matching document's field information is
displayed separately from the highlighting information. Is there a way to
merge these two so that the highlighting for each document appears within
the document-level information itself? That way, it would be easier to find
the highlights for a particular document.

Otherwise, is there a better way to join these two to get a consolidated
view, or does this have to be custom-built? I am using SolrJ. Please let me
know what's the best way to handle this; a sketch of what I'm considering is
below.
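
For context, the client-side join I'm considering looks roughly like this (a
SolrJ sketch; it assumes "id" is the uniqueKey field and "content" is the
highlighted field):

import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class HighlightJoin {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("content:search terms");
        q.setHighlight(true);
        q.addHighlightField("content");
        QueryResponse rsp = server.query(q);

        // Highlighting is keyed by the uniqueKey, so it can be joined per doc.
        Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
        for (SolrDocument doc : rsp.getResults()) {
            String id = (String) doc.getFieldValue("id");
            Map<String, List<String>> snippets = hl.get(id);
            System.out.println(id + " -> " + snippets);
        }
        server.shutdown();
    }
}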


Thanks & Regards
Vijay

-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.


Re: Async deleteshard commands?

2015-04-28 Thread Ian Rose
Done!

https://issues.apache.org/jira/browse/SOLR-7481


On Tue, Apr 28, 2015 at 11:09 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> This is a bug. Can you please open a Jira issue?
>
> On Tue, Apr 28, 2015 at 8:35 PM, Ian Rose  wrote:
>
> > Is it possible to run DELETESHARD commands in async mode?  Google
> searches
> > seem to indicate yes, but not definitively.
> >
> > My local experience indicates otherwise.  If I start with an async
> > SPLITSHARD like so:
> >
> >
> >
> http://localhost:8983/solr/admin/collections?action=splitshard&collection=2Gp&shard=shard1_0_0&async=12-foo-1
> >
> > Then I get back the expected response format, with
> > <str name="requestid">12-foo-1</str>
> >
> > And I can later query for the result via REQUESTSTATUS.
> >
> > However if I try an async DELETESHARD like so:
> >
> >
> >
> http://localhost:8983/solr/admin/collections?action=deleteshard&collection=2Gp&shard=shard1_0_0&async=12-foo-4
> >
> > The response includes the command result, indicating that the command was
> > not run async:
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">16</int>
> >   </lst>
> > </response>
> >
> > And in addition REQUESTSTATUS calls for that requestId fail with "Did not
> > find taskid [12-foo-4] in any tasks queue".
> >
> > Synchronous deletes are causing problems for me in production as they are
> > timing out in some cases.
> >
> > Thanks,
> > Ian
> >
> >
> > p.s. I'm on version 5.0.0
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Support of solr in Spark

2015-04-28 Thread Jeetendra Gangele
Hi All

I have around 20 million company names and I want to index them.
Currently, what I am doing is tokenizing and, for each token, applying
Metaphone 3 and then storing each token in HBase.
When I get a new query (a company to match) I again tokenize and apply
Metaphone 3, as I did when I stored them in HBase.
Now for each token I query HBase and collate the results.
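
To illustrate, a rough sketch of the phonetic step (using DoubleMetaphone from
Apache commons-codec as a stand-in, since Metaphone 3 itself is a separate
library):

import org.apache.commons.codec.language.DoubleMetaphone;

public class PhoneticKeys {
    public static void main(String[] args) {
        DoubleMetaphone encoder = new DoubleMetaphone();
        String company = "Acme Holdings International";
        // One phonetic key per token; the keys are what get stored and looked
        // up in HBase, and the per-token results are then collated.
        for (String token : company.toLowerCase().split("\\s+")) {
            System.out.println(token + " -> " + encoder.doubleMetaphone(token));
        }
    }
}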

This seems inefficient and has some issues even after implementing the
functionality of WordDelimiterFilterFactory and ShingleFilterFactory.

I am thinking of indexing these company names in Solr, since all the
functionality is already there.

Do we have support for Spark?



Re: Solr VS Google Mini Search Appliance

2015-04-28 Thread Erick Erickson
Well, frankly if what they have already serves their needs, I see no reason
they should switch. Investing time and effort in "more modern technology"
without a compelling reason is a waste.

Personally, I'd just leave the argument there. Talk to the stake-holders in
the product and get them to list out the changes/improvements they want to
make, and note any that aren't easy/possible with what they have now but
_could_ be done with Solr, sorting them into ones that either

1> stand out as must-haves and _require_ changing,
or
2> they can live without.

As the wish-list items accumulate, the pressure will build and they'll move
to Solr. And, if there is no such accumulation, their resources can frankly
be better allocated to things that are priorities IMO.


Best,
Erick



On Tue, Apr 28, 2015 at 8:07 AM, Branko Simic <
branko.si...@green-river-media.com> wrote:

> Hi,
>
>
>
> We have a client that will have a website in Ektron CMS. As you may know
> Ektron has good integration with Solr, and that is the primary reason for us
> developers to use it.
>
> But the client does not want to switch from their Google Mini Search
> appliance even though Google stopped support for it 3 years ago.
>
> We need to make a good case for using Solr over Google Mini. I am
> collecting information all over the internet but would also like to hear what
> you have to say as well.
>
> Thanks for taking time to read and hopefully answer my email.
>
>
>
> Regards,
>
>
>
> *Branko Simić*
>
> Lead Application Engineer
>
>
>
>
>
>
> +44 (0)203 637 0128 | www.green-river-media.com | One St. Peter’s Road,
> Maidenhead, SL6 7QU
>
>
>
> 
>
>
>


RE: Solr VS Google Mini Search Appliance

2015-04-28 Thread Davis, Daniel (NIH/NLM) [C]
Branko,

Your client's existing "enterprise search" isn't supported now, so I'm not
saying that you should stick with the Google Mini Search appliance. However,
there are some things that Solr doesn't do well, and you should be aware of
these before advocating Solr as a pure developer. Here are some factors that
might lead them to stay with what they know:

* What else do they have in the Google Mini Search appliance?

* Do they have other applications/systems outside of Ektron CMS that are
  indexed/searched by the Google Mini Search appliance?

* Do they make use of document-level security implemented by the Google
  Mini Search appliance?

* Are they concerned about role-based access control (RBAC) to a Web UI?

* Do they have many different search interfaces built on top of the
  Google Mini Search appliance?

Here are some arguments in favor of using the Solr/CMS integration:

* The Ektron CMS handles the "front-end" of search, and so branding of
  the search is entirely within the CMS.

* Indexing will be up to date more quickly after content is published (or
  promoted), rather than once the periodic scheduled indexing is performed.

* Solr can handle multiple collections of indexed data, and building a
  search application in front of Solr is just as easy as customizing a search
  application provided by the Google Mini Search appliance.

Hope this helps,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH

From: Branko Simic [mailto:branko.si...@green-river-media.com]
Sent: Tuesday, April 28, 2015 11:07 AM
To: solr-user@lucene.apache.org
Subject: Solr VS Google Mini Search Appliance

Hi,

We have a client that will have a website in Ektron CMS. As you may know Ektron 
has good integration with Solr and that is primary reason for us developers to 
use it.
But the client does not want to switch from their Google Mini Search appliance 
even though Google stopped support for it 3 years ago.
We need to make a good case for using Solr over Google Mini. I am collecting 
information all over internet but would also like to hear what you have to say 
as well.
Thanks for taking time to read and hopefully answer my email.

Regards,

Branko Simić
Lead Application Engineer

+44 (0)203 637 0128 |
www.green-river-media.com | One St. Peter's
Road, Maidenhead, SL6 7QU



Solr VS Google Mini Search Appliance

2015-04-28 Thread Branko Simic
Hi,

 

We have a client that will have a website in Ektron CMS. As you may know
Ektron has good integration with Solr and that is primary reason for us
developers to use it. 

But the client does not want to switch from their Google Mini Search
appliance even though Google stopped support for it 3 years ago. 

We need to make a good case for using Solr over Google Mini. I am collecting
information all over internet but would also like to hear what you have to
say as well.

Thanks for taking time to read and hopefully answer my email.

 

Regards,

 

Branko Simić

Lead Application Engineer

+44 (0)203 637 0128 | www.green-river-media.com | One St. Peter's Road, Maidenhead, SL6 7QU


Re: Async deleteshard commands?

2015-04-28 Thread Shalin Shekhar Mangar
This is a bug. Can you please open a Jira issue?

On Tue, Apr 28, 2015 at 8:35 PM, Ian Rose  wrote:

> Is it possible to run DELETESHARD commands in async mode?  Google searches
> seem to indicate yes, but not definitively.
>
> My local experience indicates otherwise.  If I start with an async
> SPLITSHARD like so:
>
>
> http://localhost:8983/solr/admin/collections?action=splitshard&collection=2Gp&shard=shard1_0_0&async=12-foo-1
>
> Then I get back the expected response format, with the request id echoed
> back: <str name="requestid">12-foo-1</str>
>
> And I can later query for the result via REQUESTSTATUS.
>
> However if I try an async DELETESHARD like so:
>
>
> http://localhost:8983/solr/admin/collections?action=deleteshard&collection=2Gp&shard=shard1_0_0&async=12-foo-4
>
> The response includes the command result, indicating that the command was
> not run async:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">16</int>
>   </lst>
> </response>
>
> And in addition REQUESTSTATUS calls for that requestId fail with "Did not
> find taskid [12-foo-4] in any tasks queue".
>
> Synchronous deletes are causing problems for me in production as they are
> timing out in some cases.
>
> Thanks,
> Ian
>
>
> p.s. I'm on version 5.0.0
>



-- 
Regards,
Shalin Shekhar Mangar.


Async deleteshard commands?

2015-04-28 Thread Ian Rose
Is it possible to run DELETESHARD commands in async mode?  Google searches
seem to indicate yes, but not definitively.

My local experience indicates otherwise.  If I start with an async
SPLITSHARD like so:

http://localhost:8983/solr/admin/collections?action=splitshard&collection=2Gp&shard=shard1_0_0&async=12-foo-1

Then I get back the expected response format, with the request id echoed
back: <str name="requestid">12-foo-1</str>

And I can later query for the result via REQUESTSTATUS.

However if I try an async DELETESHARD like so:

http://localhost:8983/solr/admin/collections?action=deleteshard&collection=2Gp&shard=shard1_0_0&async=12-foo-4

The response includes the command result, indicating that the command was
not run async:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
  </lst>
</response>

And in addition REQUESTSTATUS calls for that requestId fail with "Did not
find taskid [12-foo-4] in any tasks queue".

Synchronous deletes are causing problems for me in production as they are
timing out in some cases.

Thanks,
Ian


p.s. I'm on version 5.0.0
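
For reference, the async pattern that does work (per Shalin's reply above,
DELETESHARD ignoring the async parameter is the bug) is fire-and-forget plus
polling; a minimal sketch against the same local node, reusing the id from
the example:

    # submit the long-running call; the response returns immediately with the id
    curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=2Gp&shard=shard1_0_0&async=12-foo-1'

    # poll until the reported state reaches completed (or failed)
    curl 'http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=12-foo-1'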


Choosing order of fields in response with fl=field_1, field_2

2015-04-28 Thread Raphaël Tournoy

Hi Everyone,

If I add fl=field_1,field_2 to a query, the order of fields in the
response is what I need:

{
  "field_1": "value X",
  "field_2": "value Y"
}

*However*, it works only with the CSV response format :-(

How do I get the same functionality with, for instance, the XML and JSON
response formats?

With wt=json or wt=xml I get the fields in the order they were indexed.

Why does it only work with the CSV response format?


thank you,

--
Raphaël





Re: Solr + RDF = SolRDF

2015-04-28 Thread Andrea Gazzarini

Hi Daniel,
no, unfortunately not... it is definitely one of the interesting
challenges of this "wedding": inference in responses, or crazy things
like "inferential faceting".
It's all on the "grocery" list :) but I never thought about the concrete
implementation. Thanks for your suggestions.


Best,
Andrea

On 04/28/2015 04:26 PM, Davis, Daniel (NIH/NLM) [C] wrote:

Both cool and interesting.
Andrea, does your Solr RDF indexing project support inference? If so, is 
inference done by Jena or ahead of time before indexing by Solr?

-Original Message-
From: Andrea Gazzarini [mailto:a.gazzar...@gmail.com]
Sent: Tuesday, April 28, 2015 7:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr + RDF = SolRDF

Hi Charlie,
definitely cool and interesting.

Best,
Andrea

On 04/28/2015 10:20 AM, Charlie Hull wrote:

On 27/04/2015 21:41, Andrea Gazzarini wrote:

Hi guys,
I'd like to share with you a project (actually a hobby for me) where
I'm spending my free time, maybe someone could get some idea or
benefit from it.

https://github.com/agazzarini/SolRDF

I called it SolRDF (Solr + RDF): It is a set of Solr extensions for
managing (indexing and querying) RDF data.

As a first step I wrote a set of classes for injecting Apache Jena
RDF capabilities in Solr.

Hi Andrea,

Interesting...we've been working with Jena and SPARQL queries as part
of our BioSolr project: we presented on this in London last week
(includes links to slides and code):
http://www.flax.co.uk/blog/2015/04/24/lucenesolr-london-meetup-biosolr
-and-query-deep-dive/

Cheers

Charlie



That allows you to:

 - index RDF data using one of the standard formats (n-triples,
rdf+xml,
 turtle)
 - query those data using SPARQL, the RDF query language

Trying all of that is just a matter of two minutes, as illustrated in
this post [1]. Follow those five steps and Solr will quickly act as a
fully compliant SPARQL 1.1 endpoint.

On top of that, I'm going further with the development of what I
called "Hybrid mode", where Solr capabilities, like faceting, can be
applied to an RDF Dataset queried using SPARQL. You can read more
about this in this post [2] or in the project Wiki [3]. Two kinds of
facets (object and range object faceting) are already working and
available in the master; I'd like to integrate pivot and interval
facets, too.

As I said this is just an amusement for me (and a great chance to
explore how Solr works behind the scenes) so I'm gradually and slowly
going ahead, making available only "stable" features.

Any feedback / idea / comment / question is warmly welcome ;)

Best,

Andrea

[1]
http://andreagazzarini.blogspot.it/2014/12/a-solr-rdf-store-and-sparq
l-endpoint-in.html


[2]
http://andreagazzarini.blogspot.it/2015/04/rdf-faceting-with-apache-s
olr-solrdf.html


[3] https://github.com/agazzarini/SolRDF/wiki/Faceted%20search







Re: New to SolrCloud

2015-04-28 Thread Shawn Heisey
On 4/28/2015 4:40 AM, shacky wrote:
> I've been using Solr for 3 years and now I want to move to a SolrCloud
> configuration on 3 nodes which would make my infrastructure highly
> available.
> But I am very confused about it.
> 
> I read that ZooKeeper should not be installed on the same Solr nodes,
> but I also read another guide that installs one ZooKeeper instance and
> 2 Solr instances, so I cannot understand how it can be completely
> redundant.
> I also read the SolrCloud quick start guide (which installs N nodes on
> the same server), but I am still confused about what I need to do to
> configure the production nodes.


Erick's reply is spot on.

My two cents:

Solr can work perfectly when zookeeper is running on the same nodes.  It
can even work perfectly if you choose to run the embedded zookeeper on
all three nodes and configure them into an ensemble, but the embedded
zookeeper is not recommended at all for a production SolrCloud.  I
personally think we should have never created the embedded zookeeper,
but it is very effective for quickly getting a test installation running.

One of the indexes that I maintain is the minimum possible redundant
SolrCloud install.  It consists of three servers in total.  Two of them
run both Solr and a separate zookeeper process, the third runs only
zookeeper, and is a much lower spec server than the other two.

The only real concern with running zookeeper on the same machine as Solr
is I/O bandwidth.  If you can put the zookeeper database on a completely
separate disk (or set of disks) from your Solr indexes, then that is not
a worry.  If the disk I/O on the server never gets high enough that it
would delay reads and writes on the zookeeper database, then you could
even have the zookeeper database on the same disk volume as Solr data.

My own SolrCloud install does not use separate disks.  It works because
there is plenty of RAM to cache my entire index and the zookeeper
database, so disk I/O isn't an issue.

http://wiki.apache.org/solr/SolrPerformanceProblems#SolrCloud

Thanks,
Shawn
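
A minimal three-node ensemble config, for concreteness (hostnames and paths
are placeholders; each server also needs a myid file matching its server.N
entry):

    # /etc/zookeeper/zoo.cfg -- identical on all three machines
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper    # ideally on a disk separate from the Solr index
    clientPort=2181
    server.1=solr1.example.com:2888:3888
    server.2=solr2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888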



Re: Multiple index.timestamp directories using up disk space

2015-04-28 Thread Mark Miller
If copies of the index are not eventually cleaned up, I'd file a JIRA to
address the issue. Those directories should be removed over time. At times
there will have to be a couple around at the same time and others may take
a while to clean up.

- Mark

On Tue, Apr 28, 2015 at 3:27 AM Ramkumar R. Aiyengar <
andyetitmo...@gmail.com> wrote:

> SolrCloud does need up to twice the amount of disk space as your usual
> index size during replication. Amongst other things, this ensures you have
> a full copy of the index at any point. There's no way around this, I would
> suggest you provision the additional disk space needed.
> On 20 Apr 2015 23:21, "Rishi Easwaran"  wrote:
>
> > Hi All,
> >
> > We are seeing this problem with solr 4.6 and solr 4.10.3.
> > For some reason, solr cloud tries to recover and creates a new index
> > directory - (ex:index.20150420181214550), while keeping the older index
> as
> > is. This creates an issue where the disk space fills up and the shard
> > never ends up recovering.
> > Usually this requires a manual intervention of  bouncing the instance and
> > wiping the disk clean to allow for a clean recovery.
> >
> > Any ideas on how to prevent Solr from creating multiple copies of the
> > index directory?
> >
> > Thanks,
> > Rishi.
> >
>


Re: Overseer role in solrCloud

2015-04-28 Thread Shalin Shekhar Mangar
Comments inline:

On Tue, Apr 28, 2015 at 3:00 PM, Gopal Jee  wrote:

> I am trying to understand the role of the overseer and the SolrCloud
> state-change mechanism. I tried finding resources on the web, but without
> much luck. Can someone point me to some relevant doc or explain? A few
> doubts I have:
> 1. The docs say the overseer updates clusterstate.json when a new node
> joins. How does the overseer node know when a new node joins, given that
> the overseer is itself just an independent node?
>

When a new node is loaded, it 'publishes' a 'state' message, for each local
core that it loads, to the overseer queue. This message contains the node's
base_url, state, core_name etc. This is how the overseer knows about a node.


> 2. There is an overseer queue znode in ZooKeeper. Do all Solr servers
> update their state in the overseer queue? What type of events are published
> to the queue? Is this queue maintained inside ZooKeeper?
>

This queue is maintained inside ZooKeeper. See
http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Queues

All Solr servers publish a message to this queue when they change state.
See the Overseer.processMessage method for the kind of messages supported
at
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/cloud/Overseer.java#L330
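
You can also watch this happen with the ZooKeeper CLI; a quick sketch (the
znode paths are the stock SolrCloud layout):

    bin/zkCli.sh -server localhost:2181
    ls /overseer/queue        # pending state messages waiting for the overseer
    ls /live_nodes            # one ephemeral znode per live Solr node
    get /clusterstate.json    # the state the overseer maintains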



> 3. When a node goes down and loses its connection with ZooKeeper, does
> ZooKeeper update its state in clusterstate.json, or does it let the overseer
> know about the lost connection and update clusterstate.json?
>

If a node shuts down gracefully then yes, it publishes a 'down' state
message to the overseer queue and the overseer updates the cluster state.
If a node is killed or crashes or loses its connection with ZK for whatever
reasons, then the ZooKeeper server waits until the ZK session expiry
timeout to remove the node's corresponding entry from /live_nodes
automatically.


> 4. The docs say that when a node is in the down state, it cannot serve
> read or write requests. I tried issuing a GET request to one node which
> shows as down in the Solr cloud panel, and I did get a response (with all
> relevant documents). How is this happening? When a node goes into the down
> state, does it not block its request handlers or get told not to serve any
> GET requests?
>

When a replica goes into 'down' state then the other SolrCloud nodes as
well as SolrJ clients will not route requests to that replica. Also, if the
'down' replica gets a request then it will forward the request to an
'active' replica automatically.

No, it doesn't actively block requests if in 'down' state (because it
doesn't need to).


>
> Thanks in advance for helping me understand SolrCloud's internal
> state-change mechanism.
>
> --
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Multi term synonyms

2015-04-28 Thread Kaushik
Hi there,

I tried the solution provided in
https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
. The mentioned solution works when the indexed data does not have
alphanumerics or special characters. But in my case the synonyms are something
like the ones below:


T-MAZ 20
POLYOXYETHYLENE (20) SORBITAN MONOLAURATE
SORBITAN MONODODECANOATE, POLY(OXY-1,2-ETHANEDIYL) DERIVATIVE
POLYOXYETHYLENE SORBITAN MONOLAURATE
POLYSORBATE 20 [MART.]
SORBIMACROGOL LAURATE 300
POLYSORBATE 20 [FHFI]
FEMA NO. 2915

They have alphanumerics, special characters, spaces, etc. Is there a way
to implement synonyms even in such a case?

Thanks,
Kaushik

On Mon, Apr 20, 2015 at 11:03 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> Handling MESH descriptor preferred terms and such is similar.   I
> encountered this during evaluation of Solr for a project here at NLM.   We
> decided to use Solr for different projects instead. I considered the
> following approaches:
>  - use a custom tokenizer at index time that indexes all of the multiple
> term alternatives.
>  - index the data, and then have an enrichment process that queries on
> each source synonym, and generates an update to add the target synonyms.
>    Follow this with an optimize.
>  - During the indexing process, but before sending the data to Solr,
> process the data to tokenize and add synonyms to another field.
>
> Both the custom tokenizer and enrichment process share the feature that
> they use Solr's own tokenizer rather than duplicate it.   The enrichment
> process seems to me only workable in environments where you can re-index
> all data periodically, so no continuous stream of data to index that needs
> to be handled relatively quickly once it is generated. The last method
> of pre-processing the data seems the least desirable to me from a blue-sky
> perspective, but is probably the easiest to implement and the most
> independent of Solr.
>
> Hope this helps,
>
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
>
> -Original Message-
> From: Kaushik [mailto:kaushika...@gmail.com]
> Sent: Monday, April 20, 2015 10:47 AM
> To: solr-user@lucene.apache.org
> Subject: Multi term synonyms
>
> Hello,
>
> Reading up on synonyms it looks like there is no real solution for multi
> term synonyms. Is that right? I have a use case where I need to map one
> multi term phrase to another. i.e. Tween 20 needs to be translated to
> Polysorbate 40.
>
> Any thoughts as to how this can be achieved?
>
> Thanks,
> Kaushik
>
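
For the simple two-phrase case that started this thread (Tween 20 ->
Polysorbate 40), a plain index-time synonym entry may be worth trying before
auto-phrasing; a sketch with an illustrative field type (multi-token synonyms
still carry the positional caveats discussed above):

    # synonyms.txt
    tween 20 => polysorbate 40

    <!-- schema.xml: expand synonyms at index time only -->
    <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>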


Re: Why Solr default to multivalued true

2015-04-28 Thread Shawn Heisey
On 4/28/2015 3:58 AM, balmydrizzle wrote:
> Just happen to have same issue as this question posted on stackoverflow site:
> Why Solr default Multivalued to true? - Stack Overflow
> http://stackoverflow.com/questions/21933032/why-solr-default-multivalued-to-true

It has been my experience that if I don't put 'multiValued="true"' into
the schema, then the field is NOT multivalued.

If the "version" parameter on your schema is 1.0, then all fields are
inherently multi-valued, because the first schema version did not have
the multi-valued concept.  The most recent schema version is 1.5.

https://wiki.apache.org/solr/SchemaXml#Schema_version_attribute_in_the_root_node

If you do not specify the version in your schema.xml file, it defaults
to 1.0, which does mean that all fields will be multivalued.

Thanks,
Shawn
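
Concretely, the version sits on the schema root element, and on version 1.1
or later a field is single-valued unless it opts in; a sketch (fieldType
definitions omitted):

    <schema name="example" version="1.5">
      <!-- single-valued by default on schema version >= 1.1 -->
      <field name="id"   type="string" indexed="true" stored="true"/>
      <!-- multi-valued only because it says so -->
      <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
    </schema>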



RE: Solr + RDF = SolRDF

2015-04-28 Thread Davis, Daniel (NIH/NLM) [C]
Both cool and interesting.
Andrea, does your Solr RDF indexing project support inference? If so, is 
inference done by Jena or ahead of time before indexing by Solr?

-Original Message-
From: Andrea Gazzarini [mailto:a.gazzar...@gmail.com] 
Sent: Tuesday, April 28, 2015 7:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr + RDF = SolRDF

Hi Charlie,
definitely cool and interesting.

Best,
Andrea

On 04/28/2015 10:20 AM, Charlie Hull wrote:
> On 27/04/2015 21:41, Andrea Gazzarini wrote:
>> Hi guys,
>> I'd like to share with you a project (actually a hobby for me) where 
>> I'm spending my free time, maybe someone could get some idea or 
>> benefit from it.
>>
>> https://github.com/agazzarini/SolRDF
>>
>> I called it SolRDF (Solr + RDF): It is a set of Solr extensions for 
>> managing (indexing and querying) RDF data.
>>
>> As a first step I wrote a set of classes for injecting Apache Jena 
>> RDF capabilities in Solr.
>
> Hi Andrea,
>
> Interesting...we've been working with Jena and SPARQL queries as part 
> of our BioSolr project: we presented on this in London last week 
> (includes links to slides and code):
> http://www.flax.co.uk/blog/2015/04/24/lucenesolr-london-meetup-biosolr
> -and-query-deep-dive/
>
> Cheers
>
> Charlie
>
>
>>
>> That allows you to:
>>
>> - index RDF data using one of the standard formats (n-triples,
>> rdf+xml,
>> turtle)
>> - query those data using SPARQL, the RDF query language
>>
>> Trying all of that is just a matter of two minutes, as illustrated in 
>> this post [1]. Follow those five steps and Solr will quickly act as a 
>> fully compliant SPARQL 1.1 endpoint.
>>
>> On top of that, I'm going further with the development of what I 
>> called "Hybrid mode", where Solr capabilities, like faceting, can be 
>> applied to an RDF Dataset queried using SPARQL. You can read more 
>> about this in this post [2] or in the project Wiki [3]. Two kinds of 
>> facets (object and range object faceting) are already working and 
>> available in the master; I'd like to integrate pivot and interval 
>> facets, too.
>>
>> As I said this is just an amusement for me (and a great chance to 
>> explore how Solr works behind the scenes) so I'm gradually and slowly 
>> going ahead, making available only "stable" features.
>>
>> Any feedback / idea / comment / question is warmly welcome ;)
>>
>> Best,
>>
>> Andrea
>>
>> [1]
>> http://andreagazzarini.blogspot.it/2014/12/a-solr-rdf-store-and-sparq
>> l-endpoint-in.html
>>
>>
>> [2]
>> http://andreagazzarini.blogspot.it/2015/04/rdf-faceting-with-apache-s
>> olr-solrdf.html
>>
>>
>> [3] https://github.com/agazzarini/SolRDF/wiki/Faceted%20search
>>
>
>



Re: New to SolrCloud

2015-04-28 Thread Erick Erickson
Yeah, it took me a few tries to get it all straight in my head.

Perhaps this will help. Whether or not to install Zookeeper on the same
node as Solr is entirely your decision. And I'm assuming that you're NOT
talking about the embedded Zookeeper BTW.

The only "problem" with running ZK on the same node as Solr is that if the
node goes down, it takes _both_ zookeeper and Solr with it. If running
the "embedded zookeeper", then you can't even bounce the Solr server without
taking down the ZK node. Solr will run fine even with embedded ZK,
you just have to be very careful when you take the node up or down.

Bottom line: It's just easier, from an administrative standpoint, to
run Zookeeper
as an external process. That way, you can freely bounce your Solr nodes
without falling below quorum. Whether or not it shares the same machine as a
running instance of Solr is up to you.

As long as one replica for each shard is running _somewhere_, and as
long as a ZK quorum is present, your Solr instance will run fine.
You're completely
correct that having multiple replicas on the same box (with or
without Zookeeper
running there) is "less robust" than running them all on separate machines. But
it may be "good enough". Especially when you have big hardware, you want to
make use of all that hardware so running multiple Solrs (and maybe Zookeeper)
can make sense.

You absolutely _do_ want to
1> have at least one replica for each and every shard on a different box
2> have each Zookeeper running on a separate box.

That way, if any single box dies you have a complete collection available and
a quorum of ZK nodes present. How many more machines you have and
how you distribute your collections amongst them is up to you.

I will add, though, that machines shouldn't die very often, so it's
easy to over-think
the problem.

Best,
Erick

On Tue, Apr 28, 2015 at 3:40 AM, shacky  wrote:
> Hi.
>
> I've been using Solr for 3 years and now I want to move to a SolrCloud
> configuration on 3 nodes which would make my infrastructure highly
> available.
> But I am very confused about it.
>
> I read that ZooKeeper should not be installed on the same Solr nodes,
> but I also read another guide that installs one ZooKeeper instance and
> 2 Solr instances, so I cannot understand how it can be completely
> redundant.
> I also read the SolrCloud quick start guide (which installs N nodes on
> the same server), but I am still confused about what I need to do to
> configure the production nodes.
>
> I installed all my 3 nodes and ran Solr 5.1.0 on all of them, and now I
> have to configure ZooKeeper on all nodes and run Solr in SolrCloud
> configuration.
> I want a completely redundant infrastructure, with both indexing,
> replication and searching available and working with the tolerance of
> one node.
>
> Could you help me to fresh my mind, please?
>
> Thank you very much!
> Bye


Re: Attributes in <field> and <fieldType>

2015-04-28 Thread Steve Rowe
Hi Steve,

From the Solr Reference Guide page "Field Type Definitions and Properties":

> The properties that can be specified for a given field type fall into
> three major categories:
>   • Properties specific to the field type's class.
>   • General Properties Solr supports for any field type.
>   • Field Default Properties that can be specified on the field type
> that will be inherited by fields that use this type instead of
> the default behavior.

“indexed” and “stored” are among the Field Default Properties listed as 
specifiable on <fieldType>s.

<field> properties override <fieldType> properties, not the reverse.

Steve

> On Apr 28, 2015, at 9:25 AM, Steven White  wrote:
> 
> Hi Everyone,
> 
> Looking at the out-of-the-box schema.xml of Solr 5.1, I see this:
>
> <fieldType name="..." stored="..." indexed="..."
> class="solr.TextField" >
>
> Is it valid to have "stored" and "indexed" on <fieldType>?  My
> understanding is that those are on <field> only.  If not, does the value in
> <field> override what's in <fieldType>?
> 
> Thanks
> 
> Steve
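
In other words, a type can carry field defaults that an individual field then
overrides; a quick sketch with made-up names:

    <fieldType name="text_meta" class="solr.TextField" indexed="true" stored="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <!-- inherits indexed="true" from the type, but overrides stored -->
    <field name="body" type="text_meta" stored="true"/>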



Attributes in <field> and <fieldType>

2015-04-28 Thread Steven White
Hi Everyone,

Looking at the out-of-the-box schema.xml of Solr 5.1, I see this:

<fieldType name="..." stored="..." indexed="..."
   class="solr.TextField" >

Is it valid to have "stored" and "indexed" on <fieldType>?  My
understanding is that those are on <field> only.  If not, does the value in
<field> override what's in <fieldType>?

Thanks

Steve


Off-top: Solr with language detection

2015-04-28 Thread LAFK
Shani,
Off topic: that footer of yours may collide with list policy. All content here 
is publicly available, in case you missed it.

@LAFK_PL
  Original message
From: Chaushu, Shani
Sent: Tuesday, 28 April 2015 14:59
To: solr-user@lucene.apache.org
Reply-to: solr-user@lucene.apache.org
Subject: Solr with language detection

Hi,
I'm trying to use the Solr Tika language detection.
I added to the solrconfig.xml:

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <str name="langid.fl">content</str>
      <float name="langid.threshold">0.1</float>
      <str name="langid.langField">language_t</str>
      <str name="langid.fallback">en</str>
    </lst>
  </processor>
</updateRequestProcessorChain>

And I have the dynamic text field *_t in the schema.
I also added the lib.
I ran Nutch on a single site, written in English.
Everything completed successfully and the site was added to Solr, but no
language field was added.

Am I missing something? Is there anything else that I should do in order to
see the language field?
Thanks.

-
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: Custom Query Implementation?

2015-04-28 Thread Doug Turnbull
Johannes,

If you just want to implement a custom search syntax, Solr can be great for
this. You just need a "Solr query parser" which takes a search string and
allows you to translate that into any number of Lucene queries. These are
fairly straightforward to implement with a small amount of Lucene knowledge:
http://java.dzone.com/articles/create-custom-solr-queryparser

If the existing Lucene queries don't fit the bill, and you need something
more advanced you can build custom Lucene queries. I have blogged and
spoken about this topic, you might find these links useful:

http://opensourceconnections.com/blog/2014/01/20/build-your-own-custom-lucene-query-and-scorer/
http://opensourceconnections.com/blog/2014/03/12/using-customscorequery-for-custom-solrlucene-scoring/

Corresponding presentation:
http://apacheconnorthamerica2014.sched.org/event/5e312d30375eb43733abc9b7506646e3#.VT-D5WXsl4s
https://www.youtube.com/watch?v=UotgfwNpqrs
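
To make the first option concrete, the bare skeleton of a Solr query parser
plugin looks roughly like this (names are illustrative, and the placeholder
query is where your range parsing and scoring would go; register the class
with a <queryParser> entry in solrconfig.xml):

    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Query;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.QParserPlugin;
    import org.apache.solr.search.SyntaxError;

    public class RangePairQParserPlugin extends QParserPlugin {
      @Override
      public void init(NamedList args) { /* config args from solrconfig.xml */ }

      @Override
      public QParser createParser(String qstr, SolrParams localParams,
                                  SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
          @Override
          public Query parse() throws SyntaxError {
            // qstr is the raw user input, e.g. "17:85,205:303"; parse it and
            // return whatever Lucene Query implements your matching/scoring.
            return new MatchAllDocsQuery();  // placeholder only
          }
        };
      }
    }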

Hope that helps,
-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections,
LLC | 240.476.9983 | http://www.opensourceconnections.com
Author: Taming Search  from Manning
Publications
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.

On Tue, Apr 28, 2015 at 4:01 AM, Johannes Ruscheinski <
johannes.ruschein...@uni-tuebingen.de> wrote:

> Hi,
>
> I am entirely new to the world of SOLR programming and I have the
> following questions:
>
> In addition to our regular searches we need to implement a specialised
> form of range search and ranking.  What I mean by this is that users can
> search for one or more numeric ranges like "17:85,205:303" etc.  (These
> are range-begin/range-end pairs.)  A small percentage of our records,
> maybe less than 10% will have similar ranges, again, one or more, stored
> in a SOLR field.  We need to apply a custom scoring function and filter
> the matches, too.  (Not all ranges match and scores will typically
> differ greatly.)   Where are all the places where we have to insert
> code?  Also, any tips on how to develop and debug this?  I am using the
> Linux command-line and Emacs.  I am linking against SOLR by using "javac
> -cp solr-core-4.2.1.jar:. my_code.java".  It is probably not relevant
> but, I might mention it anyway: We are using SOLR as a part of VuFind.
>
> I'd be grateful for any suggestions.
>
> Thank you!
>
> --Johannes
>
> --
> Dr. Johannes Ruscheinski
> Universitätsbibliothek Tübingen - IT-Abteilung -
> Wilhelmstr. 32, 72074 Tübingen
>
> Tel: +49 7071 29-72820
> FAX: +49 7071 29-5069
> Email: johannes.ruschein...@uni-tuebingen.de
>
>
>


Solr with language detection

2015-04-28 Thread Chaushu, Shani
Hi,
I'm trying to use the Solr Tika language detection.
I added to the solrconfig.xml:

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <str name="langid.fl">content</str>
      <float name="langid.threshold">0.1</float>
      <str name="langid.langField">language_t</str>
      <str name="langid.fallback">en</str>
    </lst>
  </processor>
</updateRequestProcessorChain>

And I have the dynamic text field *_t in the schema.
I also added the lib.
I ran Nutch on a single site, written in English.
Everything completed successfully and the site was added to Solr, but no
language field was added.

Am I missing something? Is there anything else that I should do in order to
see the language field?
Thanks.

-
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
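
One thing worth double-checking: the chain only runs if the update handler
actually references it. Assuming the chain in the snippet above is named
"langid", that would be something like:

    <requestHandler name="/update" class="solr.UpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>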


Re: Expected mime type application/octet-stream but got text/html

2015-04-28 Thread Vijaya Narayana Reddy Bhoomi Reddy
Thanks Stephan, that was the issue. In the URL, I missed the solr part.
Thanks for your help. Functionality is working fine now.


Thanks & Regards
Vijay
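
For anyone hitting the same 404: the base URL handed to SolrJ must include the
core/collection name; a minimal sketch (core name and field are illustrative):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexOne {
      public static void main(String[] args) throws Exception {
        // A bare http://localhost:8983/solr base URL makes /update resolve
        // to a 404; the path must include the core, as below.
        HttpSolrServer server =
            new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);
        server.commit();
        server.shutdown();
      }
    }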


On 28 April 2015 at 13:05, Stephan Schubert 
wrote:

> Hi,
>
> just a wild guess: you are calling /solr/update instead of
> /solr/collection/update
>
> Regards
>
> Stephan
>
>
>
> From:   Vijaya Narayana Reddy Bhoomi Reddy
>
> To: solr-user@lucene.apache.org
> Date:   28.04.2015 13:57
> Subject:Expected mime type application/octet-stream but got
> text/html
>
>
>
> Hi,
>
> I am suddenly seeing this error message when I try to index documents
> using
> SolrJ client. The same piece of code was working fine last time when I
> indexed the documents. But now, this is the error message being thrown on
> the SolrJ client. Request your urgent help as this is very high priority
> for me.
>
>
> Exception in thread "main"
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> Expected mime type application/octet-stream but got text/html.
>
> Error 404 Not Found
>
> HTTP ERROR: 404
> Problem accessing /update. Reason:
>     Not Found
> Powered by Jetty://
>
>
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:512)
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
> at
>
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
> at
>
> com.lgcgroup.solr.indexer.SolrTikaIndexer.indexBinaryDocuments(SolrTikaIndexer.java:154)
> at com.lgcgroup.solr.indexer.IndexingDriver.main(IndexingDriver.java:54)
>
>
> Thanks & Regards
> Vijay
>
> --
> The contents of this e-mail are confidential and for the exclusive use of
> the intended recipient. If you receive this e-mail in error please delete
> it from your system immediately and notify us either by e-mail or
> telephone. You should not copy, forward or otherwise disclose the content
> of the e-mail. The views expressed in this communication may not
> necessarily be the view held by WHISHWORKS.
>
>
>
> SICK AG - Sitz: Waldkirch i. Br. - Handelsregister: Freiburg i. Br. HRB
> 280355
> Vorstand: Dr. Robert Bauer (Vorsitzender)  -  Reinhard Bösl  -  Dr. Mats
> Gökstorp  -  Dr. Martin Krämer  -  Markus Vatter
> Aufsichtsrat: Gisela Sick (Ehrenvorsitzende) - Klaus M. Bukenberger
> (Vorsitzender)
>

-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.


RE: Expected mime type application/octet-stream but got text/html

2015-04-28 Thread Stephan Schubert
Hi,

just a wild guess: you are calling /solr/update instead of 
/solr/collection/update

Regards

Stephan



From:   Vijaya Narayana Reddy Bhoomi Reddy

To: solr-user@lucene.apache.org
Date:   28.04.2015 13:57
Subject:Expected mime type application/octet-stream but got
text/html



Hi,

I am suddenly seeing this error message when I try to index documents 
using
SolrJ client. The same piece of code was working fine last time when I
indexed the documents. But now, this is the error message being thrown on
the SolrJ client. Request your urgent help as this is very high priority
for me.


Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Expected mime type application/octet-stream but got text/html.

Error 404 Not Found

HTTP ERROR: 404
Problem accessing /update. Reason:
    Not Found
Powered by Jetty://


at
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:512)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at
com.lgcgroup.solr.indexer.SolrTikaIndexer.indexBinaryDocuments(SolrTikaIndexer.java:154)
at com.lgcgroup.solr.indexer.IndexingDriver.main(IndexingDriver.java:54)


Thanks & Regards
Vijay

-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.

 
 
SICK AG - Sitz: Waldkirch i. Br. - Handelsregister: Freiburg i. Br. HRB 
280355 
Vorstand: Dr. Robert Bauer (Vorsitzender)  -  Reinhard Bösl  -  Dr. Mats 
Gökstorp  -  Dr. Martin Krämer  -  Markus Vatter 
Aufsichtsrat: Gisela Sick (Ehrenvorsitzende) - Klaus M. Bukenberger 
(Vorsitzender) 


Re: Expected mime type application/octet-stream but got text/html

2015-04-28 Thread Vijaya Narayana Reddy Bhoomi Reddy
Just to add, my solrconfig.xml is the standard one, with no modifications.
It was taken directly from the collection1 core of the 4.10.2 installation.
However, in schema.xml, I have added my own fields. Hope it has got nothing
to do with schema.xml.

Thanks & Regards
Vijay

  wrote:

> Hi,
>
> I am suddenly seeing this error message when I try to index documents
> using SolrJ client. The same piece of code was working fine last time when
> I indexed the documents. But now, this is the error message being thrown on
> the SolrJ client. Request your urgent help as this is very high priority
> for me.
>
>
> Exception in thread "main"
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> Expected mime type application/octet-stream but got text/html.
>
> Error 404 Not Found
>
> HTTP ERROR: 404
> Problem accessing /update. Reason:
>     Not Found
> Powered by Jetty://
>
>
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:512)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
> at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
> at
> com.lgcgroup.solr.indexer.SolrTikaIndexer.indexBinaryDocuments(SolrTikaIndexer.java:154)
> at com.lgcgroup.solr.indexer.IndexingDriver.main(IndexingDriver.java:54)
>
>
> Thanks & Regards
> Vijay
>

-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.


Expected mime type application/octet-stream but got text/html

2015-04-28 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi,

I am suddenly seeing this error message when I try to index documents using
SolrJ client. The same piece of code was working fine last time when I
indexed the documents. But now, this is the error message being thrown on
the SolrJ client. Request your urgent help as this is very high priority
for me.


Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Expected mime type application/octet-stream but got text/html.

Error 404 Not Found

HTTP ERROR: 404
Problem accessing /update. Reason:
    Not Found
Powered by Jetty://


at
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:512)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at
com.lgcgroup.solr.indexer.SolrTikaIndexer.indexBinaryDocuments(SolrTikaIndexer.java:154)
at com.lgcgroup.solr.indexer.IndexingDriver.main(IndexingDriver.java:54)


Thanks & Regards
Vijay

-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.


Re: Solr + RDF = SolRDF

2015-04-28 Thread Andrea Gazzarini

Hi Charlie,
definitely cool and interesting.

Best,
Andrea

On 04/28/2015 10:20 AM, Charlie Hull wrote:

On 27/04/2015 21:41, Andrea Gazzarini wrote:

Hi guys,
I'd like to share with you a project (actually a hobby for me) where I'm
spending my free time, maybe someone could get some idea or benefit 
from it.


https://github.com/agazzarini/SolRDF

I called it SolRDF (Solr + RDF): It is a set of Solr extensions for
managing (indexing and querying) RDF data.

As a first step I wrote a set of classes for injecting Apache Jena RDF
capabilities in Solr.


Hi Andrea,

Interesting...we've been working with Jena and SPARQL queries as part 
of our BioSolr project: we presented on this in London last week 
(includes links to slides and code): 
http://www.flax.co.uk/blog/2015/04/24/lucenesolr-london-meetup-biosolr-and-query-deep-dive/


Cheers

Charlie




That allows you to:

- index RDF data using one of the standard formats (n-triples, 
rdf+xml,

turtle)
- query those data using SPARQL, the RDF query language

Trying all of that is just a matter of two minutes, as illustrated in 
this

post [1]. Follow those five steps and Solr will quickly act as a fully
compliant SPARQL 1.1 endpoint.

On top of that, I'm going further with the development of what I called
"Hybrid mode", where Solr capabilities, like faceting, can be applied 
to an
RDF Dataset queried using SPARQL. You can read more about this in 
this post

[2] or in the project Wiki [3]. Two kinds of facets (object and range
object faceting) are already working and available in the master; I'd 
like

to integrate pivot and interval facets, too.

As I said this is just an amusement for me (and a great chance to 
explore
how Solr works behind the scenes) so I'm gradually and slowly going 
ahead,

making available only "stable" features.

Any feedback / idea / comment / question is warmly welcome ;)

Best,

Andrea

[1]
http://andreagazzarini.blogspot.it/2014/12/a-solr-rdf-store-and-sparql-endpoint-in.html 



[2]
http://andreagazzarini.blogspot.it/2015/04/rdf-faceting-with-apache-solr-solrdf.html 



[3] https://github.com/agazzarini/SolRDF/wiki/Faceted%20search








Re: how to store _text field

2015-04-28 Thread Mirko Torrisi
Hi guys,

I used Erick's suggestions (thanks again!!) to create a new field and
copy the _text content into it.

curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field" : { "name":"content", "type":"string", "indexed":true,
"stored":true}, "add-copy-field" : { "source":"_text", "dest": [
"content"]}}' http://localhost:8983/solr/Test/schema

That seems a good way, but I discovered the presence of "bias" in every
content field. Indeed, each one starts with a string of this kind:

 \n \n stream_content_type text/plain  \n stream_size 1556  \n
Content-Encoding UTF-8  \n X-Parsed-By
org.apache.tika.parser.DefaultParser  \n X-Parsed-By
org.apache.tika.parser.txt.TXTParser  \n Content-Type text/plain;
charset=UTF-8  \n resourceName /home/mirko/Desktop/data
sample/sample1/TEXT_CRE_20110608_3-114-500.txt

Now I need to cut off this part, but I have no idea how, especially because
the path (present in the last part) has a dynamic length.

For some, it could be a problem to have two fields with the same content
(double the space needed). I don't have this problem because I use SolrJ to
import, modify and export each document. Maybe I could use it to do this
too, but hopefully you know a cleaner method.

Cheers,
Mirko



On 19 March 2015 at 20:11, Erick Erickson  wrote:

> Hmm, not all that sure. That's one thing about schemaless indexing, it
> has to guess. It does the best it can, but it's quite possible that it
> guesses wrong.
>
> If this is a "managed schema", you can use the REST API commands to
> make whatever field you want. Or you can start over with a concrete
> schema.xml and use _that_. Otherwise, I'm not sure what to say without
> actually being on your system.
>
> Wish I could help more.
> Erick
>
> On Thu, Mar 19, 2015 at 5:39 AM, Mirko Torrisi
>  wrote:
> > Hi Erick,
> >
> > I'm sorry for this delay but I've just seen this reply.
> >
> > I'm using the last version of solr and the default setting is to use the
> new
> > kind of indexing, it doesn't use schema.xml and for that I have no idea
> > about how set "store" for this field.
> > The content is grabbed because I've obtained results using the search
> > function, but it is not shown because it is not set to "stored".
> >
> > I hope to be clear.
> > Thanks very much.
> >
> > All the best,
> >
> > Mirko
> >
> >
> > On 14/03/15 17:58, Erick Erickson wrote:
> >>
> >> Right, your schema.xml file will define, perhaps, some "dynamic
> >> fields". First insure that stored="true" is specified. If you change
> >> this, you have to re-index the docs.
> >>
> >> Second, insure that your "fl" parameter with the field is specified on
> >> the requests, something like q=*:*&fl=eoe_txt.
> >>
> >> Third, insure that you are actually sending content to that field when
> >> you index docs.
> >>
> >> If none of this helps, show us the definition from schema.xml and a
> >> sample input document and a query that illustrate the problem please.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Mar 13, 2015 at 1:20 AM, Mirko Torrisi
> >>  wrote:
> >>>
> >>> Hi Alexandre,
> >>>
> >>> I need to visualize the content of _txt. For some reasons, actual it is
> >>> not
> >>> showed in the results (the "response").
> >>> I guess that it doesn't happen because it isn't stored (for some
> default
> >>> setting that I'd like to change).
> >>>
> >>> Thanks for your help,
> >>>
> >>> Mirko
> >>>
> >>>
> >>> On 13/03/15 00:27, Alexandre Rafalovitch wrote:
> 
>  Wait, step back. This is confusing. What's your real problem you are
>  trying to solve?
> 
>  Regards,
>   Alex.
>  
>  Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>  http://www.solr-start.com/
> 
> 
>  On 12 March 2015 at 19:50, Mirko Torrisi  >
>  wrote:
> >
> > Hi folks,
> >
> > I googled and tried without success so I ask you: how can I modify
> the
> > setting of a field to store it ?
> >
> > It is interesting to note that I did not add _text field so I guess
> it
> > is
> > a
> > default one. Maybe it is normal that it is not showed on the result
> but
> > actually this is my real problem. It could be grand also to copy it
> in
> > a
> > new
> > field but I do not know how to do it with the last Solr (5) and the
> new
> > kind
> > of schema. I know that I have to use curl but I do not know how to
> use
> > it
> > to
> > copy a field.
> >
> > Thank you in advance!
> > Cheers,
> >
> >Mirko
> >>>
> >>>
> >
>


Re: Why Solr default to multivalued true

2015-04-28 Thread Ahmet Arslan
Hi,

I checked the comments in the example schema.xml; they say false by default.
How did you figure out that it is true?


Ahmet



On Tuesday, April 28, 2015 1:05 PM, balmydrizzle  wrote:
I just happened to hit the same issue as this question posted on the Stack Overflow site:
Why Solr default Multivalued to true? - Stack Overflow
http://stackoverflow.com/questions/21933032/why-solr-default-multivalued-to-true





New to SolrCloud

2015-04-28 Thread shacky
Hi.

I've been using Solr for 3 years and now I want to move to a SolrCloud
configuration on 3 nodes, which would make my infrastructure highly
available.
But I am very confused about it.

I read that ZooKeeper should not be installed on the same Solr nodes,
but I also read another guide that installs one ZooKeeper instance and
2 Solr instances, so I cannot understand how it can be completely
redundant.
I also read the SolrCloud quick start guide (which installs N nodes on
the same server), but I am still confused about what I need to do to
configure the production nodes.

I installed all my 3 nodes and ran Solr 5.1.0 on all of them, and now I
have to configure ZooKeeper on all nodes and run Solr in SolrCloud
configuration.
I want a completely redundant infrastructure, with both indexing,
replication and searching available and working with the tolerance of
one node.

Could you help me to fresh my mind, please?

Thank you very much!
Bye


Why Solr default to multivalued true

2015-04-28 Thread balmydrizzle
I just happened to hit the same issue as this question posted on the Stack Overflow site:
Why Solr default Multivalued to true? - Stack Overflow
http://stackoverflow.com/questions/21933032/why-solr-default-multivalued-to-true





Overseer role in solrCloud

2015-04-28 Thread Gopal Jee
I am trying to understand the role of the overseer and the SolrCloud
state-change mechanism. I tried finding resources on the web, but without
much luck. Can someone point me to some relevant doc or explain? A few
doubts I have:
1. The docs say the overseer updates clusterstate.json when a new node
joins. How does the overseer node know when a new node joins, given that
the overseer is itself just an independent node?
2. There is an overseer queue znode in ZooKeeper. Do all Solr servers
update their state in the overseer queue? What type of events are published
to the queue? Is this queue maintained inside ZooKeeper?
3. When a node goes down and loses its connection with ZooKeeper, does
ZooKeeper update its state in clusterstate.json, or does it let the overseer
know about the lost connection and update clusterstate.json?
4. The docs say that when a node is in the down state, it cannot serve
read or write requests. I tried issuing a GET request to one node which
shows as down in the Solr cloud panel, and I did get a response (with all
relevant documents). How is this happening? When a node goes into the down
state, does it not block its request handlers or get told not to serve any
GET requests?

Thanks in advance for helping me understand SolrCloud's internal
state-change mechanism.

--


Re: Start Solr with multiple external zookeepers on Windows Server?

2015-04-28 Thread Stephan Schubert
Hi there,

Editing solr.in.cmd and listing the ZooKeeper hosts there, instead of
passing them via parameter on the console, worked. I'm using Solr 5.1 btw.
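
Concretely, that is a single line in bin\solr.in.cmd (hostnames as in the
example quoted below):

    set ZK_HOST=samplehost1:9983,samplehost2:9983,samplehost3:9983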





From:   Stephan Schubert
To: solr-user@lucene.apache.org
Date:   27.04.2015 19:06
Subject:Start Solr with multiple external zookeepers on Windows
Server?



Hi everyone,

How is it possible to start Solr with an external set of ZooKeeper
instances (quorum of 3 servers) on a Windows server (2008 R2)?

From the wiki I got (
https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble

)

bin\solr restart -c -p 8983 -z 
samplehost1:9983,samplehost2:9983,samplehost3:9983 -m 6g

But this seems not to work on Windows. After I run this command, the
start script prints out only the help text. With only one ZooKeeper
instance it works. Any suggestions?

Regards

Stephan

Solr Version: 5.1 
 
SICK AG - Sitz: Waldkirch i. Br. - Handelsregister: Freiburg i. Br. HRB 
280355 
Vorstand: Dr. Robert Bauer (Vorsitzender)  -  Reinhard Bösl  -  Dr. Mats 
Gökstorp  -  Dr. Martin Krämer  -  Markus Vatter 
Aufsichtsrat: Gisela Sick (Ehrenvorsitzende) - Klaus M. Bukenberger 
(Vorsitzender) 

 
 


Re: Solr node going to recovering state during heavy reindexing

2015-04-28 Thread Gopal Jee
Thanks Shawn for the insight. Will try your recommendations.

Gopal

On Mon, Apr 27, 2015 at 9:46 PM, Rajesh Hazari 
wrote:

> Thanks, I am sure that we have missed this command-line property; this
> gives me more information on how to use the latest Solr scripts more
> effectively.
>
>
> *Thanks,*
> *Rajesh**.*
>
> On Mon, Apr 27, 2015 at 12:04 PM, Shawn Heisey 
> wrote:
>
> > On 4/27/2015 9:15 AM, Gopal Jee wrote:
> > > We have a 26 node solr cloud cluster. During heavy re-indexing, some of
> > > nodes go into recovering state.
> > > as per current config, soft commit is set to 15 minute and hard commit
> to
> > > 30 sec. Moreover, zkClientTimeout is set to 30 sec in solr nodes.
> > > Please advise.
> >
> > The most common reason for this is general performance issues that make
> > some operations take longer than the zkClientTimeout.
> >
> > My first suspect would be long garbage collection pauses.  This assumes
> > you're not using a very recent version (4.10.x or 5.x) with the new
> > bin/solr script, and your java commandline does not have any garbage
> > collection tuning.  The bin/solr script does a lot of GC tuning.
> >
> > The second suspect would be that you don't have enough RAM left for your
> > operating system to cache your index effectively.
> >
> > It's possible to have both of these problems happening.  These problems,
> > and a few others, are outlined here:
> >
> > http://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > Thanks,
> > Shawn
> >
> >
>


Custom Query Implementation?

2015-04-28 Thread Johannes Ruscheinski
Hi,

I am entirely new to the world of SOLR programming and I have the
following questions:

In addition to our regular searches we need to implement a specialised
form of range search and ranking.  What I mean by this is that users can
search for one or more numeric ranges like "17:85,205:303" etc.  (These
are range-begin/range-end pairs.)  A small percentage of our records,
maybe less than 10% will have similar ranges, again, one or more, stored
in a SOLR field.  We need to apply a custom scoring function and filter
the matches, too.  (Not all ranges match and scores will typically
differ greatly.)   Where are all the places where we have to insert
code?  Also, any tips on how to develop and debug this?  I am using the
Linux command-line and Emacs.  I am linking against SOLR by using "javac
-cp solr-core-4.2.1.jar:. my_code.java".  It is probably not relevant
but, I might mention it anyway: We are using SOLR as a part of VuFind.

I'd be grateful for any suggestions.

Thank you!

--Johannes

-- 
Dr. Johannes Ruscheinski
Universitätsbibliothek Tübingen - IT-Abteilung -
Wilhelmstr. 32, 72074 Tübingen

Tel: +49 7071 29-72820
FAX: +49 7071 29-5069
Email: johannes.ruschein...@uni-tuebingen.de




Re: Solr + RDF = SolRDF

2015-04-28 Thread Charlie Hull

On 27/04/2015 21:41, Andrea Gazzarini wrote:

Hi guys,
I'd like to share with you a project (actually a hobby for me) where I'm
spending my free time, maybe someone could get some idea or benefit from it.

https://github.com/agazzarini/SolRDF

I called it SolRDF (Solr + RDF): It is a set of Solr extensions for
managing (indexing and querying) RDF data.

As a first step I wrote a set of classes for injecting Apache Jena RDF
capabilities in Solr.


Hi Andrea,

Interesting...we've been working with Jena and SPARQL queries as part of 
our BioSolr project: we presented on this in London last week (includes 
links to slides and code): 
http://www.flax.co.uk/blog/2015/04/24/lucenesolr-london-meetup-biosolr-and-query-deep-dive/


Cheers

Charlie




That allows you to:

- index RDF data using one of the standard formats (n-triples, rdf+xml,
turtle)
- query those data using SPARQL, the RDF query language

Trying all of that is just a matter of two minutes, as illustrated in this
post [1]. Follow those five steps and Solr will quickly act as a fully
compliant SPARQL 1.1 endpoint.

On top of that, I'm going further with the development of what I called
"Hybrid mode", where Solr capabilities, like faceting, can be applied to an
RDF Dataset queried using SPARQL. You can read more about this in this post
[2] or in the project Wiki [3]. Two kinds of facets (object and range
object faceting) are already working and available in the master; I'd like
to integrate pivot and interval facets, too.

As I said this is just an amusement for me (and a great chance to explore
how Solr works behind the scenes) so I'm gradually and slowly going ahead,
making available only "stable" features.

Any feedback / idea / comment / question is warmly welcome ;)

Best,

Andrea

[1]
http://andreagazzarini.blogspot.it/2014/12/a-solr-rdf-store-and-sparql-endpoint-in.html

[2]
http://andreagazzarini.blogspot.it/2015/04/rdf-faceting-with-apache-solr-solrdf.html

[3] https://github.com/agazzarini/SolRDF/wiki/Faceted%20search




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Multiple index.timestamp directories using up disk space

2015-04-28 Thread Ramkumar R. Aiyengar
SolrCloud does need up to twice the amount of disk space as your usual
index size during replication. Amongst other things, this ensures you have
a full copy of the index at any point. There's no way around this, I would
suggest you provision the additional disk space needed.
On 20 Apr 2015 23:21, "Rishi Easwaran"  wrote:

> Hi All,
>
> We are seeing this problem with solr 4.6 and solr 4.10.3.
> For some reason, solr cloud tries to recover and creates a new index
> directory - (ex:index.20150420181214550), while keeping the older index as
> is. This creates an issue where the disk space fills up and the shard
> never ends up recovering.
> Usually this requires a manual intervention of  bouncing the instance and
> wiping the disk clean to allow for a clean recovery.
>
> Any ideas on how to prevent Solr from creating multiple copies of the
> index directory?
>
> Thanks,
> Rishi.
>