Re: /suggest through SolrJ?

2014-09-25 Thread Clemens Wyss DEV
Thx to you two.

Just in case anybody else is trying to do "this": the following SolrJ code 
corresponds to the HTTP request
GET http://localhost:8983/solr/solrpedia/suggest?q=atmo
from "Solr in Action" (chapter 10):
...
SolrServer server = new HttpSolrServer("http://localhost:8983/solr/solrpedia");
SolrQuery query = new SolrQuery( "atmo" );
query.setRequestHandler( "/suggest" );
QueryResponse queryresponse = server.query( query );
...
queryresponse.getSpellCheckResponse().getSuggestions();
...
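To read the suggestions out of the response, something like the following should work (a sketch against the SolrJ 4.x SpellCheckResponse API, continuing from the `queryresponse` variable above):

```java
SpellCheckResponse scr = queryresponse.getSpellCheckResponse();
if (scr != null) {
    // One Suggestion per query token; each carries its alternatives.
    for (SpellCheckResponse.Suggestion suggestion : scr.getSuggestions()) {
        System.out.println("token: " + suggestion.getToken());
        for (String alternative : suggestion.getAlternatives()) {
            System.out.println("  suggestion: " + alternative);
        }
    }
}
```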


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Thursday, 25 September 2014 17:37
To: solr-user@lucene.apache.org
Subject: Re: /suggest through SolrJ?

On 9/25/2014 8:43 AM, Erick Erickson wrote:
> You can call anything from SolrJ that you can call from a URL.
> SolrJ has lots of convenience stuff to set particular parameters, 
> parse the response, etc... But in the end it's communicating with Solr 
> via a URL.
> 
> Take a look at something like SolrQuery for instance. It has a nice 
> command setFacetPrefix. Here's the entire method:
> 
> public SolrQuery setFacetPrefix( String field, String prefix ) {
> this.set( FacetParams.FACET_PREFIX, prefix );
> return this;
> }
> 
> which is really
> this.set( "facet.prefix", prefix );
> All it's really doing is setting a SolrParams key/value pair which is 
> equivalent to &facet.prefix=blahblah on a URL.
> 
> As I remember, there's a "setPath" method that you can use to set the 
> destination for the request to "suggest" (or maybe "/suggest"). It's 
> something like that.

Yes, like Erick says, just use SolrQuery for most accesses to Solr on arbitrary 
URL paths with arbitrary URL parameters.  The "set" method is how you include 
those parameters.

The SolrQuery method Erick was talking about at the end of his email is 
setRequestHandler(String), and you would set that to "/suggest".  Full 
disclosure about what this method actually does: it also sets the "qt"
parameter, but with the modern example Solr config, the qt parameter doesn't do 
anything -- you must actually change the URL path on the request, which this 
method will do if the value starts with a forward slash.

Thanks,
Shawn



Re: point buffer returned as an ellipse, how to configure?

2014-09-25 Thread david.w.smi...@gmail.com
Hi Mark,

I asked a follow-up question/observation to your Stackoverflow
instantiation of your question.

I also wrote the following, which doesn’t yet fit into an answer because I
don’t know what problem you are yet experiencing:

Some technical details: geo=true|false is an attribute on the field type;
it isn't a request parameter.  Should you want to change it to geo=false,
you will also have to set the worldBounds, and you will certainly have to
re-index.  Almost any change to a field type in the schema requires a
re-index.  If your units stay degrees then you can continue to use
"lat,lon" format, but if you use another unit specific to some projection
then it's not degrees, and I suggest switching to "x y" format to avoid
confusion with lat,lon.  FYI: units="degrees" is required, but it has no
effect.  When geo=true, the 'd' in geofilt is in kilometers; when
geo=false, 'd' is in the units of the numbers you put into the index.  The
docs are here:

https://cwiki.apache.org/confluence/display/solr/Spatial+Search
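For example, with hypothetical field names (made up for illustration), the same d value means different things depending on geo:

```text
# geo="true" on the field type: d is interpreted as kilometers
&fq={!geofilt sfield=loc pt=45.15,-93.85 d=5}          (5 km radius)

# geo="false" (planar): d is in the native units of the indexed numbers
&fq={!geofilt sfield=loc_planar pt=45.15,-93.85 d=5}   (5 units radius)
```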


It would be awesome if you want to help further spatial in Lucene/Solr.
This year is looking like a great year for spatial — I’m particularly
excited about a new “FlexPrefixTree” from Varun (GSOC 2014) together with
the latest advances in auto-prefixing to be released in Lucene/Solr 5.0.


~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Sep 25, 2014 at 8:42 AM, Mark G  wrote:

> Solr team, I am indexing geographic points in decimal degrees lat/lon using
> the location_rpt type in my index. The type is set up like this:
>
> <fieldType name="location_rpt"
>   class="solr.SpatialRecursivePrefixTreeFieldType"
>   geo="true" distErrPct="0.025" maxDistErr="0.09" units="degrees"
>   />
>
> my field definition is this:
>
> <field name="pointGeom_rpt" type="location_rpt" stored="true" multiValued="false"/>
>
> my problem is that the return is a very narrow but tall ellipse, likely due
> to the degrees and geo=true... but when I change those params to
> geo=false, the index won't start.
> this is the query I am using
>
>  String query =
>      "http://myserver:8983/solr/mycore/select?&q=*:*&fq={!geofilt}&sfield=pointGeom_rpt&pt="
>      + lat + "," + lon + "&d=" + distance
>      + "&wt=json&indent=true&geo=true&rows=" + rows;
>
>
>
> I am not using solr cloud, and I am on version 4.8.0
>
> I also opened up this stackoverflow question... it has some more details
> and a picture of the return I get
>
>
> http://stackoverflow.com/questions/25996820/why-is-solr-spatial-buffer-returned-as-an-elipse
>
>
> BTW, I'm an OpenNLP committer and I am very geospatially focused, let me
> know if you want help with anything geo, I'll try to carve out some time if
> needed.
>
> thanks
> G$
>
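For what it's worth, the same request can be built with SolrJ instead of string concatenation (a sketch; server URL, field name, and variable names are taken from the post above):

```java
SolrServer server = new HttpSolrServer("http://myserver:8983/solr/mycore");
SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("{!geofilt}");          // spatial filter, driven by the params below
q.set("sfield", "pointGeom_rpt");
q.set("pt", lat + "," + lon);
q.set("d", String.valueOf(distance));
q.setRows(rows);
QueryResponse rsp = server.query(q);
```

This also avoids URL-encoding pitfalls that hand-built query strings are prone to.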


Re: How does KeywordRepeatFilterFactory help giving a higher score to an original term vs a stemmed term

2014-09-25 Thread Diego Fernandez
The difference comes from the fact that when you query the same form, it matches
two tokens, including the less common one.  When you query a different form, you
only match on the more common form.  So really you're getting the "boost" from both
the tiny difference in TF*IDF and the extra token that you match on.

However, I agree that adding a payload might be a better solution.
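The token-overlap effect described above can be illustrated with a toy model (this is only an illustration of the counting, not Lucene's actual analysis or TF*IDF scoring; the "stemmer" here is a fake that only handles "running"/"runs"):

```java
import java.util.HashSet;
import java.util.Set;

public class KeywordRepeatToy {

    // Toy stemmer for the example only: "running" -> "run", "runs" -> "run".
    static String stem(String w) {
        if (w.endsWith("ning")) return w.substring(0, w.length() - 4);
        if (w.endsWith("s")) return w.substring(0, w.length() - 1);
        return w;
    }

    // Stand-in for KeywordRepeatFilter + stemmer at index time: every word is
    // indexed as its stem, plus the original surface form when it differs.
    static Set<String> index(String... words) {
        Set<String> tokens = new HashSet<String>();
        for (String w : words) {
            tokens.add(stem(w));
            tokens.add(w);
        }
        return tokens;
    }

    // Number of query tokens (after the same analysis) found in the index.
    static int matches(Set<String> indexed, String... queryWords) {
        int n = 0;
        for (String w : queryWords) {
            if (indexed.contains(stem(w))) n++;                  // stemmed form
            if (!stem(w).equals(w) && indexed.contains(w)) n++;  // surface form
        }
        return n;
    }

    public static void main(String[] args) {
        Set<String> doc = index("running", "home");  // indexed: run, running, home
        // Same surface form: stem AND original both match -> 3 token matches.
        System.out.println(matches(doc, "running", "home")); // 3
        // Different form of the same stem: only the stem matches -> 2.
        System.out.println(matches(doc, "runs", "home"));    // 2
    }
}
```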

- Original Message -
> Hi - but this makes no sense, they are scored as equals, except for tiny
> differences in TF and IDF. What you would need is something like a stemmer
> that preserves the original token and gives a < 1 payload to the stemmed
> token. The same goes for filters like decompounders and accent folders that
> change meaning of words.
>  
>  
> -Original message-
> > From:Diego Fernandez 
> > Sent: Wednesday 17th September 2014 23:37
> > To: solr-user@lucene.apache.org
> > Subject: Re: How does KeywordRepeatFilterFactory help giving a higher score
> > to an original term vs a stemmed term
> > 
> > I'm not 100% on this, but I imagine this is what happens:
> > 
> > (using -> to mean "tokenized to")
> > 
> > Suppose that you index:
> > 
> > "I am running home" -> "am run running home"
> > 
> > If you then query "running home" -> "run running home" and thus give a
> > higher score than if you query "runs home" -> "run runs home"
> > 
> > 
> > - Original Message -
> > > The Solr wiki says   "A repeated question is "how can I have the
> > > original term contribute
> > > more to the score than the stemmed version"? In Solr 4.3, the
> > > KeywordRepeatFilterFactory has been added to assist this
> > > functionality. "
> > > 
> > > https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
> > > 
> > > (Full section reproduced below.)
> > > I can see how in the example from the wiki reproduced below that both
> > > the stemmed and original term get indexed, but I don't see how the
> > > original term gets more weight than the stemmed term.  Wouldn't this
> > > require a filter that gives terms with the keyword attribute more
> > > weight?
> > > 
> > > What am I missing?
> > > 
> > > Tom
> > > 
> > > 
> > > 
> > > -
> > > "A repeated question is "how can I have the original term contribute
> > > more to the score than the stemmed version"? In Solr 4.3, the
> > > KeywordRepeatFilterFactory has been added to assist this
> > > functionality. This filter emits two tokens for each input token, one
> > > of them is marked with the Keyword attribute. Stemmers that respect
> > > keyword attributes will pass through the token so marked without
> > > change. So the effect of this filter would be to index both the
> > > original word and the stemmed version. The 4 stemmers listed above all
> > > respect the keyword attribute.
> > > 
> > > For terms that are not changed by stemming, this will result in
> > > duplicate, identical tokens in the document. This can be alleviated by
> > > adding the RemoveDuplicatesTokenFilterFactory.
> > > 
> > > <fieldtype name="text_keyword" class="solr.TextField"
> > >     positionIncrementGap="100">
> > >   <analyzer>
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.KeywordRepeatFilterFactory"/>
> > >     <filter class="solr.PorterStemFilterFactory"/>
> > >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > >   </analyzer>
> > > </fieldtype>
> > > "
> > > 
> > 
> > --
> > Diego Fernandez - 爱国
> > Software Engineer
> > GSS - Diagnostics
> > 
> > 
> 

-- 
Diego Fernandez - 爱国
Software Engineer
GSS - Diagnostics

IRC: aiguofer on #gss and #customer-platform


RE: Best practice for KStemFilter query or index or both?

2014-09-25 Thread Markus Jelsma
Hi - most filters should be used on both sides, especially stemmers, accent 
folding and obviously lowercasing. Synonyms only on one side, depending on how 
you want to utilize them.

Markus
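As a schema.xml sketch of that advice (field and file names are illustrative; stemming, folding, and lowercasing mirrored on both sides, synonym expansion at index time only):

```xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <!-- index-time only: synonyms are expanded into the index -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```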

 
 
-Original message-
> From:eShard 
> Sent: Thursday 25th September 2014 22:23
> To: solr-user@lucene.apache.org
> Subject: Best practice for KStemFilter query or index or both?
> 
> Good afternoon,
> Here's my configuration for a text field.
> I have the same configuration for index and query time.
> Is this valid?
> What's the best practice for these: query, index, or both?
> For synonyms, I've read conflicting reports on when to use them, but I'm
> currently changing them over to indexing time only.
> 
> Thanks,
> 
>  positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   
> 
>  generateWordParts="1"
> generateNumberParts="1"
> catenateWords="0"
> catenateNumbers="0"
> catenateAll="0"
> preserveOriginal="1"
> />
>   
>  words="stopwords.txt" enablePositionIncrements="true" />
> 
> 
>   
>   
>   
> 
>  generateWordParts="1"
> generateNumberParts="1"
> catenateWords="0"
> catenateNumbers="0"
> catenateAll="0"
> preserveOriginal="1"
> />
>   
>  words="stopwords.txt" enablePositionIncrements="true" />
>  ignoreCase="true" expand="true"/>
> 
> 
>   
>   
> 
>  generateWordParts="1"
> generateNumberParts="1"
> catenateWords="0"
> catenateNumbers="0"
> catenateAll="0"
> preserveOriginal="1"
> />
>   
>  words="stopwords.txt" enablePositionIncrements="true" />
>  ignoreCase="true" expand="true"/>
> 
> 
>   
> 
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Best-practice-for-KStemFilter-query-or-index-or-both-tp4161201.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Best practice for KStemFilter query or index or both?

2014-09-25 Thread eShard
Good afternoon,
Here's my configuration for a text field.
I have the same configuration for index and query time.
Is this valid?
What's the best practice for these: query, index, or both?
For synonyms, I've read conflicting reports on when to use them, but I'm
currently changing them over to indexing time only.

Thanks,

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-practice-for-KStemFilter-query-or-index-or-both-tp4161201.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why does the q parameter change?

2014-09-25 Thread eShard
No, apparently it's the KStemFilter.
should I turn this off at query time?
I'll put this in another question...




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-does-the-q-parameter-change-tp4161179p4161199.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help needed in Indexing and Search on xml content

2014-09-25 Thread Alexandre Rafalovitch
Have a look at the DataImportHandler; you'll need to use nested entities.
That should get you at least to a demo. Then you can decide whether that's
good enough.

Regards,
 Alex
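A minimal data-config.xml sketch for that setup might look like this (the table, column, connection details, and XPaths are all made-up placeholders; the key idea is the nested entity with FieldReaderDataSource + XPathEntityProcessor reading the XML out of a database column):

```xml
<dataConfig>
  <!-- JDBC source for the MSSQL table; connection details are placeholders -->
  <dataSource name="db" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=mydb"
              user="solr" password="..."/>
  <!-- Reads the XML text out of a column of the parent row -->
  <dataSource name="xmlField" type="FieldReaderDataSource"/>
  <document>
    <entity name="record" dataSource="db"
            query="SELECT id, xml_col FROM docs">
      <field column="id" name="id"/>
      <!-- Nested entity: parses the XML stored in xml_col -->
      <entity name="xml" dataSource="xmlField" dataField="record.xml_col"
              processor="XPathEntityProcessor" forEach="/doc">
        <field column="title" xpath="/doc/title"/>
        <field column="customField" xpath="/doc/field[@name='custom']"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```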
On 25/09/2014 3:51 am, "sangeetha.subraman...@gtnexus.com" <
sangeetha.subraman...@gtnexus.com> wrote:

> Hi Team,
>
> I am a newbie to Solr. I have search fields stored in an XML file, which
> is stored in MSSQL. I want to index the content of the XML file in Solr.
> We need to provide search based on the fields present in the XML file.
>
> The reason why we are storing the input details as an XML file is that the
> users will be able to add custom input fields of their own, with values.
> Storing these custom fields as columns in MSSQL does not seem to be an
> optimal solution, so we thought of putting them in an XML file and storing
> that file in the RDBMS.
> But I am not sure how we can index the content of the file to make
> search better. I believe this can be done by ExtractingRequestHandler.
>
> Could someone help me on how we can implement this/ direct me to some
> pages which could be of help to me ?
>
> Thanks
> Sangeetha
>
>


Re: Turn off suggester

2014-09-25 Thread Tomás Fernández Löbbe
The SuggestComponent is not in the default components list. There must be a
request handler with this component added explicitly in the solrconfig.xml

Tomás
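The wiring to look for in solrconfig.xml is typically of this shape (names here follow the stock 4.x example config, not necessarily the poster's file); removing or commenting out both pieces disables the suggester, including its build at startup:

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="field">suggest_field</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```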

On Thu, Sep 25, 2014 at 12:22 PM, Alexandre Rafalovitch 
wrote:

> Isn't it one of the Solr components? Can it be just removed from the
> default chain? Random poking in the dark here.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 25 September 2014 10:45, Erick Erickson 
> wrote:
> > Well, tell us more about the suggester configuration, the number
> > of unique terms in the field you're using, what version of Solr, etc.
> >
> > As Hoss says, "details matter".
> >
> > Best,
> > Erick
> >
> > On Thu, Sep 25, 2014 at 4:18 AM, PeriS 
> wrote:
> >
> >> Is there a way to turn off the solr suggester? I have about 30M records
> >> and when tomcat starts up, it takes a long time (~10 minutes) for the
> >> suggester to decompress the data, or it's doing something, as it hangs on
> >> SolrSuggester.build(); Any ideas please?
> >>
> >> Thanks
> >> -Peri
> >>
> >>
> >>
> >> *** DISCLAIMER *** This is a PRIVATE message. If you are not the
> intended
> >> recipient, please delete without copying and kindly advise us by e-mail
> of
> >> the mistake in delivery.
> >> NOTE: Regardless of content, this e-mail shall not operate to bind HTC
> >> Global Services to any order or other contract unless pursuant to
> explicit
> >> written agreement or government initiative expressly permitting the use
> of
> >> e-mail for such purpose.
> >>
> >>
> >>
>


Re: Turn off suggester

2014-09-25 Thread Alexandre Rafalovitch
Isn't it one of the Solr components? Can it be just removed from the
default chain? Random poking in the dark here.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 25 September 2014 10:45, Erick Erickson  wrote:
> Well, tell us more about the suggester configuration, the number
> of unique terms in the field you're using, what version of Solr, etc.
>
> As Hoss says, "details matter".
>
> Best,
> Erick
>
> On Thu, Sep 25, 2014 at 4:18 AM, PeriS  wrote:
>
>> Is there a way to turn off the solr suggester? I have about 30M records
>> and when tomcat starts up, it takes a long time (~10 minutes) for the
>> suggester to decompress the data, or it's doing something, as it hangs on
>> SolrSuggester.build(); Any ideas please?
>>
>> Thanks
>> -Peri
>>
>>
>>
>>
>>
>>


Re: Why does the q parameter change?

2014-09-25 Thread eShard
Ok, I think I'm on to something.
I omitted this parameter, which means it is set to false by default on my
text field.
I need to set it to true and see what happens:
autoGeneratePhraseQueries="true"
If I'm reading the wiki right, this parameter, when true, will preserve
phrase queries...





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-does-the-q-parameter-change-tp4161179p4161185.html
Sent from the Solr - User mailing list archive at Nabble.com.


Why does the q parameter change?

2014-09-25 Thread eShard
Good afternoon all,
I just implemented a phrase search, and the parsed query gets changed from
"rapid prototyping" to "rapid prototype".
I ran "prototyping" through the Solr analysis screen and it was unchanged,
so I think I've ruled out the tokenizer.
So can anyone tell me what's going on?
Here's the query:
q=rapid prototyping&defType=edismax&qf=text&pf2=text^40&ps=0

here's the debug output; as you can see, prototyping gets changed to just
prototype. What's causing this and how do I turn it off?
Thanks,



rawquerystring: rapid prototyping
querystring: rapid prototyping
parsedquery: (+((DisjunctionMaxQuery((text:rapid))
DisjunctionMaxQuery((text:prototype)))~2)
DisjunctionMaxQuery((text:"rapid prototype"^40.0)))/no_coord
parsedquery_toString: +(((text:rapid) (text:prototype))~2)
(text:"rapid prototype"^40.0)
QParser: ExtendedDismaxQParser



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-does-the-q-parameter-change-tp4161179.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr and hadoop

2014-09-25 Thread Joel Bernstein
Hi Tom,

I am not aware of a Solr InputFormat implementation yet. The /export
handler, which outputs entire sorted result sets, was designed to support
these types of bulk export operations efficiently. I think a Solr
InputFormat would be an excellent project to begin working on.

Also SOLR-6526 is underway to provide SolrCloud with native streaming
aggregation capabilities.


Joel Bernstein
Search Engineer at Heliosearch
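A Solr InputFormat along the lines discussed might take roughly this shape (a hypothetical sketch only -- no such class ships with Solr; the split class, the reader class, the config key, and the one-split-per-shard design are all assumptions):

```java
// Hypothetical: one input split per shard, each read with distrib=false.
public class SolrInputFormat extends InputFormat<Text, MapWritable> {

    @Override
    public List<InputSplit> getSplits(JobContext context)
            throws IOException, InterruptedException {
        List<InputSplit> splits = new ArrayList<InputSplit>();
        // Shard URLs supplied via job config (could instead be read from
        // ZooKeeper cluster state for SolrCloud).
        for (String shardUrl : context.getConfiguration()
                                      .getStrings("solr.shard.urls")) {
            splits.add(new SolrShardSplit(shardUrl)); // hypothetical split class
        }
        return splits;
    }

    @Override
    public RecordReader<Text, MapWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // Hypothetical reader: pages through one shard's documents with SolrJ
        // (cursorMark, or the /export handler mentioned above) and emits one
        // (id, fields) pair per document.
        return new SolrShardRecordReader();
    }
}
```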

On Thu, Sep 25, 2014 at 12:34 PM, Tom Chen  wrote:

> I'm aware of the MapReduceIndexerTool (MRIT). That might be solving the
> indexing part -- the OutputFormat part.
>
> But what I asked for is more about making Solr index data available to
> Hadoop MapReduce -- making Solr a data store like what HDFS can provide.
> With a Solr InputFormat, we can make the Solr index data available to
> Hadoop MapReduce. Along the same lines, we can also make Solr index data
> available to Hive, Spark, etc., like es-hadoop does.
>
> Best,
> Tom
>
>
>
> On Thu, Sep 25, 2014 at 10:26 AM, Michael Della Bitta <
> michael.della.bi...@appinions.com> wrote:
>
> > Yes, there's SolrInputDocumentWritable and MapReduceIndexerTool, plus the
> > Morphline stuff (check out
> > https://github.com/markrmiller/solr-map-reduce-example).
> >
> > Michael Della Bitta
> >
> > Applications Developer
> >
> > o: +1 646 532 3062
> >
> > appinions inc.
> >
> > “The Science of Influence Marketing”
> >
> > 18 East 41st Street
> >
> > New York, NY 10017
> >
> > t: @appinions  | g+:
> > plus.google.com/appinions
> > <
> >
> https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
> > >
> > w: appinions.com 
> >
> > On Thu, Sep 25, 2014 at 9:58 AM, Tom Chen  wrote:
> >
> > > I wonder if Solr has InputFormat and OutputFormat like the
> EsInputFormat
> > > and EsOutputFormat that are provided by Elasticserach for Hadoop
> > > (es-hadoop).
> > >
> > > Is it possible for Solr to provide such integration with Hadoop?
> > >
> > > Best,
> > > Tom
> > >
> >
>


Solr mapred MTree merge stage ~6x slower in 4.10

2014-09-25 Thread Brett Hoerner
As an update to this thread, it seems my MTree wasn't completely hanging,
it was just much slower in 4.10.

If I replace 4.9.0 with 4.10 in my jar the MTree merge stage is 6x (or
more) slower (in my case, 20 min becomes 2 hours). I hope to bisect this in
the future, but the jobs I'm running take a long time. I haven't tried to
see if the issue shows on smaller jobs yet (does 1 minute become 6
minutes?).

Brett




On Tue, Sep 16, 2014 at 12:54 PM, Brett Hoerner 
wrote:

> I have a very weird problem that I'm going to try to describe here to see
> if anyone has any "ah-ha" moments or clues. I haven't created a small
> reproducible project for this but I guess I will have to try in the future
> if I can't figure it out. (Or I'll need to bisect by running long Hadoop
> jobs...)
>
> So, the facts:
>
> * Have been successfully using Solr mapred to build very large Solr
> clusters for months
> * As of Solr 4.10 *some* job sizes repeatably hang in the MTree merge
> phase in 4.10
> * Those same jobs (same input, output, and Hadoop cluster itself) succeed
> if I only change my Solr deps to 4.9
> * The job *does succeed* in 4.10 if I use the same data to create more,
> but smaller shards (e.g. 12x as many shards each 1/12th the size of the job
> that fails)
> * Creating my "normal size" shards (the size I want, that works in 4.9)
> the job hangs with 2 mappers running, 0 reducers in the MTree merge phase
> * There are no errors or warning in the syslog/stderr of the MTree
> mappers, no errors ever echo'd back to the "interactive run" of the job
> (mapper says 100%, reduce says 0%, will stay forever)
> * No CPU being used on the boxes running the merge, no GC happening, JVM
> waiting on a futex, all threads blocked on various queues
> * No disk usage problems, nothing else obviously wrong with any box in the
> cluster
>
> I diff'ed around between 4.10 and 4.9 and barely see any changes in mapred
> contrib, mostly some test stuff. I didn't see any transitive dependency
> changes in Solr/Lucene that look like they would affect me.
>


Re: Solr and hadoop

2014-09-25 Thread Tom Chen
I'm aware of the MapReduceIndexerTool (MRIT). That might be solving the
indexing part -- the OutputFormat part.

But what I asked for is more about making Solr index data available to
Hadoop MapReduce -- making Solr a data store like what HDFS can provide.
With a Solr InputFormat, we can make the Solr index data available to
Hadoop MapReduce. Along the same lines, we can also make Solr index data
available to Hive, Spark, etc., like es-hadoop does.

Best,
Tom



On Thu, Sep 25, 2014 at 10:26 AM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Yes, there's SolrInputDocumentWritable and MapReduceIndexerTool, plus the
> Morphline stuff (check out
> https://github.com/markrmiller/solr-map-reduce-example).
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062
>
> appinions inc.
>
> “The Science of Influence Marketing”
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions  | g+:
> plus.google.com/appinions
> <
> https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
> >
> w: appinions.com 
>
> On Thu, Sep 25, 2014 at 9:58 AM, Tom Chen  wrote:
>
> > I wonder if Solr has InputFormat and OutputFormat like the EsInputFormat
> > and EsOutputFormat that are provided by Elasticserach for Hadoop
> > (es-hadoop).
> >
> > Is it possible for Solr to provide such integration with Hadoop?
> >
> > Best,
> > Tom
> >
>


Re: /suggest through SolrJ?

2014-09-25 Thread Shawn Heisey
On 9/25/2014 8:43 AM, Erick Erickson wrote:
> You can call anything from SolrJ that you can call from a URL.
> SolrJ has lots of convenience stuff to set particular parameters,
> parse the response, etc... But in the end it's communicating
> with Solr via a URL.
> 
> Take a look at something like SolrQuery for instance. It has a nice
> command setFacetPrefix. Here's the entire method:
> 
> public SolrQuery setFacetPrefix( String field, String prefix )
> {
> this.set( FacetParams.FACET_PREFIX, prefix );
> return this;
> }
> 
> which is really
> this.set( "facet.prefix", prefix );
> All it's really doing is setting a SolrParams key/value
> pair which is equivalent to
> &facet.prefix=blahblah
> on a URL.
> 
> As I remember, there's a "setPath" method that you
> can use to set the destination for the request to
> "suggest" (or maybe "/suggest"). It's something like
> that.

Yes, like Erick says, just use SolrQuery for most accesses to Solr on
arbitrary URL paths with arbitrary URL parameters.  The "set" method is
how you include those parameters.

The SolrQuery method Erick was talking about at the end of his email is
setRequestHandler(String), and you would set that to "/suggest".  Full
disclosure about what this method actually does: it also sets the "qt"
parameter, but with the modern example Solr config, the qt parameter
doesn't do anything -- you must actually change the URL path on the
request, which this method will do if the value starts with a forward slash.

Thanks,
Shawn



Re: Solr stops in between indexing

2014-09-25 Thread Erick Erickson
If it was working fine and suddenly stopped, I have to
ask "what was the last thing that changed"? Frankly
it sounds like your network has started having some
problems.

Best,
Erick

On Thu, Sep 25, 2014 at 6:29 AM, madhav bahuguna 
wrote:

> Hi,
> I have Solr configured on a Google Cloud server.
> Whenever I try to index, it stops in between and shows an error:
> connection lost / connection timeout.
> I have 2200 records; sometimes it stops full indexing at 917, sometimes at
> 1385, sometimes at 2185.
> I have apache2 running on Google Cloud on Debian OS.
> Earlier it was working fine; it has started giving this error only recently.
> Please advise and help.
>
> --
> Regards
> Madhav Bahuguna
>


Re: Turn off suggester

2014-09-25 Thread Erick Erickson
Well, tell us more about the suggester configuration, the number
of unique terms in the field you're using, what version of Solr, etc.

As Hoss says, "details matter".

Best,
Erick

On Thu, Sep 25, 2014 at 4:18 AM, PeriS  wrote:

> Is there a way to turn off the solr suggester? I have about 30M records
> and when tomcat starts up, it takes a long time (~10 minutes) for the
> suggester to decompress the data, or it's doing something, as it hangs on
> SolrSuggester.build(); Any ideas please?
>
> Thanks
> -Peri
>
>
>
>
>
>


Re: /suggest through SolrJ?

2014-09-25 Thread Erick Erickson
You can call anything from SolrJ that you can call from a URL.
SolrJ has lots of convenience stuff to set particular parameters,
parse the response, etc... But in the end it's communicating
with Solr via a URL.

Take a look at something like SolrQuery for instance. It has a nice
command setFacetPrefix. Here's the entire method:

public SolrQuery setFacetPrefix( String field, String prefix )
{
this.set( FacetParams.FACET_PREFIX, prefix );
return this;
}

which is really
this.set( "facet.prefix", prefix );
All it's really doing is setting a SolrParams key/value
pair which is equivalent to
&facet.prefix=blahblah
on a URL.

As I remember, there's a "setPath" method that you
can use to set the destination for the request to
"suggest" (or maybe "/suggest"). It's something like
that.

Best,
Erick


On Thu, Sep 25, 2014 at 3:47 AM, Clemens Wyss DEV 
wrote:

> Am I right that I cannot call /suggest (i.e. the corresponding
> RequestHandler) through SolrJ?
>
> What is the preferred way to "call" Solr handlers/operations not
> supported by SolrJ from Java? Through new SolrJ Request-classes?
>


Re: Help needed in Indexing and Search on xml content

2014-09-25 Thread Aman Tandon
Hi,

You can retrieve the data in XML format as well as in JSON.

You need to learn about schema.xml; in it you define the fields that are
present in your XML, which fields you want to search on, etc.

So it would be better to take a look at schema.xml; the sample Solr schema
should clear up most of your doubts.
On Sep 25, 2014 5:12 PM, "PeriS"  wrote:

> Hi Sangeetha,
>
> If you can tell me a little bit more about your setup, I can try and help.
> If you are on skype, that would be the easiest.
>
> Thanks
> -Peri
>
> On Sep 25, 2014, at 3:50 AM, sangeetha.subraman...@gtnexus.com wrote:
>
> > Hi Team,
> >
> > I am a newbie to SOLR. I have got search fields stored in a xml file
> which is stored in MSSQL. I want to index on the content of the xml file in
> SOLR. We need to provide search based on the fields present in the XML file.
> >
> > The reason why we are storing the input details as XML file is , the
> users will be able to add custom input fields on their own with values.
> Storing these custom fields as columns in MSSQL seems to be not an optimal
> solution. So we thought of putting it in XML file and store that file in
> RDBMS.
> > But I am not sure on how we can index the content of the file to make
> search better. I believe this can be done by ExtractingRequestHandler.
> >
> > Could someone help me on how we can implement this/ direct me to some
> pages which could be of help to me ?
> >
> > Thanks
> > Sangeetha
> >
> >
> > ---
> > This message has been scanned for viruses and dangerous content by HTC
> E-Mail Virus Protection Service.
> >
>
>
>
>
>
>
>


Re: SolrCloud Slow to boot up

2014-09-25 Thread anand.mahajan
1. I've hosted it with Helios v 0.07 that ships with Solr 4.10
2. Changes to solrconfig.xml:
   a. hard commits every 10 mins
   b. soft commits every 10 secs
   c. disabled all caches, as the usage is very random (no end users, only
services doing the searches) and mostly single requests
   d. useColdSearcher = true
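In solrconfig.xml terms, that setup corresponds roughly to the following (a sketch of the standard directives, not the poster's exact file):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>600000</maxTime>       <!-- hard commit every 10 minutes -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>10000</maxTime>        <!-- soft commit every 10 seconds -->
  </autoSoftCommit>
</updateHandler>

<!-- elsewhere, in the <query> section: -->
<query>
  <useColdSearcher>true</useColdSearcher>
</query>
```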



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Slow-to-boot-up-tp4161098p4161132.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Slow to boot up

2014-09-25 Thread Michael Della Bitta
1. What version of Solr are you running?
2. Have you made substantial changes to solrconfig.xml?

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions

w: appinions.com 

On Thu, Sep 25, 2014 at 7:19 AM, anand.mahajan  wrote:

> Hello all,
>
> Hosted a SolrCloud - 6 Nodes - 36 Shards x 3 Replica each -> 108 cores
> across 6 servers. Moved in about 250M documents in this cluster. When I
> restart this cluster - only the leaders per shard comes up live instantly
> (within a minute) and all the replicas are shown as Recovering on the Cloud
> screen and all 6 servers are doing some processing (consuming about 4 CPUs
> at the back and doing a lot of Network IO too) In essence its not doing any
> reads are writes to the index and I dont see any replication/catch up
> activity going on too at the back, yet the RAM grows consuming all 96GB
> available on each box. And all the Recovering replicas recover one by one
> in
> about an hour or so. Why is it taking so long to boot up, and what is it
> doing that is consuming so much CPU, RAM and Network IO? All disks are
> reading at 100% on all servers during this boot-up. Is there a setting I
> might have missed that will help?
>
> FYI - The Zookeeper cluster is on the same 6 boxes.  Size of the Solr data
> dir is about 150GB per server and each box has 96GB RAM.
>
> Thanks,
> Anand
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-Slow-to-boot-up-tp4161098.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr Cloud Default Document Routing

2014-09-25 Thread Erick Erickson
Well, you've picked the absolute worst case for comparison. The
"increase to double digits" is a constant overhead. IOW, let's
say your query went from 5ms to 20 ms. That 15 ms is pretty much
the additional overhead no matter what the query. This particular
query just happens to be very fast in the first place.

As far as queries going out to all the shards.. Well, they have to.
The query processing cannot know ahead of time (except in this
_very_ special case) what shards will generate hits. So the request
is sent out to one replica in each shard, which responds with its
top N. The originating node then combines the sub-queries to get
the IDs of the final top N, then sends a request out to each shard
hosting one of those top N for the data associated with the
document.

If you really need super-efficiency here, you could probably
look at SolrCloudServer to get an idea of how to translate from
ID to shard and just do direct requests with distrib=false.

Best,
Erick
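
For illustration, a rough sketch of such a direct, non-distributed request (treat the mapping from id to shard base URL as already done, e.g. via the SolrCloudServer route Erick mentions; the host and core names here are taken from the debug output quoted below):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/**
 * Sketch of a direct id lookup against a single shard with distrib=false,
 * which skips the scatter-gather (GET_TOP_IDS / GET_FIELDS) phases.
 * Assumes the caller has already mapped the id to a shard base URL.
 */
public class DirectShardLookup {

    static String buildLookupUrl(String shardBaseUrl, String id) {
        try {
            return shardBaseUrl + "/select"
                    + "?q=" + URLEncoder.encode("id:" + id, "UTF-8")
                    + "&distrib=false&wt=json";
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(buildLookupUrl(
                "http://server1/solr/multishard_shard1_replica1", "100161200"));
        // prints .../select?q=id%3A100161200&distrib=false&wt=json
    }
}
```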


On Wed, Sep 24, 2014 at 5:44 PM, Susmit Shukla 
wrote:

> Hi,
>
> I'm building out a multi shard solr collection as the index size is likely
> to grow fast.
> I was testing out the setup with 2 shards on 2 nodes with test data.
> Indexed a few documents with "id" as the unique key.
> collection create command -
> /solr/admin/collections?action=CREATE&name=multishard&numShards=2
>
> used this command to upload - curl
> http://server/solr/multishard/update/json?commitWithin=2000 --data-binary
> @data.json -H 'Content-type:application/json'
>
> data.json -
> [
>   {
> "id": "100161200"
>   },
>   {
> "id": "100161384"
>   }
> ]
>
> when I query one of the nodes with an id constraint, I see the query
> executed on both shards, which looks inefficient - QTime increased to double
> digits. I guess Solr would know, based on the id, which shard the data went to.
>
> I have a few questions around this as I could not find pertinent
> information on user lists or documentation.
> - query is hitting all shards and replicas - if I have 3 shards and 5
> replicas , how would the performance be impacted since for the very simple
> case it increased to double digits?
> - Could id lookup queries just go to one shard automatically?
>
>
> /solr/multishard/select?q=id%3A100161200&wt=json&indent=true&debugQuery=true
>
> "QTime":13,
>
>   "debug":{
> "track":{
>   "rid":"-multishard_shard1_replica1-1411605234897-171",
>   "EXECUTE_QUERY":[
> "http://server1/solr/multishard_shard1_replica1/";,[
>   "QTime","1",
>   "ElapsedTime","4",
>   "RequestPurpose","GET_TOP_IDS",
>   "NumFound","1",
>   "Response","some resp"],
> "http://server2/solr/multishard_shard2_replica1/";,[
>   "QTime","1",
>   "ElapsedTime","6",
>   "RequestPurpose","GET_TOP_IDS",
>   "NumFound","0",
>   "Response","some"]],
>   "GET_FIELDS":[
> "http://server1/solr/multishard_shard1_replica1/";,[
>   "QTime","0",
>   "ElapsedTime","4",
>   "RequestPurpose","GET_FIELDS,GET_DEBUG",
>   "NumFound","1",
>
>
> Thanks,
> Susmit
>


Re: Solr and hadoop

2014-09-25 Thread Michael Della Bitta
Yes, there's SolrInputDocumentWritable and MapReduceIndexerTool, plus the
Morphline stuff (check out
https://github.com/markrmiller/solr-map-reduce-example).

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions

w: appinions.com 

On Thu, Sep 25, 2014 at 9:58 AM, Tom Chen  wrote:

> I wonder if Solr has InputFormat and OutputFormat like the EsInputFormat
> and EsOutputFormat that are provided by Elasticserach for Hadoop
> (es-hadoop).
>
> Is it possible for Solr to provide such integration with Hadoop?
>
> Best,
> Tom
>


Re: Changed behavior in solr 4 ??

2014-09-25 Thread Jorge Luis Betancourt Gonzalez
I hadn’t used it before this; I found out about it in the Solr in Action
book and was guided by its comment about redefining the default components
by defining a new searchComponent with the same name.

Anyhow, thanks for your reply!

Regards,

On Sep 25, 2014, at 8:01 AM, Jack Krupansky  wrote:

> I am not aware of any such feature! That doesn't mean it doesn't exist, but I 
> don't recall seeing it in the Solr source code.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Jorge Luis Betancourt Gonzalez
> Sent: Wednesday, September 24, 2014 1:31 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Changed behavior in solr 4 ??
> 
> Hi Jack:
> 
> Thanks for the response. Yes, I know it works the way you describe, and that
> is how I got it to work. But then what does the snippet of the documentation
> about overriding the default components shipped with Solr mean? Even in the
> book Solr in Action, in chapter 7 listing 7.3, I saw something similar to
> what I wanted to do:
> 
> 
> 
>   25
>   content_field
> 
> 
>   *:*
>   true
>   explicit
> 
> 
> Because each default search component exists by default even if it’s not 
> defined explicitly in the solrconfig.xml file, defining them explicitly as in 
> the previous listing will replace the default configuration.
> 
> The previous snippet is from the quoted book Solr in Action. I understand
> that I could define these parameters in each SearchHandler, but if they are
> defined in the searchComponent (as the book says), wouldn't this
> configuration apply to all my request handlers, eliminating the need to
> replicate the same parameter in several parts of my solrconfig.xml (i.e. all
> the request handlers)?
> 
> 
> Regards,
> On Sep 23, 2014, at 11:53 PM, Jack Krupansky  wrote:
> 
> 
>> You set the defaults on the "search handler", not the "search component". 
>> See solrconfig.xml:
>> 
>> 
>> 
>> 
>>   explicit
>>   10
>>   text
>> 
>> ...
>> 
>> -- Jack Krupansky
>> 
>> -Original Message- From: Jorge Luis Betancourt Gonzalez
>> Sent: Tuesday, September 23, 2014 11:02 AM
>> To: solr-user@lucene.apache.org
>> Subject: Changed behavior in solr 4 ??
>> 
>> Hi:
>> 
>> I’m trying to change the default configuration for the query component of a
>> SearchHandler. Basically, I want to set a default value for the rows
>> parameter and have that value shared by all my SearchHandlers. As stated in
>> the solrconfig.xml comments, this could be accomplished by redeclaring the
>> query search component; however, this is not working on Solr 4.9.0, which
>> is the version I’m using. This is my configuration:
>> 
>>  
>>  
>>  1
>>  
>>  
>> 
>> The relevant portion of the solrconfig.xml comment is: "If you register a
>> searchComponent to one of the standard names, it will be used instead of
>> the default." So is this new desired behavior? Also, just for testing, I
>> redefined the components of the request handler to use only the query
>> component and not all the default components; this is how it looks:
>> 
>> 
>>  query
>> 
>> 
>> 
>> Everything works OK, but the rows default is not applied, even though I'm
>> not specifying the rows parameter in the URL.
>> 
>> Regards,
>>
>> Contest "Mi selfie por los 5". Details at
>> http://justiciaparaloscinco.wordpress.com
> 
> 
> 



Solr and hadoop

2014-09-25 Thread Tom Chen
I wonder if Solr has InputFormat and OutputFormat like the EsInputFormat
and EsOutputFormat that are provided by Elasticserach for Hadoop
(es-hadoop).

Is it possible for Solr to provide such integration with Hadoop?

Best,
Tom


Re: MRIT's morphline mapper doesn't co-locate with data

2014-09-25 Thread Tom Chen
Do you have the solr Jira number for the new ingestion tool?

Thanks

On Wed, Sep 24, 2014 at 7:57 PM, Wolfgang Hoschek 
wrote:

> Based on our measurements, Lucene indexing is so CPU intensive that it
> wouldn’t really help much to exploit data locality on read. The
> overwhelming bottleneck remains the same. Having said that, we have an
> ingestion tool in the works that will take advantage of data locality for
> splitable files as well.
>
> Wolfgang.
>
> On Sep 24, 2014, at 9:38 AM, Tom Chen  wrote:
>
> > Hi,
> >
> > The MRIT (MapReduceIndexerTool) uses NLineInputFormat for the morphline
> > mapper. The mapper doesn't co-locate with the input data that it process.
> > Isn't this a performance hit?
> >
> > Ideally, morphline mapper should be run on those hosts that contain most
> > data blocks for the input files it process.
> >
> > Regards,
> > Tom
>
>


Setting of Default Boost in Edismax Search Handler

2014-09-25 Thread O. Olson
I have a setup very similar to the "/browse" handler in the example
(http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/example-DIH/solr/db/conf/solrconfig.xml?view=markup)
  

I am curious whether it is possible to set a default boost function (e.g.
&bf=log(qty)), so that all query results would reflect it.

Thank you,
O. O.
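
One way to get this (a sketch, assuming an edismax-based handler like the "/browse" one referenced above; the handler name is taken from that example and qty is assumed to be a numeric field) is to put the boost function into the handler's defaults, so every request through it picks it up:

```xml
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="bf">log(qty)</str>
  </lst>
</requestHandler>
```

A value in defaults can still be overridden per request; use an invariants list instead if the boost must always apply.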




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-of-Default-Boost-in-Edismax-Search-Handler-tp4161122.html


Re: Help in selecting the appropriate feature to obtain results

2014-09-25 Thread Mikhail Khludnev
I call it the 'reverse search' problem (regex indexing). It's almost
impossible. You can:
- do it on your own:
http://blog.mikemccandless.com/2013/06/build-your-own-finite-state-transducer.html
- create a
http://lucene.apache.org/core/4_1_0/memory/org/apache/lucene/index/memory/MemoryIndex.html
from the incoming string, and search it with those stored regexp queries;
e.g. check https://www.youtube.com/watch?v=rmRCsrJp2A8
- more realistically, you can index the separate letters from the patterns,
search for any of the incoming letters, and post-filter the results that are
found.
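
To make the post-filter option concrete, here is a hedged sketch of one possible interpretation of the pattern semantics from the question below ('?' = exactly one extra character, i.e. a direct subordinate; '*' = one or more, i.e. all subordinate levels). This interpretation is an assumption read off the three examples, not a Solr API:

```java
/**
 * Post-filter sketch for the '?' / '*' group patterns from the question.
 * Interpretation (an assumption): '?' matches exactly one extra character
 * (a direct subordinate), '*' matches one or more extra characters (all
 * subordinate levels); otherwise the pattern must match exactly.
 */
public class GroupPostFilter {

    static boolean matches(String pattern, String query) {
        if (pattern.endsWith("?")) {
            String base = pattern.substring(0, pattern.length() - 1);
            return query.length() == base.length() + 1 && query.startsWith(base);
        }
        if (pattern.endsWith("*")) {
            String base = pattern.substring(0, pattern.length() - 1);
            return query.length() > base.length() && query.startsWith(base);
        }
        return pattern.equals(query);
    }

    public static void main(String[] args) {
        // Reproduces the three examples from the question:
        System.out.println(matches("GE*", "GEB")); // true  -> group1 matches 'GEB'
        System.out.println(matches("G?", "GE"));   // true  -> group1 matches 'GE'
        System.out.println(matches("G", "G"));     // true  -> group3 matches 'G'
        System.out.println(matches("G?", "G"));    // false -> group1 does not match 'G'
    }
}
```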


On Wed, Sep 24, 2014 at 7:04 PM, barrybear  wrote:

> Hi guys, I'm still a beginner to Solr and I'm not sure whether to
> implement a
> custom Filter Query or any other available features/plugins that I am not
> aware of in Solr. I am using Solr v4.4.0.
>
> I have a collection as an example as below:
>
> [
>{
>   description: 'group1',
>   group: ['G?', 'GE*']
>},
>{
>   description: 'group2',
>   group: ['GEB']
>},
>{
>   description: 'group3',
>   group: ['G']
>}
> ]
>
> The group field is multiValued and will contain letters that determine the
> ranking, plus two special characters: ? and *. Placing a ? at the end means
> any subordinate of that ranking, while * means all levels of subordinates of
> that particular ranking.
>
> If I were to search for group:'GEB', I will expect to obtain result:
> [
>{
>   description: 'group1',
>   group: ['G?', 'GE*']
>},
>{
>   description: 'group2',
>   group: ['GEB']
>}
> ]
>
> While searching for group:'GE', should return this result:
> [
>{
>   description: 'group1',
>   group: ['G?', 'GE*']
>}
> ]
>
> And finally searching for group:'G' should only return one result:
> [
>{
>   description: 'group3',
>   group: ['G']
>}
> ]
>
> Hope that my explanation is clear enough and thanks for your attention and
> time..
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Help-in-selecting-the-appropriate-feature-to-obtain-results-tp4160944.html
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Solr stops in between indexing

2014-09-25 Thread madhav bahuguna
Hi,
I have Solr configured on a Google Cloud server.
Whenever I try to index, it stops partway through and shows an error:
connection lost, connection timeout.
I have 2200 records; sometimes it stops full indexing at 917, sometimes at
1385, sometimes at 2185.
I have Apache2 running on Google Cloud on Debian OS.
Earlier it was working fine; it has started giving this error only recently.
Please advise and help.

-- 
Regards
Madhav Bahuguna


point buffer returned as an ellipse, how to configure?

2014-09-25 Thread Mark G
Solr team, I am indexing geographic points in dec degrees lat lon using the
location_rpt type in my index. The type is setup like this

 

my field definition is this



my problem is that the returned shape is a very narrow but tall ellipse,
likely due to the degrees units and geo=true... but when I change those
params to geo=false, the index won't start.
this is the query I am using

 String query =
"http://myserver:8983/solr/mycore/select?&q=*:*&fq={!geofilt}&sfield=pointGeom_rpt&pt=";
+ lat + "," + lon + "&d=" + distance +
"&wt=json&indent=true&geo=true&rows=" + rows;
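
As an aside, the local params and the point value in that string should be URL-encoded when sent over HTTP; a hedged sketch of the same request built with encoding (server, core, and field names are the ones from this message):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds the geofilt request with the special characters URL-encoded. */
public class GeofiltUrl {

    static String buildGeofiltUrl(String baseUrl, double lat, double lon,
                                  double distanceKm, int rows) {
        try {
            return baseUrl + "/select"
                    + "?q=" + URLEncoder.encode("*:*", "UTF-8")
                    + "&fq=" + URLEncoder.encode("{!geofilt}", "UTF-8")
                    + "&sfield=pointGeom_rpt"
                    + "&pt=" + URLEncoder.encode(lat + "," + lon, "UTF-8")
                    + "&d=" + distanceKm
                    + "&wt=json&indent=true&rows=" + rows;
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(buildGeofiltUrl(
                "http://myserver:8983/solr/mycore", 40.7, -74.0, 10.0, 10));
    }
}
```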



I am not using solr cloud, and I am on version 4.8.0

I also opened up this stackoverflow question... it has some more details
and a picture of the return I get

http://stackoverflow.com/questions/25996820/why-is-solr-spatial-buffer-returned-as-an-elipse


BTW, I'm an OpenNLP committer and I am very geospatially focused, let me
know if you want help with anything geo, I'll try to carve out some time if
needed.

thanks
G$


Re: Changed behavior in solr 4 ??

2014-09-25 Thread Jack Krupansky
I am not aware of any such feature! That doesn't mean it doesn't exist, but 
I don't recall seeing it in the Solr source code.


-- Jack Krupansky

-Original Message- 
From: Jorge Luis Betancourt Gonzalez

Sent: Wednesday, September 24, 2014 1:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Changed behavior in solr 4 ??

Hi Jack:

Thanks for the response. Yes, I know it works the way you describe, and that
is how I got it to work. But then what does the snippet of the documentation
about overriding the default components shipped with Solr mean? Even in the
book Solr in Action, in chapter 7 listing 7.3, I saw something similar to
what I wanted to do:



 
   25
   content_field
 
 
   *:*
   true
   explicit
 

Because each default search component exists by default even if it’s not 
defined explicitly in the solrconfig.xml file, defining them explicitly as 
in the previous listing will replace the default configuration.


The previous snippet is from the quoted book Solr in Action. I understand
that I could define these parameters in each SearchHandler, but if they are
defined in the searchComponent (as the book says), wouldn't this
configuration apply to all my request handlers, eliminating the need to
replicate the same parameter in several parts of my solrconfig.xml (i.e. all
the request handlers)?



Regards,
On Sep 23, 2014, at 11:53 PM, Jack Krupansky  
wrote:



You set the defaults on the "search handler", not the "search component". 
See solrconfig.xml:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">text</str>
    </lst>
...

-- Jack Krupansky

-Original Message- From: Jorge Luis Betancourt Gonzalez
Sent: Tuesday, September 23, 2014 11:02 AM
To: solr-user@lucene.apache.org
Subject: Changed behavior in solr 4 ??

Hi:

I’m trying to change the default configuration for the query component of
a SearchHandler. Basically, I want to set a default value for the rows
parameter and have that value shared by all my SearchHandlers. As stated in
the solrconfig.xml comments, this could be accomplished by redeclaring the
query search component; however, this is not working on Solr 4.9.0, which is
the version I’m using. This is my configuration:

  <searchComponent name="query" class="solr.QueryComponent">
    <lst name="defaults">
      <int name="rows">1</int>
    </lst>
  </searchComponent>

The relevant portion of the solrconfig.xml comment is: "If you register a
searchComponent to one of the standard names, it will be used instead of the
default." So is this new desired behavior? Also, just for testing, I
redefined the components of the request handler to use only the query
component and not all the default components; this is how it looks:

  <requestHandler name="/select" class="solr.SearchHandler">
    <arr name="components">
      <str>query</str>
    </arr>
  </requestHandler>


Everything works OK, but the rows default is not applied, even though I'm
not specifying the rows parameter in the URL.


Regards,







Re: Help needed in Indexing and Search on xml content

2014-09-25 Thread PeriS
Hi Sangeetha,

If you can tell me a little bit more about your setup, I can try and help. If 
you are on skype, that would be the easiest. 

Thanks
-Peri

On Sep 25, 2014, at 3:50 AM, sangeetha.subraman...@gtnexus.com wrote:

> Hi Team,
> 
> I am a newbie to SOLR. I have search fields stored in an XML file which is
> stored in MSSQL. I want to index the content of the XML file in SOLR. We
> need to provide search based on the fields present in the XML file.
> 
> The reason why we are storing the input details as an XML file is that users
> will be able to add custom input fields on their own, with values. Storing
> these custom fields as columns in MSSQL does not seem to be an optimal
> solution. So we thought of putting them in an XML file and storing that file
> in the RDBMS. But I am not sure how we can index the content of the file to
> make search better. I believe this can be done by ExtractingRequestHandler.
> 
> Could someone help me with how we can implement this, or direct me to some
> pages that could be of help to me?
> 
> Thanks
> Sangeetha
> 
> 
> --- 
> This message has been scanned for viruses and dangerous content by HTC E-Mail 
> Virus Protection Service. 
> 




*** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
recipient, please delete without copying and kindly advise us by e-mail of the 
mistake in delivery.
NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global 
Services to any order or other contract unless pursuant to explicit written 
agreement or government initiative expressly permitting the use of e-mail for 
such purpose.




Re: Scoring with wildcards

2014-09-25 Thread Jack Krupansky
The wildcard query is “constant score” to make it faster, so unfortunately that 
means there is no score differentiation between the wildcard matches.

You can simply add the wildcard prefix as a separate query term and boost it:

q=text:carre* text:carre^1.5

-- Jack Krupansky

From: Pigeyre Romain 
Sent: Wednesday, September 24, 2014 2:12 PM
To: solr-user@lucene.apache.org 
Cc: Pigeyre Romain 
Subject: Scoring with wildcards

Hi,

 

I have two records with the name_fra field.

One with name_fra=”un test CARREAU”

And another one with name_fra=”un test CARRE”

 

{
    "codeBarre": "1",
    "name_FRA": "un test CARREAU"
}

{
    "codeBarre": "2",
    "name_FRA": "un test CARRE"
}

 

Configuration of these fields is:

[the field and fieldType definitions were stripped by the mailing-list archive]


When I’m using this query :

http://localhost:8983/solr/cdv_product/select?q=text%3Acarre*&fl=score%2C+*&wt=json&indent=true&debugQuery=true

The result is :

{
  "responseHeader":{
    "status":0,
    "QTime":2,
    "params":{
      "debugQuery":"true",
      "fl":"score, *",
      "indent":"true",
      "q":"text:carre*",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "codeBarre":"1",
        "name_FRA":"un test CARREAU",
        "_version_":1480150860842401792,
        "score":1.0},
      {
        "codeBarre":"2",
        "name_FRA":"un test CARRE",
        "_version_":1480150875738472448,
        "score":1.0}]
  },
  "debug":{
    "rawquerystring":"text:carre*",
    "querystring":"text:carre*",
    "parsedquery":"text:carre*",
    "parsedquery_toString":"text:carre*",
    "explain":{
      "1":"\n1.0 = (MATCH) ConstantScore(text:carre*), product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
      "2":"\n1.0 = (MATCH) ConstantScore(text:carre*), product of:\n  1.0 = boost\n  1.0 = queryNorm\n"},
    "QParser":"LuceneQParser",
    "timing":{
      "time":2.0,
      "prepare":{
        "time":1.0,
        "query":{"time":1.0},
        "facet":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":0.0},
        "stats":{"time":0.0},
        "expand":{"time":0.0},
        "debug":{"time":0.0}},
      "process":{
        "time":1.0,
        "query":{"time":0.0},
        "facet":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":0.0},
        "stats":{"time":0.0},
        "expand":{"time":0.0},
        "debug":{"time":1.0}}}}}

 

The score is the same for both records. The CARREAU record is first and CARRE
is next. I want to place CARRE before the CARREAU result because CARRE is an
exact match. Is that possible?

 

NB: scoring for this query only uses queryNorm and boosts.

 

In this test :

http://localhost:8983/solr/cdv_product/select?q=text%3Acarre&fl=score%2C*&wt=json&indent=true&debugQuery=true

 

I have only one record found but the scoring is more complex. Why?

{
  "responseHeader":{
    "status":0,
    "QTime":2,
    "params":{
      "debugQuery":"true",
      "fl":"score,*",
      "indent":"true",
      "q":"text:carre",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"maxScore":0.53033006,"docs":[
      {
        "codeBarre":"2",
        "name_FRA":"un test CARRE",
        "_version_":1480150875738472448,
        "score":0.53033006}]
  },
  "debug":{
    "rawquerystring":"text:carre",
    "querystring":"text:carre",
    "parsedquery":"text:carre",
    "parsedquery_toString":"text:carre",
    "explain":{
      "2":"\n0.53033006 = (MATCH) weight(text:carre in 0) [DefaultSimilarity], result of:\n  0.53033006 = fieldWeight in 0, product of:\n    1.4142135 = tf(freq=2.0), with freq of:\n      2.0 = termFreq=2.0\n    1.0 = idf(docFreq=1, maxDocs=2)\n    0.375 = fieldNorm(doc=0)\n"},
    "QParser":"LuceneQParser",
    "timing":{
      "time":2.0,
      "prepare":{
        "time":1.0,
        "query":{"time":1.0},
        "facet":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":0.0},
        "stats":{"time":0.0},
        "expand":{"time":0.0},
        "debug":{"time":0.0}},
      "process":{
        "time":1.0,
        "query":{"time":0.0},
        "facet":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":0.0},
        "stats":{"time":0.0},
        "expand":{"time":0.0},
        "debug":{"time":1.0}}}}}

 

 

 


Romain PIGEYRE
Centre de service de Lyon

Sopra
Parc du Puy d'Or
72 Allée des Noisetiers - CS 10137
69578 - LIMONEST
France
Phone: +33 (0)4 37 26 43 33
romain.pige...@so

SolrCloud Slow to boot up

2014-09-25 Thread anand.mahajan
Hello all,

Hosted a SolrCloud - 6 Nodes - 36 Shards x 3 Replica each -> 108 cores
across 6 servers. Moved in about 250M documents in this cluster. When I
restart this cluster - only the leaders per shard comes up live instantly
(within a minute) and all the replicas are shown as Recovering on the Cloud
screen and all 6 servers are doing some processing (consuming about 4 CPUs
at the back and doing a lot of Network IO too). In essence it's not doing any
reads or writes to the index and I don't see any replication/catch-up
activity going on too at the back, yet the RAM grows consuming all 96GB
available on each box. And all the Recovering replicas recover one by one in
about an hour or so. Why is it taking so long to boot up, and what is it
doing that is consuming so much CPU, RAM and Network IO? All disks are
reading at 100% on all servers during this boot up. Is there a setting I
might have missed that will help?  

FYI - The Zookeeper cluster is on the same 6 boxes.  Size of the Solr data
dir is about 150GB per server and each box has 96GB RAM.

Thanks,
Anand



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Slow-to-boot-up-tp4161098.html


Turn off suggester

2014-09-25 Thread PeriS
Is there a way to turn off the Solr suggester? I have about 30M records, and
when Tomcat starts up it takes a long time (~10 minutes) for the suggester to
decompress the data, or it's doing something as it hangs on
SolrSuggester.build(). Any ideas please?

Thanks
-Peri
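
For reference, on Solr 4.x the usual way to "turn it off" is to remove or comment out the suggest searchComponent and its request handler in solrconfig.xml. Later Solr releases also accept a buildOnStartup flag on the suggester; a sketch (the suggester name here is made up, and the flag's availability should be checked against your version):

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <!-- later Solr releases only: skip the expensive build at startup -->
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
```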







/suggest through SolrJ?

2014-09-25 Thread Clemens Wyss DEV
Am I right that I cannot call /suggest (i.e. the corresponding RequestHandler) 
through SolrJ?

What is the preferred way to "call" Solr handlers/operations not supported by
SolrJ from Java? Through new SolrJ Request-classes?


Re: traversing Automaton in lucene 4.10

2014-09-25 Thread Dmitry Kan
case solved, example of traversal found in lucene's source code (pointed to
by Mike McCandless):

https://github.com/apache/lucene-solr/blob/2836bd99101026860b12233a87e35101769a538f/lucene/core/src/java/org/apache/lucene/util/automaton/Automaton.java#L535



On Fri, Sep 19, 2014 at 5:27 PM, Dmitry Kan  wrote:

> Hi,
>
> o.a.l.u.automaton.Automaton api has changed in lucene 4.10 (
> https://issues.apache.org/jira/secure/attachment/12651171/LUCENE-5752.patch
> ).
>
> Method getNumberedStates() got dropped
> class State does not exist anymore.
>
> How do I traverse an Automaton with the new api?
>
> Dmitry
>
> --
> Dmitry Kan
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
>



-- 
Dmitry Kan
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: SolrCloud RAM requirements

2014-09-25 Thread Toke Eskildsen
On Thu, 2014-09-25 at 06:29 +0200, Norgorn wrote:
> I can't say for sure, since the filter caches are off the JVM heap (that's
> Heliosearch), but top shows 5 GB cached and no free RAM.

The cache reported by top should be correct, no matter whether one uses
off-heap or not: You have 5GB for (I guess) a 300GB index, so about 1.5% of
the index size.

I agree fully with Shawn that this will never perform for interactive
use, when you're using spinning drives.

> The only question for me now is how to balance disk cache and filter cache?
> Do I need to worry about that, or is a big disk cache enough?

Even if you skipped the filters fully (so just simple queries) and
magically had 15GB out of the 16GB free for disk cache, it would only be
5% of the index size. Still not enough for decent performance with
spinning drives, unless your index is very special, e.g. enormous amount
of stored fields.


As for the whole "how much will it help with SSDs?", might I suggest
simply testing? Buy a 500GB SSD and put it in one of the machines, test
searches against that shard vs. the shards on the other machines. If you
do not see much difference, move the drive to your developer machine and
be happy for the upgrade. Win-win.

> And does "optimized index"  mean SOLR "optimize" command, or something else?

Optimized down to a single segment (which I think the 'optimize' command
will do). But you should only consider that if you know that your shard
will not be updated in the foreseeable future.

- Toke Eskildsen, State and University Library, Denmark




(auto)suggestions, but ony from a "filtered" set of documents

2014-09-25 Thread Clemens Wyss DEV
What I'd like to do is
http://localhost:8983/solr/solrpedia/suggest?q=atm&qf=source:

Through qf (or whatever the parameter shall be called) I'd like to restrict
the suggestions to documents which match the given qf-query.
I need this filter because (as posted in a previous thread) I intend to put
"different kinds of data" into one core/collection, and suggestions shall be
restrictable to one or many source(s).
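
For what it's worth, one workaround sketch (not a suggester feature; the field name title and source value wiki here are assumptions) is to emulate filtered suggestions with faceting, since facet.prefix composes with any q/fq:

```
http://localhost:8983/solr/solrpedia/select?q=*:*&fq=source:wiki&rows=0&facet=true&facet.field=title&facet.prefix=atm
```

The returned facet counts on title values starting with "atm" then come only from documents matching the fq.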


Help needed in Indexing and Search on xml content

2014-09-25 Thread sangeetha.subraman...@gtnexus.com
Hi Team,

I am a newbie to SOLR. I have search fields stored in an XML file which is
stored in MSSQL. I want to index the content of the XML file in SOLR. We
need to provide search based on the fields present in the XML file.

The reason why we are storing the input details as an XML file is that users
will be able to add custom input fields on their own, with values. Storing
these custom fields as columns in MSSQL does not seem to be an optimal
solution. So we thought of putting them in an XML file and storing that file
in the RDBMS. But I am not sure how we can index the content of the file to
make search better. I believe this can be done by ExtractingRequestHandler.

Could someone help me with how we can implement this, or direct me to some
pages that could be of help to me?

Thanks
Sangeetha
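
ExtractingRequestHandler (Solr Cell) is mostly aimed at rich binary documents; for structured XML with user-defined fields, a common alternative is to parse the XML yourself and map each element to a (dynamic) Solr field before sending the document. A hedged sketch of the parsing step only (the element and field names are invented):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Flattens first-level child elements of an XML document into field/value pairs. */
public class XmlFieldExtractor {

    static Map<String, String> extractFields(String xml) {
        try {
            NodeList children = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement().getChildNodes();
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    // e.g. later map <color>red</color> to a dynamic field "color_s"
                    fields.put(((Element) n).getTagName(),
                               n.getTextContent().trim());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractFields("<doc><id>1</id><color>red</color></doc>"));
        // prints {id=1, color=red}
    }
}
```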



Using SolrCloud on Amazon EC2

2014-09-25 Thread Timo Schmidt
Hi together,

we currently plan to set up a project based on SolrCloud and Amazon Web
Services. Our main search application is deployed using AWS OpsWorks, which
works out quite well.
Since we also want to provision Solr to EC2, I want to ask about experiences
with the different deployment/provisioning tools.
By now I see the following 3 approaches.

1. Using the Lucidworks solr-scale-tk to set up and maintain the cluster.
Who is using this in production, and what are your experiences?

2. Implementing our own Chef cookbooks for AWS OpsWorks to install SolrCloud
as a custom OpsWorks layer.
Did somebody do this already? What are your experiences?

Are there any cookbooks out there to which we can contribute and reuse?

3. Implementing our own Chef cookbooks for AWS OpsWorks to install SolrCloud
as a Docker container.
Any experiences with this?

Do you see other options? AFAIK Elastic Beanstalk could also be an option.
It would be very nice to get some experiences and recommendations.

Cheers

Timo


Re: SolrCloud RAM requirements

2014-09-25 Thread Shawn Heisey
On 9/24/2014 2:18 AM, Toke Eskildsen wrote:
> Norgorn [lsunnyd...@mail.ru] wrote:
> I have a CLOUD with 3 nodes and 16 GB RAM on each.
>> My index is about 1 TB and search speed is awfully bad.
> 
> We all have different standard with regards to search performance. What is 
> "awfully bad" and what is "good enough" for you?
> 
> Related to this: How many documents are in your index, how do you query
> (faceting, sorting, special searches), and how often is indexing performed?
> 
>> I've read, that one needs at least 50% of index size in RAM,
> 
> That is the common advice, yes. The advice is not bad for some use cases. The 
> problem is that it has become gospel.
> 
> I am guessing that you are using spinning drives? Solr needs fast random 
> access reads and spinning drives are very slow for that. You can either 
> compensate by buying enough RAM or you can change to a faster underlying 
> storage technology. The obvious choice these days are Solid State Drives (we 
> bought Samsung 840 EVO's last time and would probably buy those again). They 
> will not give you RAM speed, but they do give a lot more bang for the buck 
> and depending on your performance requirements they can be enough.

I am guilty of spreading the "gospel" that you need 50-100% of your
index to fit in the OS disk cache, as Toke mentioned.  This wiki page is
my creation:

http://wiki.apache.org/solr/SolrPerformanceProblems

I've seen decent performance out of systems with standard hard disks
that only had enough RAM to fit about 25% of the index into the disk
cache, but I've also seen systems with 50% that can't complete a simple
query in less than 10 seconds.

With a terabyte of index on the system (assuming that's how much is on
each one), 25% is still at least 256GB of RAM.  With only 16GB, there's
simply no way you'll ever get good performance.

I've heard quite a lot of anecdotal evidence that if you put the index
on SSD, you only need 10% of the index to fit in RAM.  I'm a little bit
skeptical that this would be true as a general rule, but I do not doubt
that it's been done successfully.  For a terabyte index, that's still
100GB of RAM, so 128GB would be the absolute minimum that you'll want to
consider.  The more RAM you can throw at this problem, the better your
performance will be.

Thanks,
Shawn
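
The sizing arithmetic above, written out as a check (the ~1TB-per-node figure is from this thread):

```java
/** Rule-of-thumb disk-cache sizing from this thread. */
public class CacheSizing {

    // Fraction of the on-disk index that should fit in the OS page cache.
    static long neededCacheGb(long indexGb, double fraction) {
        return Math.round(indexGb * fraction);
    }

    public static void main(String[] args) {
        long indexGb = 1024; // ~1 TB of index per node
        System.out.println(neededCacheGb(indexGb, 0.25)); // spinning disks: 256
        System.out.println(neededCacheGb(indexGb, 0.10)); // SSD anecdote: 102
    }
}
```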