Re: ranking retrieval measure

2014-03-31 Thread Floyd Wu
Usually an IR system is measured using precision and recall, but it depends
on what kind of system you are developing and for which scenario.

Take a look
http://en.wikipedia.org/wiki/Precision_and_recall



2014-04-01 10:23 GMT+08:00 azhar2007 :

> Hi people. I've developed a search engine and want to improve it, using
> another search engine as a test case. Now I want to compare and test
> results from both to determine which is better. I am unaware of how to do
> this, so could someone please point me in the right direction?
>
> Regards
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/ranking-retrieval-measure-tp4128324.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread Dmitry Kan
Hi,

We have noticed something like this as well, but with an older version of
Solr, 3.4. In our setup we delete documents pretty often. Internally in
Lucene, when a client requests that a document be deleted, it is not
physically deleted, but only marked as "deleted". Our original
assumption was that the "deleted" documents would get physically
removed on each optimize command issued. We started to suspect this wasn't
always true as the shards (especially the relatively large ones) became
slower over time. That is how we found the expungeDeletes option, which
purges the "deleted" docs and is false by default. We have set it to true.
If your Solr update lifecycle includes frequent deletes, try this out.

This of course does not replace working towards finding better
GC parameters.

https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
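
For anyone driving updates through SolrJ rather than raw XML, a commit with
expungeDeletes can be issued along these lines (a minimal sketch against the
Solr 4.x SolrJ API; the URL and core name are placeholders):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class ExpungeDeletesCommit {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and core name -- adjust to your setup.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // A commit that also merges away segments containing "deleted" docs.
        UpdateRequest req = new UpdateRequest();
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        req.setParam("expungeDeletes", "true");
        req.process(solr);

        solr.shutdown();
    }
}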


On Mon, Mar 31, 2014 at 3:57 PM, elisabeth benoit  wrote:

> Hello,
>
> We are currently using Solr 4.2.1. Our index is updated on a daily basis.
> After noticing that Solr query time had increased (to twice its initial
> value) without any change in index size or Solr configuration, we tried an
> optimize on the index, but it didn't fix our problem. We checked the
> garbage collector, but everything seemed fine. What did in fact fix our
> problem was to delete all documents and reindex from scratch.
>
> It looks like over time our index gets "corrupted" and optimize doesn't
> fix it. Does anyone have a clue how to investigate this situation further?
>
>
> Elisabeth
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Solr indexing javabean

2014-03-31 Thread Prasi S
Hi,
My Solr document has a field that contains XML. I am indexing the XML as-is
into Solr, and at runtime I fetch the XML, parse it, and display it. Instead
of raw XML, can we index that XML as a Java bean?


Thanks,
Prasi
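
For what it's worth, SolrJ can map annotated Java beans to Solr documents via
its @Field annotation, which may be what the question is after on the client
side. A minimal sketch (the bean and its field names are placeholders and
must match fields in the schema):

import org.apache.solr.client.solrj.beans.Field;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class BeanIndexingExample {
    // Placeholder bean; "id" and "content" must exist in the schema.
    public static class MyDoc {
        @Field public String id;
        @Field public String content;
    }

    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        MyDoc doc = new MyDoc();
        doc.id = "1";
        doc.content = "text extracted from the XML";
        solr.addBean(doc);  // annotated fields are mapped to Solr fields
        solr.commit();
        solr.shutdown();
    }
}

Note this only changes how the client feeds and reads documents; Solr itself
still stores flat fields, so the XML structure has to be mapped onto schema
fields either way.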


Re: Product index schema for solr

2014-03-31 Thread Ajay Patel

As per your suggestion, my final document will look like:
{
id:
...
...
[PRODUCT-RELATED DATA]
...
...
...
min_qty: 1
max_qty: 50
price: 4
}


[OTHER DOCUMENTS LIKE THE ABOVE]



Now I want to create a range facet field by combining min_qty and max_qty.

I hope you have understood what I want to say :).
Thanks a lot in advance :)


Thanks & Regards
Ajay Patel.


On Mon, Mar 31, 2014 at 8:42 AM, Ajay Patel  
wrote:

On Monday 31 March 2014 06:07 PM, Erick Erickson wrote:

What do you mean by "generalized range facet"? How would
they differ from standard range faceting? Details are important...

Best.
Erick

On Mon, Mar 31, 2014 at 7:44 AM, Ajay Patel  wrote:

Hi Erick
Thanks for the reply :). Your solution helped me denormalize my data. Now I
have another question: can I create a generalized range facet based on
min_qty and max_qty?

Thanks & Regards
Ajay Patel.


On Saturday 29 March 2014 08:54 PM, Erick Erickson wrote:

The usual approach is to de-normalize the tables, so you'd store docs like
(all your product data) min_qty, max_qty, price_per_qty

So the above example would have 4 documents, then it all "just works"

You have to ensure that the id (the <uniqueKey> field) is different for each,
and probably store the product ID in a field other than "id" for this reason.

Best,
Erick

On Fri, Mar 28, 2014 at 10:27 AM, Ajay Patel 
wrote:

Hi Solr users & developers.

I am new to the world of the Solr search engine. I have a complex product
database structure in Postgres.

A product has many product_quantity_price attributes in ranges.

E.g. the price ranges for product ID 1 are stored in the
product_quantity_price table in the following manner:

min_qty  max_qty  price_per_qty
1        50       4
51       100      3.5
101      150      3
151      200      2.5

The ranges are not fixed for any product; they can differ from product to
product.

Now my question is: how can I save this data in Solr in an optimized way so
that I can create facets on qty and prices?

Thanks in advance.
Ajay Patel.
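
Following Erick's denormalization advice above - one Solr document per
(product, quantity-range) row - the indexing and a standard range facet might
look roughly like this in SolrJ (a sketch; the field names are illustrative,
not from the original posts):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DenormalizedProductExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/products");

        // One document per quantity-range row of product 1; "product_id"
        // is stored separately from the unique "id".
        int[][] ranges = {{1, 50}, {51, 100}, {101, 150}, {151, 200}};
        double[] prices = {4, 3.5, 3, 2.5};
        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < ranges.length; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "product1_range" + i);  // unique per row
            doc.addField("product_id", 1);
            doc.addField("min_qty", ranges[i][0]);
            doc.addField("max_qty", ranges[i][1]);
            doc.addField("price_per_qty", prices[i]);
            docs.add(doc);
        }
        solr.add(docs);
        solr.commit();

        // A standard range facet over min_qty, bucketed in steps of 50.
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.set("facet.range", "min_qty");
        q.set("facet.range.start", "1");
        q.set("facet.range.end", "201");
        q.set("facet.range.gap", "50");
        System.out.println(solr.query(q).getFacetRanges());

        solr.shutdown();
    }
}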


Re: eDismax parser and the mm parameter

2014-03-31 Thread S.L
Jack,

Thanks a lot. I am now using pf, pf2 and pf3 and have gotten rid of
the mm parameter in my queries. However, for the fuzzy phrase queries, I am
not sure how I would be able to leverage the complex phrase query parser;
there is absolutely nothing out there that gives me any idea of how to do
that.

Why is fuzzy phrase search not provided by Solr OOB? I am surprised.

Thanks.
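
For reference, the pf family of parameters can be set from SolrJ along these
lines (a sketch with placeholder field names and boosts, following Jack's
advice below to weight pf highest and pf3 above pf2):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PhraseBoostExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // "name" is a placeholder field; boosts follow pf > pf3 > pf2.
        SolrQuery q = new SolrQuery("White Siberian Ginseng");
        q.set("defType", "edismax");
        q.set("qf", "name");
        q.set("pf", "name^10");  // whole-phrase matches boosted highest
        q.set("pf3", "name^5");  // 3-word sub-phrases
        q.set("pf2", "name^2");  // 2-word sub-phrases
        System.out.println(solr.query(q).getResults().getNumFound());
        solr.shutdown();
    }
}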


On Mon, Mar 31, 2014 at 5:39 AM, Jack Krupansky wrote:

> The pf, pf2, and pf3 parameters should cover cases 1 and 2. Use q.op=OR
> (the default) and ignore the mm parameter. Give pf the highest boost, and
> boost pf3 higher than pf2.
>
> You could try using the complex phrase query parser for the third case.
>
> -- Jack Krupansky
>
> -Original Message- From: S.L
> Sent: Monday, March 31, 2014 12:08 AM
> To: solr-user@lucene.apache.org
> Subject: Re: eDismax parser and the mm parameter
>
> Thanks Jack , my use cases are as follows.
>
>
>   1. Search for "Ginseng": everything related to ginseng should show up.
>   2. Search for "White Siberian Ginseng": results with the whole phrase
>   show up first, followed by 2 words from the phrase, followed by a single
>   word in the phrase.
>   3. Fuzzy search "Whte Sberia Ginsng" (please note the typos here):
>   documents with White Siberian Ginseng should show up. This looks like
>   the most complicated of all, as Solr does not support fuzzy phrase
>   searches. (I have no solution for this yet.)
>
> Thanks again!
>
>
> On Sun, Mar 30, 2014 at 11:21 PM, Jack Krupansky 
> wrote:
>
>  The mm parameter is really only relevant when the default operator is OR
>> or explicit OR operators are used.
>>
>> Again: Please provide your use case examples and your expectations for
>> each use case. It really doesn't make a lot of sense to prematurely focus
>> on a solution when you haven't clearly defined your use cases.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: S.L
>> Sent: Sunday, March 30, 2014 9:13 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: eDismax parser and the mm parameter
>>
>> Jack,
>>
>> I mis-stated the problem: I am not using the OR operator as default
>> now (now that I think about it, it does not make sense to use the default
>> operator OR along with the mm parameter). The reason I want to use pf and
>> mm in conjunction is my understanding of the edismax parser, and
>> I have not looked into the pf2 and pf3 parameters yet.
>>
>> I will state my understanding here below.
>>
>> Pf - is used to boost the result score if the complete phrase matches.
>> mm < (less than) the search-term length would help limit the query results
>> to a certain number of better matches.
>>
>> With that being said would it make sense to have dynamic mm (set to the
>> length of search term - 1)?
>>
>> I also have a question around using a fuzzy search along with the eDismax
>> parser, but I will ask that in a separate post once I go through that
>> aspect of the eDismax parser.
>>
>> Thanks again !
>>
>>
>>
>>
>>
>> On Sun, Mar 30, 2014 at 6:44 PM, Jack Krupansky 
>> wrote:
>>
>>  If you use pf, pf2, and pf3 and boost appropriately, the effects of mm
>>
>>> will be dwarfed.
>>>
>>> The general goal is to assure that the top documents really are the best,
>>> not to necessarily limit the total document count. Focusing on the latter
>>> could be a real waste of time.
>>>
>>> It's still not clear why or how you need or want to use OR as the default
>>> operator - you still haven't given us a use case for that.
>>>
>>> To repeat: Give us a full set of use cases before taking this XY Problem
>>> approach of pursuing a solution before the problem is understood.
>>>
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: S.L
>>> Sent: Sunday, March 30, 2014 6:14 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: eDismax parser and the mm parameter
>>>
>>> Jack, thanks again,
>>>
>>> I am searching Chinese medicine documents. As in the example I gave
>>> earlier, a user can search for "Ginseng", "Siberian Ginseng" or "Red
>>> Siberian Ginseng". I certainly want to use the pf parameter (which is
>>> not driven by the mm parameter); however, for giving a higher score to
>>> documents that have more of the terms I want to use edismax. Now, if I
>>> give an mm of 3 and the search term is only of length 1 (like
>>> "Ginseng"), what does edisMax do?
>>>
>>>
>>> On Sun, Mar 30, 2014 at 1:21 PM, Jack Krupansky >> >
>>> wrote:
>>>
>>> It still depends on your objective - which you haven't told us yet. Show
>>> us some use cases and detail what your expectations are for each use
>>> case.
>>>
>>> The edismax phrase boosting is probably a lot more useful than messing
>>> around with mm. Take a look at pf, pf2, and pf3.
>>>
>>> See:
>>> http://wiki.apache.org/solr/ExtendedDisMax
>>> https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
>>>
>>> The focus on mm may indeed be a classic "XY Problem" - a premature focus
>>> on a solution without detailing the problem.

RE: How to delete documents

2014-03-31 Thread Suresh Soundararajan
Kaushik,

Before deleting the rows in the table, collect the primary ids of the rows
related to the Solr index and fire a delete-by-id request to Solr, passing
the collected ids. This will remove the documents from the Solr index.
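
A minimal SolrJ sketch of that delete-by-id flow (the id values are
placeholders standing in for the collected primary keys):

import java.util.Arrays;

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DeleteByIdExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // Primary keys collected from the DB rows before they are deleted.
        solr.deleteById(Arrays.asList("101", "102", "103"));
        solr.commit();
        solr.shutdown();
    }
}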

Thanks,
SureshKumar.S


From: Kaushik 
Sent: Tuesday, April 1, 2014 12:07 AM
To: solr-user@lucene.apache.org
Subject: How to delete documents

From a database table, we have figured out a way to do the full load and
the delta loads. However, there are scenarios where some of the DB rows get
deleted. How can we have such documents deleted from the Solr indices?

Thanks,
Kaushik


ranking retrieval measure

2014-03-31 Thread azhar2007
Hi people. I've developed a search engine and want to improve it, using
another search engine as a test case. Now I want to compare and test results
from both to determine which is better. I am unaware of how to do this, so
could someone please point me in the right direction?

Regards



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ranking-retrieval-measure-tp4128324.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to index 20 MB plain-text xml

2014-03-31 Thread Alexandre Rafalovitch
If you have an application, why are you sending raw XML documents to Solr?
Can't you convert them to another format and then send them in batches? Or
even if it stays XML, just bite the bullet and send it in 100-document
batches, or in smaller batches combined with the auto-commit settings I
mentioned earlier. A batching sketch follows below.
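
A sketch of that batching idea in SolrJ, assuming the application has already
parsed the big file into individual documents (all names and contents here
are placeholders):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchedIndexingExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 10000; i++) {  // entries parsed out of the big XML
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "entry-" + i);
            doc.addField("text", "dictionary entry " + i);
            batch.add(doc);
            if (batch.size() == 100) {     // send in 100-document batches
                solr.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            solr.add(batch);
        }
        solr.commit();  // or rely on auto-commit settings instead
        solr.shutdown();
    }
}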

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Apr 1, 2014 at 7:30 AM, Floyd Wu  wrote:
> Hi Upayavira,
> Users don't hit Solr directly; they search documents through my
> application. The application is an entrance for users to upload documents,
> which are then indexed by Solr.
> The situation is that they upload plain text, something like a dictionary.
> You know, that dictionary is something big.
> I'm trying to figure out some good technique before I can split these xml
> into small ones and stream them to Solr.
>
> Floyd
>
>
>
> 2014-04-01 2:55 GMT+08:00 Upayavira :
>
>> Tell the user they can't have!
>>
>> Or, write a small app that reads in their XML in one go, and pushes it
>> in parts to Solr. Generally, I'd say letting a user hit Solr directly is
>> a bad thing - especially a user who doesn't know the details of how Solr
>> works.
>>
>> Upayavira
>>
>> On Mon, Mar 31, 2014, at 07:17 AM, Floyd Wu wrote:
>> > Hi Alex,
>> >
>> > Thanks for your responding. Personally I don't want to feed these big xml
>> > to solr. But users wants.
>> > I'll try your suggestions later.
>> >
>> > Many thanks.
>> >
>> > Floyd
>> >
>> >
>> >
>> > 2014-03-31 13:44 GMT+08:00 Alexandre Rafalovitch :
>> >
>> > > Without digging too deep into why exactly this is happening, here are
>> > > the general options:
>> > >
>> > > 0. Are you actually committing? Check the messages in the logs and see
>> > > if the records show up when you expect them to.
>> > > 1. Are you actually trying to feed 20Mb file to Solr? Maybe it's HTTP
>> > > buffer that's blowing up? Try using stream.file instead (notice
>> > > security warning though): http://wiki.apache.org/solr/ContentStream
>> > > 2. Split the file into smaller ones and commit each separately
>> > > 3. Set hard auto-commit in solrconfig.xml based on number of documents
>> > > to flush in-memory structures to disk
>> > > 4. Switch to using DataImportHandler to pull from XML instead of
>> pushing
>> > > 5. Increase amount of memory to Solr (-X command line flags)
>> > >
>> > > Regards,
>> > >Alex.
>> > >
>> > > Personal website: http://www.outerthoughts.com/
>> > > Current project: http://www.solr-start.com/ - Accelerating your Solr
>> > > proficiency
>> > >
>> > > On Mon, Mar 31, 2014 at 12:00 PM, Floyd Wu  wrote:
>> > > > I have many plain text xml that I transfer to form of solr xml
>> format.
>> > > > But every time I send them to solr, I hit OOM exception.
>> > > > How to configure solr to "eat" these big xml?
>> > > > Please guide me a way. Thanks
>> > > >
>> > > > floyd
>> > >
>>


Re: how to index 20 MB plain-text xml

2014-03-31 Thread Floyd Wu
Hi Upayavira,
Users don't hit Solr directly; they search documents through my application.
The application is an entrance for users to upload documents, which are then
indexed by Solr.
The situation is that they upload plain text, something like a dictionary.
You know, that dictionary is something big.
I'm trying to figure out some good technique before I can split these xml
into small ones and stream them to Solr.

Floyd



2014-04-01 2:55 GMT+08:00 Upayavira :

> Tell the user they can't have!
>
> Or, write a small app that reads in their XML in one go, and pushes it
> in parts to Solr. Generally, I'd say letting a user hit Solr directly is
> a bad thing - especially a user who doesn't know the details of how Solr
> works.
>
> Upayavira
>
> On Mon, Mar 31, 2014, at 07:17 AM, Floyd Wu wrote:
> > Hi Alex,
> >
> > Thanks for your responding. Personally I don't want to feed these big xml
> > to solr. But users wants.
> > I'll try your suggestions later.
> >
> > Many thanks.
> >
> > Floyd
> >
> >
> >
> > 2014-03-31 13:44 GMT+08:00 Alexandre Rafalovitch :
> >
> > > Without digging too deep into why exactly this is happening, here are
> > > the general options:
> > >
> > > 0. Are you actually committing? Check the messages in the logs and see
> > > if the records show up when you expect them to.
> > > 1. Are you actually trying to feed 20Mb file to Solr? Maybe it's HTTP
> > > buffer that's blowing up? Try using stream.file instead (notice
> > > security warning though): http://wiki.apache.org/solr/ContentStream
> > > 2. Split the file into smaller ones and commit each separately
> > > 3. Set hard auto-commit in solrconfig.xml based on number of documents
> > > to flush in-memory structures to disk
> > > 4. Switch to using DataImportHandler to pull from XML instead of
> pushing
> > > 5. Increase amount of memory to Solr (-X command line flags)
> > >
> > > Regards,
> > >Alex.
> > >
> > > Personal website: http://www.outerthoughts.com/
> > > Current project: http://www.solr-start.com/ - Accelerating your Solr
> > > proficiency
> > >
> > > On Mon, Mar 31, 2014 at 12:00 PM, Floyd Wu  wrote:
> > > > I have many plain text xml that I transfer to form of solr xml
> format.
> > > > But every time I send them to solr, I hit OOM exception.
> > > > How to configure solr to "eat" these big xml?
> > > > Please guide me a way. Thanks
> > > >
> > > > floyd
> > >
>


Re: What is Overseer?

2014-03-31 Thread Jack Krupansky
So, is Overseer really only an "implementation detail" or something that 
Solr Ops guys need to be very aware of?


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Monday, March 31, 2014 3:17 PM
To: solr-user@lucene.apache.org
Subject: Re: What is Overseer?

Hi Chris;

You should check here:
http://grokbase.com/t/lucene/solr-user/12bd9kst9t/role-purpose-of-overseer

Thanks;
Furkan KAMACI


2014-03-31 20:43 GMT+03:00 Chris W :


What is the role of an overseer in solrcloud? The documentation does not
offer full details about it. What if an overseer node goes down?

--
Best
--
C





Re: Enabling other SimpleText formats besides postings

2014-03-31 Thread Ken Krugler
Hi Erik (& Shawn),

On Mar 31, 2014, at 1:48pm, Shawn Heisey  wrote:

> On 3/31/2014 2:36 PM, Erik Hatcher wrote:
>> Not currently possible.  Solr’s SchemaCodecFactory only has a hook for 
>> postings format (and doc values format).

OK, thanks for confirming.

> Would it be a reasonable thing to develop a config structure (probably in 
> schema.xml) that starts with something like <codec name="foo"> and has ways 
> to specify the class and related configuration for each of the components in 
> the codec? Then you could specify codec="foo" on an individual field 
> definition.  The codec definition could allow one of them to have 
> default="true".
> 
> I will admit that my understanding of these Lucene-level details is low, so I 
> could be thinking about this wrong.

The absolute easiest approach would be to support a new init value for 
codecFactory, which SchemaCodecFactory would use to select a different base 
codec class to use (versus always using LuceneCodec). That would 
switch everything to a different codec.

Or you could extend the SchemaCodecFactory to support additional per-field 
settings for stored fields format, etc beyond what's currently available.

For my quick & dirty hack I've specified a different codecFactory in 
solrconfig.xml, and have my own factory that hard-codes the SimpleTextCodec.

This works - all files are in the SimpleTextXXX format, other than the 
segments.gen and segments_XX files; what, those aren't pluggable?!?! :)
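
For anyone curious, such a hard-coded factory amounts to something like this
(a sketch against the Solr 4.x API; the class name is made up, and it would
be registered via a <codecFactory class="..."/> element in solrconfig.xml):

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.simpletext.SimpleTextCodec;
import org.apache.solr.core.CodecFactory;

// Forces SimpleText for every codec component of every field.
public class SimpleTextCodecFactory extends CodecFactory {
    @Override
    public Codec getCodec() {
        return new SimpleTextCodec();
    }
}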

-- Ken

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







spellcheck in solr-4.6-1 distrib=true

2014-03-31 Thread alxsss
Hello,

For queries in SolrCloud in distributed mode, Solr 4.6.1 spellcheck does not
return any suggestions, but it does in non-distributed mode.
Is this a known bug?

Thanks.
Alex.


Re: Enabling other SimpleText formats besides postings

2014-03-31 Thread Shawn Heisey

On 3/31/2014 2:36 PM, Erik Hatcher wrote:

Not currently possible.  Solr’s SchemaCodecFactory only has a hook for postings 
format (and doc values format).

Erik


Would it be a reasonable thing to develop a config structure (probably 
in schema.xml) that starts with something like <codec name="foo"> and 
has ways to specify the class and related configuration for each of the 
components in the codec? Then you could specify codec="foo" on an 
individual field definition.  The codec definition could allow one of 
them to have default="true".


I will admit that my understanding of these Lucene-level details is low, 
so I could be thinking about this wrong.


Thanks,
Shawn



Re: Enabling other SimpleText formats besides postings

2014-03-31 Thread Erik Hatcher

On Mar 31, 2014, at 4:02 PM, Ken Krugler  wrote:

> Hi all (and particularly Uwe and Robert),
> 
> On Mar 28, 2014, at 7:24am, Michael McCandless  
> wrote:
> 
>> You told the fieldType to use SimpleText only for the postings, not
>> all other parts of the codec (doc values, live docs, stored fields,
>> etc...), and so it used the default codec for those components.
>> 
>> If instead you used the SimpleTextCodec (not sure how to specify this
>> in Solr's schema.xml) then all components would be SimpleText.
> 
> Yes, that's the gist of my question - how do you specify use of SimpleTextXXX 
> (e.g. SimpleTextStoredFieldsFormat) in Solr?
> 
> Or is this currently not possible?

Not currently possible.  Solr’s SchemaCodecFactory only has a hook for postings 
format (and doc values format).

Erik

Re: Enabling other SimpleText formats besides postings

2014-03-31 Thread Ken Krugler
Hi all (and particularly Uwe and Robert),

On Mar 28, 2014, at 7:24am, Michael McCandless  
wrote:

> You told the fieldType to use SimpleText only for the postings, not
> all other parts of the codec (doc values, live docs, stored fields,
> etc...), and so it used the default codec for those components.
> 
> If instead you used the SimpleTextCodec (not sure how to specify this
> in Solr's schema.xml) then all components would be SimpleText.

Yes, that's the gist of my question - how do you specify use of SimpleTextXXX 
(e.g. SimpleTextStoredFieldsFormat) in Solr?

Or is this currently not possible?

Thanks,

-- Ken



> On Fri, Mar 28, 2014 at 8:53 AM, Ken Krugler
>  wrote:
>> Hi all,
>> 
>> I've been using the SimpleTextCodec in the past, but I just noticed 
>> something odd...
>> 
>> I'm running Solr 4.3, and enable the SimpleText posting format via
>> something like:
>>
>> <fieldType ... postingsFormat="SimpleText"/>
>> 
>> The resulting index does have the expected _0_SimpleText_0.pst text output, 
>> but I just noticed that the other files are all the standard binary format 
>> (e.g. .fdt for field data)
>> 
>> Based on SimpleTextCodec.java, I was assuming that I'd get the 
>> SimpleTextStoredFieldsFormat for stored data.
>> 
>> This same holds true for most (all?) of the other files, e.g. 
>> https://issues.apache.org/jira/browse/LUCENE-3074 is about adding a simple 
>> text format for DocValues.
>> 
>> I can walk the code to figure out what's up, but I'm hoping I just need to 
>> change some configuration setting.
>> 
>> Thanks!
>> 
>> -- Ken


--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







Re: What is Overseer?

2014-03-31 Thread Furkan KAMACI
Hi Chris;

You should check here:
http://grokbase.com/t/lucene/solr-user/12bd9kst9t/role-purpose-of-overseer

Thanks;
Furkan KAMACI


2014-03-31 20:43 GMT+03:00 Chris W :

> What is the role of an overseer in solrcloud? The documentation does not
> offer full details about it. What if an overseer node goes down?
>
> --
> Best
> --
> C
>


Re: Filter caching

2014-03-31 Thread Yonik Seeley
On Mon, Mar 31, 2014 at 2:43 PM, youknow...@heroicefforts.net
 wrote:
> Re-reading the documentation, it seems that Solr caches the results of the
> fq parameter, not lower-level field constraints. This would imply that
> breaking a single complex boolean filter into multiple conjunctive fq
> parameters would improve the odds of cache hits. Is this correct?
>
> fq=(a:foo OR b:bar) AND c:bah
> vs.
> fq=(a:foo OR b:bar)&fq=c:bah

Yes, you would normally want to do this if both filters had a good
chance of being used again in combination with different filters.

-Yonik
http://heliosearch.org - solve Solr GC pauses with off-heap filters
and fieldcache
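
In SolrJ terms the split looks like this (a sketch reusing the field names
from the example above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class FilterQueryExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        // Two separate filters -> two independently cached filterCache
        // entries, each reusable in combination with other filters.
        q.addFilterQuery("a:foo OR b:bar");
        q.addFilterQuery("c:bah");
        System.out.println(solr.query(q).getResults().getNumFound());
        solr.shutdown();
    }
}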


Re: how to index 20 MB plain-text xml

2014-03-31 Thread Upayavira
Tell the user they can't have!

Or, write a small app that reads in their XML in one go, and pushes it
in parts to Solr. Generally, I'd say letting a user hit Solr directly is
a bad thing - especially a user who doesn't know the details of how Solr
works.

Upayavira

On Mon, Mar 31, 2014, at 07:17 AM, Floyd Wu wrote:
> Hi Alex,
> 
> Thanks for your responding. Personally I don't want to feed these big xml
> to solr. But users wants.
> I'll try your suggestions later.
> 
> Many thanks.
> 
> Floyd
> 
> 
> 
> 2014-03-31 13:44 GMT+08:00 Alexandre Rafalovitch :
> 
> > Without digging too deep into why exactly this is happening, here are
> > the general options:
> >
> > 0. Are you actually committing? Check the messages in the logs and see
> > if the records show up when you expect them to.
> > 1. Are you actually trying to feed 20Mb file to Solr? Maybe it's HTTP
> > buffer that's blowing up? Try using stream.file instead (notice
> > security warning though): http://wiki.apache.org/solr/ContentStream
> > 2. Split the file into smaller ones and commit each separately
> > 3. Set hard auto-commit in solrconfig.xml based on number of documents
> > to flush in-memory structures to disk
> > 4. Switch to using DataImportHandler to pull from XML instead of pushing
> > 5. Increase amount of memory to Solr (-X command line flags)
> >
> > Regards,
> >Alex.
> >
> > Personal website: http://www.outerthoughts.com/
> > Current project: http://www.solr-start.com/ - Accelerating your Solr
> > proficiency
> >
> > On Mon, Mar 31, 2014 at 12:00 PM, Floyd Wu  wrote:
> > > I have many plain text xml that I transfer to form of solr xml format.
> > > But every time I send them to solr, I hit OOM exception.
> > > How to configure solr to "eat" these big xml?
> > > Please guide me a way. Thanks
> > >
> > > floyd
> >


Filter caching

2014-03-31 Thread youknow...@heroicefforts.net
Re-reading the documentation, it seems that Solr caches the results of the fq
parameter, not lower-level field constraints. This would imply that breaking a
single complex boolean filter into multiple conjunctive fq parameters would
improve the odds of cache hits. Is this correct?

fq=(a:foo OR b:bar) AND c:bah
vs.
fq=(a:foo OR b:bar)&fq=c:bah


Thanks,

-Jess
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

How to delete documents

2014-03-31 Thread Kaushik
From a database table, we have figured out a way to do the full load and
the delta loads. However, there are scenarios where some of the DB rows get
deleted. How can we have such documents deleted from the Solr indices?

Thanks,
Kaushik


Re: New to Solr can someone help me to know if Solr fits my use case

2014-03-31 Thread Saurabh Agarwal
Thanks a lot Alexandre for the response much appreciated.

Thanks
Saurabh

On Fri, Mar 28, 2014 at 8:56 AM, Alexandre Rafalovitch
 wrote:
> 1. You don't actually put PDF/Word into Solr. Instead, each file is run
> through a content- and metadata-extraction process, and you index that.
> This is important because "a computer" does not understand what you
> are looking for when you open a PDF. It only understands whatever text
> it is possible to extract. In the case of PDF that is often not much at
> all, unless it was generated with an accessibility layer in place. You can
> experiment with what you can extract by downloading a standalone
> Apache Tika install, which has a command-line version, or by using Solr's
> extractOnly flag. Solr internally uses Tika, so the results should
> be the same.
>
> 2) When you do a search you can do "field:(Keyword1 Keyword2 Keyword3
> Keyword4)" and you get as results any document that matches one of
> those. Not sure about 1000 of them in one go, but certainly a large
> number.
>
> On the other hand, if you have same keywords all the time and you are
> trying to match documents against them, you might be more interested
> in Elastic Search's percolator
> (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html
> ) or in Luwak (https://github.com/flaxsearch/luwak).
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr 
> proficiency
>
>
> On Fri, Mar 28, 2014 at 10:05 AM, Saurabh Agarwal
>  wrote:
>> Thanks a lot Alex for your reply, Appreciate the same.
>>
>> So if I leave out the line-number part:
>> 1. I guess putting PDF/Word in Solr for search can be done; these
>> documents will go in Solr.
>> 2. For search, is there any automatic way to supply an Excel sheet or a
>> large set of search keywords to search for?
>> I.e. I have 1000s of words that I want to search for in the docs; can I
>> do it collectively, or must I send search queries one by one?
>>
>> Thanks
>> Saurabh
>>
>>
>>
>> On Fri, Mar 28, 2014 at 6:48 AM, Alexandre Rafalovitch
>>  wrote:
>>> This feels somewhat backwards. It's very hard to extract Line-Number
>>> information out of MSWord and next to impossible from PDF. So, it's
>>> not whether the Solr is a good fit or not here is that maybe your
>>> whole architecture has a major issue. Can you do this/what you want by
>>> hand at least once? Down to the precision you want?
>>>
>>> If you can, then yes you probably can automate the searching with
>>> Solr, though you will still have serious issues (sentence crossing
>>> line-boundaries, etc). But I suspect your whole approach will change
>>> once you try to do this manually.
>>>
>>> Regards,
>>>Alex.
>>> Personal website: http://www.outerthoughts.com/
>>> Current project: http://www.solr-start.com/ - Accelerating your Solr 
>>> proficiency
>>>
>>>
>>> On Thu, Mar 27, 2014 at 11:46 PM, Saurabh Agarwal
>>>  wrote:
 Can anyone help me please.

 Hi All,

 I am new to Solr, and from initial reading I am quite convinced Solr
 will be of great help. Can anyone help in making that decision?

 Usecase:
 1. I will have PDF/Word docs generated daily/weekly (a lot of them)
 which kind of get overwritten frequently.
 2. I have a dictionary kind of thing (a list of which words/small
 sentences should be part of the above docs, which words cannot be, and
 alternatives for some).
 3. Now I want Solr to search the docs produced in step 1 for the
 words/small sentences from step 2 and give me the doc name/line no
 in which they exist.

 Will Solr be a good help to me, If anybody can help giving some
 examples that will be great.

 Appreciate your help and patience.

 Thanks
 Saurabh
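
As a footnote to Alexandre's point 2 above, batching many keywords into one
query can be done along these lines in SolrJ (a sketch; the field name and
terms are placeholders, and very long keyword lists may require raising
maxBooleanClauses in solrconfig.xml or sending the query via POST):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class KeywordBatchExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        String[] keywords = {"keyword1", "keyword2", "keyword3"};  // e.g. from a sheet
        // content:(kw1 kw2 kw3) matches any document containing one of the terms.
        StringBuilder query = new StringBuilder("content:(");
        for (String kw : keywords) {
            query.append(kw).append(' ');
        }
        query.append(')');
        SolrQuery q = new SolrQuery(query.toString());
        System.out.println(solr.query(q).getResults().getNumFound());
        solr.shutdown();
    }
}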


What is Overseer?

2014-03-31 Thread Chris W
What is the role of an overseer in solrcloud? The documentation does not
offer full details about it. What if an overseer node goes down?

-- 
Best
-- 
C


Re: Request for adding to Contributors Group

2014-03-31 Thread Steve Rowe
Aditya, 

I’ve added your username to the Solr ContributorsGroup page, so you should now 
be able to edit wiki pages.

Steve

On Mar 31, 2014, at 1:25 PM, Aditya Choudhuri  wrote:

> Hello!
> 
> Please add my email and SolrWiki account in the ContributorsGroup.
> 
> My Wiki name = AdityaChoudhuri 
> 
> 
> Thank you.
> Aditya
> 
> 
> 



Request for adding to Contributors Group

2014-03-31 Thread Aditya Choudhuri

Hello!

Please add my email and SolrWiki account in the ContributorsGroup.

My Wiki name = AdityaChoudhuri 




Thank you.
Aditya





Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread Shawn Heisey

On 3/31/2014 9:03 AM, elisabeth benoit wrote:

We use JVisualVM. The CPU usage is very high (90%), but the GC activity
shows less than 0.01% average activity. Plus the heap usage stays low
(below 4G while the max heap size is 16G).

Do you have a different tool to suggest for checking the GC? Do you think
there is something else we might not see?


You can't get actual usable GC pause information from jvisualvm or 
jconsole, only totals and averages.  Those tools seem to be geared more 
towards seeing problems when your heap is too small.


To see real pause information, you can turn on GC logging and then run 
the log through a tool like GCLogViewer to see a graph of your 
collection pauses.  What I used to initially see the problem was a 
program called jHiccup, which will show *ANY* pause, not just those 
caused by garbage collection.  GC is almost always the reason there is a 
pause, though.


http://www.azulsystems.com/jHiccup
https://code.google.com/p/gclogviewer/

You can still have long GC pauses even if your max heap isn't reached.

Have you provided any GC-related options to your JVM at all?  With a 
heap size of 4GB and a max heap of 16GB, I can absolutely guarantee that 
you will have pause problems unless you provide the JVM with a number of 
tuning options.  I was frequently having pauses as high as 10 to 12 
seconds on an 8GB heap, even after I switched to CMS.  Further tuning 
was required.  These options made the situation a lot better, but I 
think they can probably be improved:


http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

More expanded info, which references the link above:

http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems

Thanks,
Shawn



RE: setting up solr on tomcat

2014-03-31 Thread Lieberman, Ariel
In Tomcat 7 there was a bug with resolving URLs ending in "/". This should be 
fixed in Tomcat 7.0.5+, see SOLR-2022 for full details.

-Original Message-
From: Pradeep Pujari [mailto:prade...@rocketmail.com] 
Sent: Monday, March 24, 2014 4:02 AM
To: solr-user@lucene.apache.org
Subject: Re: setting up solr on tomcat

What is the exception stack trace? The link looks good and works for Solr4.x



 From: Michael Sokolov 
To: solr-user@lucene.apache.org 
Sent: Sunday, March 23, 2014 7:56 AM
Subject: Re: setting up solr on tomcat
 

On 3/22/2014 2:16 AM, anupamk wrote:
> Hi,
>
> Is the solrTomcat wiki article valid for solr-4.7.0 ?
> http://wiki.apache.org/solr/SolrTomcat
>
>
> I am not able to deploy solr after following the instructions there.
>
> When I try to access the solr admin page I get a 404.
>
> I followed every step exactly as mentioned in the wiki, still no dice.
>
> Any ideas ?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/setting-up-solr-on-tomcat-tp4126177.html
> Sent from the Solr - User mailing list archive at Nabble.com.
There was a note on that page saying:

Solr4.3 requires a completely
different deployment. These instructions are *not* current and are for an
indeterminate version of Solr.

I haven't read the instructions in detail, but in my experience setting 
up a single standalone server goes like this:

copy solr.war to the tomcat/webapps folder,
logging jars (log4j, slf4j) and configuration (log4j.properties) to the 
tomcat/lib folder

you can create your solr home directory directly in the tomcat folder -- 
if you do that, it should be found, or you can put it somewhere else and 
start the jvm with -Dsolr.solr.home=/wherever/you/put/solr

that's pretty much it, I think.  You will see the solr admin at 
http://localhost:8080/solr if you use all vanilla settings.

-Mike



Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread elisabeth benoit
Hello,

Thanks for your answer.

We use JVisualVM. The CPU usage is very high (90%), but the GC activity
shows less than 0.01% average activity. Plus the heap usage stays low
(below 4G while the max heap size is 16G).

Do you have a different tool to suggest for checking the GC? Do you think
there is something else we might not see?

Thanks again,
Elisabeth


2014-03-31 16:26 GMT+02:00 Shawn Heisey :

> On 3/31/2014 6:57 AM, elisabeth benoit wrote:
> > We are currently using Solr 4.2.1. Our index is updated on a daily
> > basis. After noticing that Solr query time had increased (to twice its
> > initial value) without any change in index size or Solr configuration,
> > we tried an optimize on the index, but it didn't fix our problem. We
> > checked the garbage collector, but everything seemed fine. What did in
> > fact fix our problem was to delete all documents and reindex from
> > scratch.
> >
> > It looks like over time our index gets "corrupted" and optimize doesn't
> > fix it. Does anyone have a clue how to investigate this situation
> > further?
>
> That seems very odd.  I have one production copy of my index using
> 4.2.1, and it has been working fine for quite a long time.  We are
> transitioning to Solr 4.6.1 now, so the other copy is running that
> version.  We do occasionally do a full rebuild, but that is for index
> content, not for any problems.
>
> When you say you checked your garbage collector, what tools did you use?
>  I was having GC pause problems, but I didn't know it until I started
> using different tools.
>
> Thanks,
> Shawn
>
>


Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2014-03-31 Thread Rishi Easwaran
The SSD is separated into logical volumes; each instance gets 100 GB of SSD
disk space to write its index.
If I add them all up, it's ~45GB of the 1TB of SSD disk space.
Not sure I get "You should not be running more than one instance of Solr per
machine. One instance of Solr can run multiple indexes."
Yeah, I know that; we have been running 6-8 instances of Solr using the
multicore ability since ~2008, supporting millions of small indexes.
Now we are looking at SolrCloud with large indexes to see if we can leverage
some of its benefits.
As many folks have experienced, the JVM, with its stop-the-world pauses,
cannot GC using CMS within acceptable limits on very large heaps.
To utilize the H/W to its full potential, multiple instances on a single
host is pretty common practice for us.


-Original Message-
From: Shawn Heisey 
To: solr-user 
Sent: Sun, Mar 30, 2014 5:51 pm
Subject: Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2


On 3/30/2014 2:59 PM, Rishi Easwaran wrote:
> RAM shouldn't be a problem. 
> I have a box with 144GB RAM, running 12 instances with 4GB Java heap each.
> There are 9 instances wrting to 1TB of SSD disk space. 
>  Other 3 are writing to SATA drives, and have autosoftcommit disabled.

This brought up more questions than it answered.  I was assuming that
you only had a total of 4GB of index data, but after reading this, I
think my assumption may be incorrect.  If you add up all the Solr index
data on the SSD, how much disk space does it take?

You should not be running more than one instance of Solr per machine.
One instance of Solr can run multiple indexes.  Running more than one
results in quite a lot of overhead, and it seems unlikely that you would
need to dedicate 48GB of total RAM to the Java heap.

Thanks,
Shawn


 


Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread Shawn Heisey
On 3/31/2014 6:57 AM, elisabeth benoit wrote:
> We are currently using Solr 4.2.1. Our index is updated on a daily basis.
> After noticing that Solr query time had increased (to twice its initial
> value) without any change in index size or Solr configuration, we tried an
> optimize on the index, but it didn't fix our problem. We checked the
> garbage collector, but everything seemed fine. What did in fact fix our
> problem was to delete all documents and reindex from scratch.
> 
> It looks like over time our index gets "corrupted" and optimize doesn't
> fix it. Does anyone have a clue how to investigate this situation further?

That seems very odd.  I have one production copy of my index using
4.2.1, and it has been working fine for quite a long time.  We are
transitioning to Solr 4.6.1 now, so the other copy is running that
version.  We do occasionally do a full rebuild, but that is for index
content, not for any problems.

When you say you checked your garbage collector, what tools did you use?
 I was having GC pause problems, but I didn't know it until I started
using different tools.

Thanks,
Shawn



Re: More Robust Search Timeouts (to Kill Zombie Queries)?

2014-03-31 Thread Luis Lebolo
Hi Salman,

I was interested in something similar; take a look at the following thread:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201401.mbox/%3CCADSoL-i04aYrsOo2%3DGcaFqsQ3mViF%2Bhn24ArDtT%3D7kpALtVHzA%40mail.gmail.com%3E#archives

I never followed through, however.

-Luis
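
For completeness, Solr's built-in (if partial) mechanism is the timeAllowed
parameter, which bounds the document-collection phase and returns partial
results when the limit is hit - the threads above are about the phases it
does not cover. A sketch from SolrJ:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class TimeAllowedExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("some expensive query");
        q.setTimeAllowed(2000);  // ms; collection stops and partial results return
        System.out.println(solr.query(q).getResults().getNumFound());
        solr.shutdown();
    }
}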


On Mon, Mar 31, 2014 at 6:24 AM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:

> Anyone?
>
>
> On Wed, Mar 26, 2014 at 7:55 PM, Salman Akram <
> salman.ak...@northbaysolutions.net> wrote:
>
> > With reference to this thread
> > <http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%3c856ac15f0903272054q2dbdbd19kea3c5ba9e105b...@mail.gmail.com%3E>,
> > I wanted to know if there was any response to that, or if Chris Harris
> > himself can comment on what he ended up doing; that would be great!
> >
> >
> > --
> > Regards,
> >
> > Salman Akram
> >
> >
>
>
> --
> Regards,
>
> Salman Akram
>


solr 4.2.1 index gets slower over time

2014-03-31 Thread elisabeth benoit
Hello,

We are currently using Solr 4.2.1. Our index is updated on a daily basis.
After noticing that Solr query time had increased (to twice its initial
value) without any change in index size or Solr configuration, we tried an
optimize on the index, but it didn't fix our problem. We checked the garbage
collector, but everything seemed fine. What did in fact fix our problem was
to delete all documents and reindex from scratch.

It looks like over time our index gets "corrupted" and optimize doesn't fix
it. Does anyone have a clue how to investigate this situation further?


Elisabeth


Re: get sub-facets based on main-facet selections

2014-03-31 Thread Erick Erickson
Have you looked at "pivot facets"? It _might_ help here with the
first part. That said, pivot facets can be expensive (as always,
"it depends") and the two-query solution might be better, gotta
test.

About the second part:
bq: one of my main facets returns with just a single value

Not sure how that'd work without two queries.

Best,
Erick
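
A pivot-facet sketch of the first part (a minimal example; the field names
stand in for the main-facet and subfacet fields):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PivotFacetExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/catalog");
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        // Counts for each subfacet value nested under each productgroup value.
        q.set("facet.pivot", "productgroup,length");
        System.out.println(solr.query(q).getFacetPivot());
        solr.shutdown();
    }
}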

On Mon, Mar 31, 2014 at 5:46 AM, Jan Verweij - Reeleez  wrote:
> Dear,
>
> I'm implementing a product catalog and have 5 main facets and 60+
> possible subfacets.
> If I select a specific value from one of my main facets, let's say,
> productgroupX,
> I want to show the facets related to this productgroup, say length and
> height.
> But if productgroupY is selected I have to show weight and color.
>
> To make it even more complex if I run a query and one of my main facets
> returns with just a single value it's the same as selecting this single
> value and should already come back with the additional subfacets.
>
> I know how to do this with two requests to Solr, but perhaps there are
> more dynamic ways within Solr I haven't thought about.
>
> Cheers,
>
> Jan Verweij


Re: Multiple Languages in Same Core

2014-03-31 Thread Jeremy Thomerson
Thanks Trey! Last week I ordered the eBook. I look forward to seeing the
information in it.

Jeremy


On Thu, Mar 27, 2014 at 6:03 PM, Trey Grainger  wrote:

> In addition to the two approaches Liu Bo mentioned (separate core per
> language and separate field per language), it is also possible to put
> multiple languages in a single field. This saves you the overhead of
> multiple cores and of having to search across multiple fields at query
> time. The idea here is that you can run multiple analyzers (i.e. one for
> German, one for English, one for Chinese, etc.) and stack the outputted
> TokenStreams for each of these within a single field. It is also possible
> to swap out the languages you want to use on a case-by-case basis (i.e.
> per-document, per field, or even per word) if you really need to for
> advanced use cases.
>
> All three of these methods, including code examples and the pros and cons
> of each are discussed in the Multilingual Search chapter of Solr in Action,
> which Alexandre referenced. If you don't have the book, you can also just
> download and run the code examples for free, though they may be harder to
> follow without the context from the book.
>
> Thanks,
>
> Trey Grainger
> Co-author, Solr in Action
> Director of Engineering, Search & Analytics @CareerBuilder
>
>
>
>
>
> On Wed, Mar 26, 2014 at 4:34 AM, Liu Bo  wrote:
>
> > Hi Jeremy
> >
> > There're a lot of multi-language discussions; two main approaches:
> >  1. like yours, a language is one core
> >  2. all in one core, where each language has its own field.
> >
> > We have multi-language support in a single core; each multilingual field
> > has its own suffix, such as name_en_US. We customized the query handler
> > to hide the query details from the client.
> > The main reason we want to do this is NRT index and search;
> > take product for example:
> >
> > a product has price and quantity, which are common fields used for
> > filtering and sorting, while name and description are multi-language
> > fields;
> > if we split a product into different cores, updating a common field may
> > end up as an update in all of the multi-language cores.
> >
> > As to scalability, we don't change solr cores/collections when a new
> > language is added, but we probably need update our customized index
> process
> > and run a full re-index.
> >
> > This approach suits our requirement for now, but you may have your own
> > concerns.
> >
> > We have a "suggest filter" problem similar to yours: we want to return
> > suggest results filtered by store. I can't find a way to build the
> > dictionary from a query in my version of Solr, 4.6.
> >
> > What I do is run a query on an N-gram-analyzed field with filter queries
> > on the store_id field. The "suggest" is actually a query. It may not
> > perform as well as the suggester, but it can do the trick.
> >
> > You can try building an additional N-gram field for suggestions only and
> > searching on it with fq on your "Locale" field.
> >
> > All the best
> >
> > Liu Bo
> >
> >
> >
> >
> > On 25 March 2014 09:15, Alexandre Rafalovitch 
> wrote:
> >
> > > Solr In Action has a significant discussion on the multi-lingual
> > > approach. They also have some code samples out there. Might be worth a
> > > look
> > >
> > > Regards,
> > >Alex.
> > > Personal website: http://www.outerthoughts.com/
> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > - Time is the quality of nature that keeps events from happening all
> > > at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > > book)
> > >
> > >
> > > On Tue, Mar 25, 2014 at 4:43 AM, Jeremy Thomerson
> > >  wrote:
> > > > I recently deployed Solr to back the site search feature of a site I
> > work
> > > > on. The site itself is available in hundreds of languages. With the
> > > initial
> > > > release of site search we have enabled the feature for ten of those
> > > > languages. This is distributed across eight cores, with two Chinese
> > > > languages plus Korean combined into one CJK core and each of the
> other
> > > > seven languages in their own individual cores. The reason for
> splitting
> > > > these into separate cores was so that we could have the same field
> > names
> > > > across all cores but have different configuration for analyzers, etc,
> > per
> > > > core.
> > > >
> > > > Now I have some questions on this approach.
> > > >
> > > > 1) Scalability: Considering I need to scale this to many dozens more
> > > > languages, perhaps hundreds more, is there a better way so that I
> don't
> > > end
> > > > up needing dozens or hundreds of cores? My initial plan was that many
> > > > languages that didn't have special support within Solr would simply
> get
> > > > lumped into a single "default" core that has some default analyzers
> > that
> > > > are applicable to the majority of languages.
> > > >
> > > > 1b) Related to this: is there a practical limit to the number of
> cores
> > > that
> > > > can be run on one instance of Lucen

Re: Product index schema for solr

2014-03-31 Thread Ajay Patel

Hi Erick,
Thanks for the reply :). Your solution helped me denormalize my data.
Now I have another question: can I create a generalized range facet
based on min_qty and max_qty?


Thanks & Regards
Ajay Patel.

On Saturday 29 March 2014 08:54 PM, Erick Erickson wrote:

The usual approach is to de-normalize the tables, so you'd store docs like
(all your product data) min_qty, max_qty, price_per_qty

So the above example would have 4 documents, then it all "just works"

You have to ensure that the id (the <uniqueKey> field) is different for each,
and probably store the product ID in a field other than "id" for this reason.

Best,
Erick

On Fri, Mar 28, 2014 at 10:27 AM, Ajay Patel  wrote:


Hi Solr users & developers.

I am new to the world of the Solr search engine. I have a complex product
database structure in Postgres.

A product has many product_quantity_price attributes in ranges.

E.g. the price ranges for product ID 1 are stored in the
product_quantity_price table in the following manner:

min_qty  max_qty  price_per_qty
1        50       4
51       100      3.5
101      150      3
151      200      2.5

The ranges are not fixed for any product; they can differ from product to
product.

Now my question is: how can I save this data in Solr in an optimized way so
that I can create facets on qty and prices?

Thanks in advance.
Ajay Patel.



Re: More Robust Search Timeouts (to Kill Zombie Queries)?

2014-03-31 Thread Salman Akram
Anyone?


On Wed, Mar 26, 2014 at 7:55 PM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:

> With reference to this thread, I wanted to know if there was any response
> to that, or if Chris Harris himself can comment on what he ended up doing;
> that would be great!
>
>
> --
> Regards,
>
> Salman Akram
>
>


-- 
Regards,

Salman Akram


Re: Strange behavior while deleting

2014-03-31 Thread Jack Krupansky

So, how big is the discrepancy?

If you do a *:* query for rows=100, is the 100th result the same for both?

Do a bunch of random queries and see if you can find a document key that is 
missing from one core, but present in the other, and check if it should have 
been deleted.


Are you deleting by "id" or by "query"?

Do you do an explicit commit on your update request? If not, it could just 
take a few minutes before the commit actually occurs.


Are the two Solr servers on the same machine or different machines? If the
latter, is one of the machines significantly faster than the other?

-- Jack Krupansky

-Original Message- 
From: abhishek.netj...@gmail.com

Sent: Monday, March 31, 2014 5:48 AM
To: solr-user@lucene.apache.org ; solr-user@lucene.apache.org
Subject: Re: Strange behavior while deleting

Hi,
These settings are commented out in the schema. These are two different Solr
servers with almost identical schemas, with the exception of one stemmed
field.


Same solr versions are running.
Please help.

Thanks
Abhishek

 Original Message
From: Jack Krupansky
Sent: Monday, 31 March 2014 14:54
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Strange behavior while deleting

Do the two cores have identical schema and solrconfig files? Are the delete
and merge config settings identical?

Are these two cores running on the same Solr server, or two separate Solr
servers? If the latter, are they both running the same release of Solr?

How big is the discrepancy - just a few, dozens, 10%, 50%?

-- Jack Krupansky

-Original Message- 
From: abhishek jain

Sent: Monday, March 31, 2014 3:26 AM
To: solr-user@lucene.apache.org
Subject: Strange behavior while deleting

hi friends,
I have observed some strange behavior.

I have two indexes with the same ids and the same number of docs, and I am
using a JSON file to delete records from both indexes. After deleting the
ids, the resulting indexes show different doc counts.

I am not sure why; I used curl with the same JSON file to delete from both
indexes.

Please advise asap,
thanks

--
Thanks and kind Regards,
Abhishek 



get sub-facets based on main-facet selections

2014-03-31 Thread Jan Verweij - Reeleez
Dear,

I'm implementing a product catalog and have 5 main facets and 60+ possible
subfacets.
If I select a specific value from one of my main facets, let's say,
productgroupX,
I want to show the facets related to this productgroup, say length and
height.
But if productgroupY is selected I have to show weight and color.

To make it even more complex if I run a query and one of my main facets
returns with just a single value it's the same as selecting this single
value and should already come back with the additional subfacets.

I know how to do this with two requests to Solr, but perhaps there are more
dynamic ways within Solr I haven't thought about.

Cheers,

Jan Verweij


Re: Strange behavior while deleting

2014-03-31 Thread abhishek . netjain
Hi,
These settings are commented out in the schema. These are two different Solr
servers with almost identical schemas, with the exception of one stemmed
field.

Same solr versions are running.
Please help.

Thanks 
Abhishek

  Original Message  
From: Jack Krupansky
Sent: Monday, 31 March 2014 14:54
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Strange behavior while deleting

Do the two cores have identical schema and solrconfig files? Are the delete
and merge config settings identical?

Are these two cores running on the same Solr server, or two separate Solr 
servers? If the latter, are they both running the same release of Solr?

How big is the discrepancy - just a few, dozens, 10%, 50%?

-- Jack Krupansky

-Original Message- 
From: abhishek jain
Sent: Monday, March 31, 2014 3:26 AM
To: solr-user@lucene.apache.org
Subject: Strange behavior while deleting

hi friends,
I have observed some strange behavior.

I have two indexes with the same ids and the same number of docs, and I am
using a JSON file to delete records from both indexes. After deleting the
ids, the resulting indexes show different doc counts.

I am not sure why; I used curl with the same JSON file to delete from both
indexes.

Please advise asap,
thanks

-- 
Thanks and kind Regards,
Abhishek 



Re: eDismax parser and the mm parameter

2014-03-31 Thread Jack Krupansky
The pf, pf2, and pf3 parameters should cover cases 1 and 2. Use q.op=OR (the 
default) and ignore the mm parameter. Give pf the highest boost, and boost 
pf3 higher than pf2.


You could try using the complex phrase query parser for the third case.

-- Jack Krupansky

-Original Message- 
From: S.L

Sent: Monday, March 31, 2014 12:08 AM
To: solr-user@lucene.apache.org
Subject: Re: eDismax parser and the mm parameter

Thanks Jack, my use cases are as follows.


  1. Search for "Ginseng": everything related to ginseng should show up.
  2. Search for "White Siberian Ginseng": results with the whole phrase
  show up first, followed by 2 words from the phrase, followed by a single
  word in the phrase.
  3. Fuzzy search "Whte Sberia Ginsng" (please note the typos here):
  documents with White Siberian Ginseng should show up. This looks like the
  most complicated of all, as Solr does not support fuzzy phrase searches.
  (I have no solution for this yet.)

Thanks again!


On Sun, Mar 30, 2014 at 11:21 PM, Jack Krupansky 
wrote:



The mm parameter is really only relevant when the default operator is OR
or explicit OR operators are used.

Again: Please provide your use case examples and your expectations for
each use case. It really doesn't make a lot of sense to prematurely focus
on a solution when you haven't clearly defined your use cases.

-- Jack Krupansky

-Original Message- From: S.L
Sent: Sunday, March 30, 2014 9:13 PM
To: solr-user@lucene.apache.org
Subject: Re: eDismax parser and the mm parameter

Jack,

I mis-stated the problem: I am not using the OR operator as default
now (now that I think about it, it does not make sense to use the default
operator OR along with the mm parameter). The reason I want to use pf and
mm in conjunction is my understanding of the edismax parser, and
I have not looked into the pf2 and pf3 parameters yet.

I will state my understanding here below.

Pf - is used to boost the result score if the complete phrase matches.
mm < (less than) the search-term length would help limit the query results
to a certain number of better matches.

With that being said would it make sense to have dynamic mm (set to the
length of search term - 1)?

I also have a question around using a fuzzy search along with the eDismax
parser, but I will ask that in a separate post once I go through that
aspect of the eDismax parser.

Thanks again !





On Sun, Mar 30, 2014 at 6:44 PM, Jack Krupansky 
wrote:

If you use pf, pf2, and pf3 and boost appropriately, the effects of mm
will be dwarfed.
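
(For completeness: if you do still want an mm that scales with the number
of query terms, the mm spec itself supports conditional values, so no
client-side logic is needed. For example, mm=1<-1 requires the single term
for one-term queries and all but one of the terms otherwise.)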

The general goal is to assure that the top documents really are the best,
not to necessarily limit the total document count. Focusing on the latter
could be a real waste of time.

It's still not clear why or how you need or want to use OR as the default
operator - you still haven't given us a use case for that.

To repeat: Give us a full set of use cases before taking this XY Problem
approach of pursuing a solution before the problem is understood.

-- Jack Krupansky

-Original Message- From: S.L
Sent: Sunday, March 30, 2014 6:14 PM
To: solr-user@lucene.apache.org
Subject: Re: eDismax parser and the mm parameter

Jack, thanks again!

I am searching Chinese medicine documents. As in the example I gave
earlier, a user can search for "Ginseng", "Siberian Ginseng", or "Red
Siberian Ginseng". I certainly want to use the pf parameter (which is not
driven by the mm parameter); however, to give a higher score to documents
that have more of the terms, I want to use edismax. Now, if I give an mm
of 3 and the search term is only of length 1 (like "Ginseng"), what does
edismax do?


On Sun, Mar 30, 2014 at 1:21 PM, Jack Krupansky 
wrote:

It still depends on your objective - which you haven't told us yet. Show
us some use cases and detail what your expectations are for each use
case.

The edismax phrase boosting is probably a lot more useful than messing
around with mm. Take a look at pf, pf2, and pf3.

See:
http://wiki.apache.org/solr/ExtendedDisMax
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

The focus on mm may indeed be a classic "XY Problem" - a premature focus
on a solution without detailing the problem.

-- Jack Krupansky

-Original Message- From: S.L
Sent: Sunday, March 30, 2014 11:18 AM
To: solr-user@lucene.apache.org
Subject: Re: eDismax parser and the mm parameter

Thanks Jack! I understand the intent of the mm parameter. My question is
that, since the query terms being provided are not of fixed length, I do
not know what the mm should look like. For example, "Ginseng" and
"Siberian Ginseng" are my search terms: the first one can have an mm of up
to 1 and the second an mm of up to 2.
have an mm of upto 2 .

Should I dynamically set the mm based on the number of search terms in 
my

query ?

Thanks again.


On Sun, Mar 30, 2014 at 5:20 AM, Jack Krupansky 
wrote:

 1. Yes, the default for mm is 1.


2. It depends on what you are really trying to do - you haven't told us.


Generally, mm=1 

Re: Unsuccessful queries for terms next to tabs and newlines in uploaded Word documents

2014-03-31 Thread Jack Krupansky
What field type and analyzer are you using? Normally, both the standard and 
whitespace tokenizers will break tokens at all white space, which includes 
tabs.


Check your df and qf parameters to see that they are querying the 
attr_content field. Query the attr_content field directly, as a test.
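
For example, a direct field query like this (using the same collection and
term from your message) should show whether the term was indexed at all:

http://127.0.0.1:8983/solr/collection1/select?q=attr_content:Yasmin&wt=json&indent=true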


-- Jack Krupansky

-Original Message- 
From: chtjfi

Sent: Monday, March 31, 2014 3:23 AM
To: solr-user@lucene.apache.org
Subject: Unsuccessful queries for terms next to tabs and newlines in 
uploaded Word documents


Short Version: What do I need to do to successfully query for terms that are
adjacent to tabs and newlines (i.e. \t, \n) in an uploaded Word document?

Long Version:

I am using Solr 4.6.1. I am running an unmodified version of the example
core that is started by running java -jar start.jar in the example
directory. The schema.xml in use is example/solr/collection1/conf/schema.xml
and is unmodified (it is the one downloaded with the distribution), so I
won't post it unless someone says it is helpful.

After uploading a Word document to Solr with the command
http://localhost:8983/solr/update/extract?literal.id=yabba&uprefix=attr_&fmap.content=attr_content&commit=true
there are hundreds of tab and newline characters (i.e. \n and \t) in the
attr_content field. When a string occurs only once in the document, and is
adjacent to one of these characters, queries for that term are not
successful.

A specific example is an uploaded Word document that after upload contains
"Vorname:\t\t\tYasmin" in the attr_content field. The original document
contained "Vorname:", then two tab characters, then "Yasmin" (the string
"\t" does not appear in the document). The string "Yasmin" appears only in
that location in the document.

When I query for "Yasmin" with the query
http://127.0.0.1:8983/solr/collection1/select?q=Yasmin&wt=json&indent=true I
get no results. Queries for terms that are not next to a \t or a \n are
successful.

What can I do so that a query for a term next to a tab or newline will be
successful? Must I change the way the document is uploaded? Or change the
way the search is performed?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unsuccessful-queries-for-terms-next-to-tabs-and-newlines-in-uploaded-Word-documents-tp4128090.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: MergingSolrIndexes not supported by SolrCloud?why?

2014-03-31 Thread rulinma
I think that maybe the problem is with my cluster. I will adjust it and test again. Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MergingSolrIndexes-not-supported-by-SolrCloud-why-tp4127111p4128113.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strange behavior while deleting

2014-03-31 Thread Jack Krupansky
Do the two cores have identical schema and solrconfig files? Are the delete 
and merge config settings identical?


Are these two cores running on the same Solr server, or two separate Solr 
servers? If the latter, are they both running the same release of Solr?


How big is the discrepancy - just a few, dozens, 10%, 50%?
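
As a quick check, a query like this against each core (the core name here
is a placeholder) reports the live document count in numFound:

http://localhost:8983/solr/core1/select?q=*:*&rows=0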

-- Jack Krupansky

-Original Message- 
From: abhishek jain

Sent: Monday, March 31, 2014 3:26 AM
To: solr-user@lucene.apache.org
Subject: Strange behavior while deleting

hi friends,
I have observed some strange behavior.

I have two indexes with the same ids and the same number of docs, and I am
using a json file to delete records from both indexes. After deleting the
ids, the resulting indexes show different counts of docs.

Not sure why.
I used curl with the same json file to delete from both indexes.

Please advise asap,
thanks

--
Thanks and kind Regards,
Abhishek 



Re: Expected date of release for Solr 4.7.1

2014-03-31 Thread Puneet Pawaia
Thanks for the update, Mike.

Regards
Puneet


On Sat, Mar 29, 2014 at 11:58 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> RC2 is being voted on now ... so it should be "soon" (a few days, but
> more if any new blocker issues are found and we need to do RC3).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Sat, Mar 29, 2014 at 2:26 PM, Puneet Pawaia 
> wrote:
> > Hi
> > Any idea on the expected date of release for Solr 4.7.1
> > Regards
> > Puneet
>


Strange behavior while deleting

2014-03-31 Thread abhishek jain
hi friends,
I have observed some strange behavior.

I have two indexes with the same ids and the same number of docs, and I am
using a json file to delete records from both indexes. After deleting the
ids, the resulting indexes show different counts of docs.

Not sure why.
I used curl with the same json file to delete from both indexes.
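
The delete request looks like this (the core name and ids shown are
placeholders):

curl 'http://localhost:8983/solr/core1/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '{"delete":["id1","id2"]}'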

Please advise asap,
thanks

-- 
Thanks and kind Regards,
Abhishek


Unsuccessful queries for terms next to tabs and newlines in uploaded Word documents

2014-03-31 Thread chtjfi
Short Version: What do I need to do to successfully query for terms that are
adjacent to tabs and newlines (i.e. \t, \n) in an uploaded Word document?

Long Version:

I am using Solr 4.6.1. I am running an unmodified version of the example
core that is started by running java -jar start.jar in the example
directory. The schema.xml in use is example/solr/collection1/conf/schema.xml
and is unmodified (it is the one downloaded with the distribution), so I
won't post it unless someone says it is helpful.

After uploading a Word document to Solr with the command
http://localhost:8983/solr/update/extract?literal.id=yabba&uprefix=attr_&fmap.content=attr_content&commit=true
there are hundreds of tab and newline characters (i.e. \n and \t) in the
attr_content field. When a string occurs only once in the document, and is
adjacent to one of these characters, queries for that term are not
successful.

A specific example is an uploaded Word document that after upload contains
"Vorname:\t\t\tYasmin" in the attr_content field. The original document
contained "Vorname:", then two tab characters, then "Yasmin" (the string
"\t" does not appear in the document). The string "Yasmin" appears only in
that location in the document.

When I query for "Yasmin" with the query
http://127.0.0.1:8983/solr/collection1/select?q=Yasmin&wt=json&indent=true I
get no results. Queries for terms that are not next to a \t or a \n are
successful.

What can I do so that a query for a term next to a tab or newline will be
successful? Must I change the way the document is uploaded? Or change the
way the search is performed?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unsuccessful-queries-for-terms-next-to-tabs-and-newlines-in-uploaded-Word-documents-tp4128090.html
Sent from the Solr - User mailing list archive at Nabble.com.