Re: synonym file

2012-08-02 Thread Lance Norskog
If you must have them at query time, you need a custom implementation
for very very large files :) If you can use these synonyms at index
time instead of query time, that would help. When you index, do not
call commit very often.

The synonym filter implementation has a feature where it only saves
the first of a set of synonyms, so you don't get term explosion.
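For reference, an index-time-only synonym setup looks roughly like this in schema.xml (a sketch: the field type name and synonym file name are placeholders; the expand attribute is the knob behind the "only saves the first of a set" behavior):

```xml
<!-- sketch: synonyms applied at index time only; names are placeholders -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- expand="true" indexes every synonym of a group so plain queries match;
         expand="false" keeps only the first entry of each group (no term
         explosion), but then queries must be normalized the same way -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```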

May I ask what is the use case?

On Thu, Aug 2, 2012 at 9:56 PM, Peyman Faratin  wrote:
> Hi
>
> I have a (23M) synonym file that takes a long time (3 or so minutes) to load 
> and once included seems to adversely affect the QTime of the application by 
> approximately 4 orders of magnitude.
>
> Any advice on how to load faster and lower the QT would be much appreciated.
>
> best
>
> Peyman



-- 
Lance Norskog
goks...@gmail.com


synonym file

2012-08-02 Thread Peyman Faratin
Hi

I have a (23M) synonym file that takes a long time (3 or so minutes) to load 
and once included seems to adversely affect the QTime of the application by 
approximately 4 orders of magnitude. 

Any advice on how to load faster and lower the QT would be much appreciated. 

best

Peyman

Re: Trending topics?

2012-08-02 Thread Hasan Diwan
Tor,
I hope that the information in
http://www.jason-palmer.com/2011/05/creating-a-tag-cloud-with-solr-and-php/
helps. -- H
On 2 August 2012 15:48, Lance Norskog  wrote:

> Two easy ones:
> 1) Facets on a text field are simple word counts by document.
> 2) If you want the number of times a word appears inside a document,
> that requires a separate dataset called a 'term vector'. This is a
> list of all words in a document with a count for each one.
> These are simple queries. There are also batch computations where you
> create a 'term-document matrix', with a row for each document and a
> column for all terms that appear in any document. These computations
> require exporting all of your data into a separate computation.
>
>
>
> On Thu, Aug 2, 2012 at 1:26 PM, Chris Dawson  wrote:
> > Tor,
> >
> > Thanks for your response.
> >
> > I'd like to put an arbitrary set of text into Solr and then have Solr
> tell
> > me the ten most popular "topics" that are in there.  For example, if I
> put
> > in 100 paragraphs of text about sports, I would like to retrieve topics
> > like "swimming, basketball, tennis" if the three most popular and
> discussed
> > topics are those inside the text.
> >
> > Is Solr the correct tool to do something like this?  Or, is this too
> > unstructured to get this kind of result without manually categorizing it?
> >
> > Is the correct term for this faceting?  It seems to me that faceting
> > requires putting the data into a more structured format (for example,
> > telling the index that this is the "manufacturer", etc.)
> >
> > Basically, I would like to get something like a tag cloud (relevant
> topics
> > with weights for each term) without asking users to tag things manually.
> >
> > Chris
> >
> > On Thu, Aug 2, 2012 at 3:25 PM, Tor Henning Ueland <
> tor.henn...@gmail.com>wrote:
> >
> >> On Thu, Aug 2, 2012 at 5:34 PM, Chris Dawson 
> wrote:
> >>
> >> > How would I generate a list of trending topics using solr?
> >> >
> >>
> >>
> >> By putting them in solr.
> >> (Generic question gets a generic answer)
> >>
> >> What do you mean? Trending searches, trending data, trending documents,
> >> trending what?
> >>
> >>
> >> --
> >> Regards
> >> Tor Henning Ueland
> >>
> >
> >
> >
> > --
> > Chris Dawson
> > 971-533-8335
> > Human potential, travel and entrepreneurship:  http://webiphany.com/
> > Traveling to Portland, OR?  http://www.airbnb.com/rooms/58909
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Sent from my mobile device


AW: Special suggestions requirement

2012-08-02 Thread Lochschmied, Alexander
Even with a prefix query, I do not get "ABCD02" or any "ABCD02..." back. BTW: 
EdgeNGramFilterFactory is used on the field we are getting the 
suggestions/spellchecks from.
I think the problem is that there are a lot of different part numbers starting 
with "ABCD" and every part number has the same length. I showed only 4 in the 
example but there might be thousands.

Here are some full part number examples that might be in the index:
ABCD110040
ABCD00
ABCD99
ABCD155500
...

I'm looking for a way to make Solr return a distinct list of fixed-length 
substrings of them, e.g. if "ABCD" is entered, I would need
ABCD00
ABCD01
ABCD02
ABCD03
...
ABCD99

Then if user chose "ABCD42" from the suggestions, I would need
ABCD4201
ABCD4202
ABCD4203
...
ABCD4299

and so on.
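One way to get this kind of distinct, alphabetically sorted completion list without the spellchecker is the TermsComponent over the EdgeNGram field (a sketch; "suggest_field" is a placeholder name):

```
# distinct terms starting with "ABCD", in index (alphabetical) order
/solr/terms?terms.fl=suggest_field&terms.prefix=ABCD&terms.sort=index&terms.limit=200

# if the field holds edge n-grams of every length, a regex (supported in
# recent Solr versions) can restrict results to exactly two extra characters:
/solr/terms?terms.fl=suggest_field&terms.regex=ABCD..&terms.sort=index
```

Because the terms dictionary is inherently distinct and index-ordered, each six-character prefix would appear exactly once, smallest value first — there is no relevance scoring to fight against.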

I would be able to do some "post processing" if needed or adjust the schema or 
indexing process. But the key functionality I need from Solr is returning a 
distinct set of those suggestions where only the last two characters change. 
All of the available combinations of those last two characters must be 
considered though. I need to show alpha-numerically sorted suggestions; the 
smallest value first.

Thanks,
Alexander

-Ursprüngliche Nachricht-
Von: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] 
Gesendet: Donnerstag, 2. August 2012 15:02
An: solr-user@lucene.apache.org
Betreff: Re: Special suggestions requirement

In this case, we're storing the overall value length and sorting it on that, 
then alphabetically.

Also, how are your queries fashioned? If you're doing a prefix query, 
everything that matches it should score the same. If you're only doing a prefix 
query, you might need to add a term for exact matches as well to get them to 
show up.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 www.appinions.com 
Where Influence Isn't a Game


On Wed, Aug 1, 2012 at 9:58 PM, Lochschmied, Alexander 
 wrote:
> Is there a way to offer distinct, alphabetically sorted, fixed length options?
>
> I am trying to suggest part numbers and I'm currently trying to do it with 
> the spellchecker component.
> Let's say "ABCD" was entered and we have indexed part numbers like
> ABCD
> ABCD2000
> ABCD2100
> ABCD2200
> ...
>
> I would like to have 2 characters suggested always, so for "ABCD", it 
> should suggest
> ABCD00
> ABCD20
> ABCD21
> ABCD22
> ...
>
> No smart sorting is needed, just alphabetical sorting. The problem is that 
> for example 00 (or ABCD00) may not be suggested currently as it doesn't score 
> high enough. But we are really trying to get all distinct values starting 
> from the smallest (up to a certain number of suggestions).
>
> I was looking already at custom comparator class option. But this would 
> probably not work as I would need more information to implement it there 
> (like at least the currently entered search term, "ABCD" in the example).
>
> Thanks,
> Alexander


Re: SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Mark Miller
FYI: I've committed the rest of the work I was doing on trunk in this area.

On Aug 2, 2012, at 4:42 PM, Timothy Potter  wrote:

> Yes, I can but won't get to it today unfortunately. I had my eval
> environment running on some very expensive EC2 instances and shut it
> down for the time being until I can focus on it again. Will try to get
> back to this either tomorrow or over the weekend. Sorry for the delay.
> 
> Tim
> 
> On Thu, Aug 2, 2012 at 1:35 PM, Mark Miller  wrote:
>> Can you do me a favor and try not using the batch add for a run?
>> 
>> Just do the add one doc at a time. (solrServer.add(doc) rather than 
>> solrServer.add(collection))
>> 
>> I just fixed one issue with it this morning on trunk - it may be the cause 
>> of this oddity.
>> 
>> I'm also working on some performance issues around that method too (good 
>> performance without starting thousands of threads).
>> 
>> Until I get all that straightened out (hopefully very soon), I think you 
>> will have better luck not using the bulk, collection add method.
>> 
>> On Aug 2, 2012, at 2:16 PM, Timothy Potter  wrote:
>> 
>>> Thanks Mark.
>>> 
>>> I'm actually using SolrJ 3.4.0, so using CommonsHttpSolrServer:
>>> 
>>> Collection batch = ...
>>> ... build up batch ...
>>> solrServer.add( batch );
>>> 
>>> Basically, I have a custom Pig StoreFunc that sends docs to Solr from
>>> our Hadoop analytics nodes. The reason I'm not using SolrJ 4.0.0-ALPHA
>>> is that I couldn't get it to run in my Hadoop environment. There's
>>> some classpath conflict with the Apache HttpClient. SolrJ 4 depends on
>>> 4.1.3 but when I run it in my env, I get the following:
>>> 
>>> Caused by: java.lang.NoSuchMethodError:
>>> org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method
>>> ()V not found
>>>  at 
>>> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:94)
>>>  at 
>>> org.apache.solr.client.solrj.impl.CloudSolrServer.(CloudSolrServer.java:70)
>>>  ... 16 more
>>> 
>>> I spent hours trying to resolve the classpath issue and finally had to
>>> bail and just used the 3.4 SolrJ client as I'm just at the evaluation
>>> stage at this point. So it sounds like this could be the cause of my
>>> problems.
>>> 
>>> One other thing ... I do have the _version_ field defined in my
>>> schema.xml but am not setting it on the client side when indexing.
>>> Should I be doing that?
>>> 
>>> Cheers,
>>> Tim
>>> 
>>> 
>>> On Thu, Aug 2, 2012 at 11:27 AM, Mark Miller  wrote:
 
 On Aug 2, 2012, at 11:08 AM, Timothy Potter  wrote:
 
> Just starting to get into SolrCloud using 4.0.0-ALPHA and am very
> impressed so far ...
> 
> I have a 12-shard index with ~104M docs with each shard having
> 1-replica (so 24 Solr servers running)
> 
> Using the Query form on the Admin panel, I issue the MatchAllDocsQuery
> (*:*) and each time I send the request the value for numFound in the
> result is different. It's always close but not exactly the same as I
> would expect? Can anyone shed some light on this issue? I also tried a
> real query, such as "#olympics lochte" and same thing - different
> numFound each time. The first page of actual docs returned is the same
> so maybe I should just ignore the numFound issue?
> 
> Note that while experiencing this behavior, I am not adding any docs
> to the index and all docs have been committed with waitFlush=true and
> waitSearcher=true on the commit. Also, not doing soft commits at this
> point. In addition, after having committed all 104M docs, I hit the
> optimize button on the panel so I have only 1 segment. In other words,
> the index is not being updated and has been optimized at this point.
 
 
 How are you adding docs? Eg what client and what method in particular 
 (what is your line of code that actually adds the doc).
 
 You can find the numFound result for each node by passing the param 
 distrib=false. What does this tell you? Are your replicas in sync with the 
 leader? What does the count for each shard add up to?
 
 I would not ignore the issue - something must be off. It may somehow be 
 user error, it may be a bug that has been fixed since the alpha, or it may 
 be something new.
 
 Are you sure every shard you are issuing the query *from* is active and 
 live according to ZooKeeper? Eg when you look at the cloud admin view and 
 look at the cluster visualization, are all the nodes green?
 
 - Mark Miller
 lucidimagination.com
 
 
 
 
 
 
 
 
 
 
 
>> 
>> - Mark Miller
>> lucidimagination.com
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 

- Mark Miller
lucidimagination.com













Re: Trending topics?

2012-08-02 Thread Lance Norskog
Two easy ones:
1) Facets on a text field are simple word counts by document.
2) If you want the number of times a word appears inside a document,
that requires a separate dataset called a 'term vector'. This is a
list of all words in a document with a count for each one.
These are simple queries. There are also batch computations where you
create a 'term-document matrix', with a row for each document and a
column for all terms that appear in any document. These computations
require exporting all of your data into a separate computation.
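The batch structure described above — a term-document matrix with a row per document and a column per term appearing in any document — can be sketched outside Solr like this (a toy illustration of the data structure, not a Solr API):

```java
import java.util.*;

// Toy term-document matrix: one row per document, one column per term
// that appears in any document. This runs on exported text, outside Solr.
public class TermDocMatrix {

    static int[][] build(List<String> docs, List<String> vocab) {
        SortedSet<String> terms = new TreeSet<String>();
        List<String[]> tokenized = new ArrayList<String[]>();
        for (String d : docs) {
            String[] toks = d.toLowerCase().split("\\s+");
            tokenized.add(toks);
            terms.addAll(Arrays.asList(toks));
        }
        vocab.addAll(terms);                 // column order: alphabetical
        int[][] m = new int[docs.size()][vocab.size()];
        for (int i = 0; i < tokenized.size(); i++)
            for (String t : tokenized.get(i))
                m[i][vocab.indexOf(t)]++;    // occurrence count per document
        return m;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("swimming and tennis",
                                          "tennis tennis basketball");
        List<String> vocab = new ArrayList<String>();
        int[][] m = build(docs, vocab);
        System.out.println(vocab);                 // [and, basketball, swimming, tennis]
        System.out.println(Arrays.toString(m[1])); // [0, 1, 0, 2]
    }
}
```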



On Thu, Aug 2, 2012 at 1:26 PM, Chris Dawson  wrote:
> Tor,
>
> Thanks for your response.
>
> I'd like to put an arbitrary set of text into Solr and then have Solr tell
> me the ten most popular "topics" that are in there.  For example, if I put
> in 100 paragraphs of text about sports, I would like to retrieve topics
> like "swimming, basketball, tennis" if the three most popular and discussed
> topics are those inside the text.
>
> Is Solr the correct tool to do something like this?  Or, is this too
> unstructured to get this kind of result without manually categorizing it?
>
> Is the correct term for this faceting?  It seems to me that faceting
> requires putting the data into a more structured format (for example,
> telling the index that this is the "manufacturer", etc.)
>
> Basically, I would like to get something like a tag cloud (relevant topics
> with weights for each term) without asking users to tag things manually.
>
> Chris
>
> On Thu, Aug 2, 2012 at 3:25 PM, Tor Henning Ueland 
> wrote:
>
>> On Thu, Aug 2, 2012 at 5:34 PM, Chris Dawson  wrote:
>>
>> > How would I generate a list of trending topics using solr?
>> >
>>
>>
>> By putting them in solr.
>> (Generic question gets a generic answer)
>>
>> What do you mean? Trending searches, trending data, trending documents,
>> trending what?
>>
>>
>> --
>> Regards
>> Tor Henning Ueland
>>
>
>
>
> --
> Chris Dawson
> 971-533-8335
> Human potential, travel and entrepreneurship:  http://webiphany.com/
> Traveling to Portland, OR?  http://www.airbnb.com/rooms/58909



-- 
Lance Norskog
goks...@gmail.com


How do you get the document name from Open Text?

2012-08-02 Thread eShard
I'm using Solr 4.0 with ManifoldCF 0.5.1 crawling Open Text v10.5. 
I have the cats/atts turned on in Open Text and I can see them all in the
Solr index.
However, the id is just the URL to download the doc from open text and the
document name either from Open Text or the document properties is nowhere to
be found.
I tried using resourceName in the solrconfig.xml as it was described in the
manual but it doesn't work.
I used this:


  text
  last_modified
  attr_
  File Name
  true



  

but all I get is "File Name" in resourceName. Should I leave the value blank
or is there some other field I should use?
Please advise
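Reconstructed from the surviving values, the stripped-out defaults block above was probably an ExtractingRequestHandler configuration along these lines (the parameter names here are guesses based on the standard example solrconfig.xml, not the poster's actual tags):

```xml
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">attr_</str>
    <!-- if this literal is set, it would override whatever resourceName
         the crawler sends, which matches the symptom described -->
    <str name="literal.resourceName">File Name</str>
    <str name="lowernames">true</str>
  </lst>
</requestHandler>
```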



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-you-get-the-document-name-from-Open-Text-tp3998908.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Timothy Potter
Yes, I can but won't get to it today unfortunately. I had my eval
environment running on some very expensive EC2 instances and shut it
down for the time being until I can focus on it again. Will try to get
back to this either tomorrow or over the weekend. Sorry for the delay.

Tim

On Thu, Aug 2, 2012 at 1:35 PM, Mark Miller  wrote:
> Can you do me a favor and try not using the batch add for a run?
>
> Just do the add one doc at a time. (solrServer.add(doc) rather than 
> solrServer.add(collection))
>
> I just fixed one issue with it this morning on trunk - it may be the cause of 
> this oddity.
>
> I'm also working on some performance issues around that method too (good 
> performance without starting thousands of threads).
>
> Until I get all that straightened out (hopefully very soon), I think you will 
> have better luck not using the bulk, collection add method.
>
> On Aug 2, 2012, at 2:16 PM, Timothy Potter  wrote:
>
>> Thanks Mark.
>>
>> I'm actually using SolrJ 3.4.0, so using CommonsHttpSolrServer:
>>
>> Collection batch = ...
>> ... build up batch ...
>> solrServer.add( batch );
>>
>> Basically, I have a custom Pig StoreFunc that sends docs to Solr from
>> our Hadoop analytics nodes. The reason I'm not using SolrJ 4.0.0-ALPHA
>> is that I couldn't get it to run in my Hadoop environment. There's
>> some classpath conflict with the Apache HttpClient. SolrJ 4 depends on
>> 4.1.3 but when I run it in my env, I get the following:
>>
>> Caused by: java.lang.NoSuchMethodError:
>> org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method
>> ()V not found
>>   at 
>> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:94)
>>   at 
>> org.apache.solr.client.solrj.impl.CloudSolrServer.(CloudSolrServer.java:70)
>>   ... 16 more
>>
>> I spent hours trying to resolve the classpath issue and finally had to
>> bail and just used the 3.4 SolrJ client as I'm just at the evaluation
>> stage at this point. So it sounds like this could be the cause of my
>> problems.
>>
>> One other thing ... I do have the _version_ field defined in my
>> schema.xml but am not setting it on the client side when indexing.
>> Should I be doing that?
>>
>> Cheers,
>> Tim
>>
>>
>> On Thu, Aug 2, 2012 at 11:27 AM, Mark Miller  wrote:
>>>
>>> On Aug 2, 2012, at 11:08 AM, Timothy Potter  wrote:
>>>
 Just starting to get into SolrCloud using 4.0.0-ALPHA and am very
 impressed so far ...

 I have a 12-shard index with ~104M docs with each shard having
 1-replica (so 24 Solr servers running)

 Using the Query form on the Admin panel, I issue the MatchAllDocsQuery
 (*:*) and each time I send the request the value for numFound in the
 result is different. It's always close but not exactly the same as I
 would expect? Can anyone shed some light on this issue? I also tried a
 real query, such as "#olympics lochte" and same thing - different
 numFound each time. The first page of actual docs returned is the same
 so maybe I should just ignore the numFound issue?

 Note that while experiencing this behavior, I am not adding any docs
 to the index and all docs have been committed with waitFlush=true and
 waitSearcher=true on the commit. Also, not doing soft commits at this
 point. In addition, after having committed all 104M docs, I hit the
 optimize button on the panel so I have only 1 segment. In other words,
 the index is not being updated and has been optimized at this point.
>>>
>>>
>>> How are you adding docs? Eg what client and what method in particular (what 
>>> is your line of code that actually adds the doc).
>>>
>>> You can find the numFound result for each node by passing the param 
>>> distrib=false. What does this tell you? Are your replicas in sync with the 
>>> leader? What does the count for each shard add up to?
>>>
>>> I would not ignore the issue - something must be off. It may somehow be 
>>> user error, it may be a bug that has been fixed since the alpha, or it may 
>>> be something new.
>>>
>>> Are you sure every shard you are issuing the query *from* is active and 
>>> live according to ZooKeeper? Eg when you look at the cloud admin view and 
>>> look at the cluster visualization, are all the nodes green?
>>>
>>> - Mark Miller
>>> lucidimagination.com
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Re: Trending topics?

2012-08-02 Thread Chris Dawson
Tor,

Thanks for your response.

I'd like to put an arbitrary set of text into Solr and then have Solr tell
me the ten most popular "topics" that are in there.  For example, if I put
in 100 paragraphs of text about sports, I would like to retrieve topics
like "swimming, basketball, tennis" if the three most popular and discussed
topics are those inside the text.

Is Solr the correct tool to do something like this?  Or, is this too
unstructured to get this kind of result without manually categorizing it?

Is the correct term for this faceting?  It seems to me that faceting
requires putting the data into a more structured format (for example,
telling the index that this is the "manufacturer", etc.)

Basically, I would like to get something like a tag cloud (relevant topics
with weights for each term) without asking users to tag things manually.

Chris

On Thu, Aug 2, 2012 at 3:25 PM, Tor Henning Ueland wrote:

> On Thu, Aug 2, 2012 at 5:34 PM, Chris Dawson  wrote:
>
> > How would I generate a list of trending topics using solr?
> >
>
>
> By putting them in solr.
> (Generic question gets a generic answer)
>
> What do you mean? Trending searches, trending data, trending documents,
> trending what?
>
>
> --
> Regards
> Tor Henning Ueland
>



-- 
Chris Dawson
971-533-8335
Human potential, travel and entrepreneurship:  http://webiphany.com/
Traveling to Portland, OR?  http://www.airbnb.com/rooms/58909


Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

2012-08-02 Thread david3s
Hello Jack,

We found that the problem is related to the *lucene* query parser in 3.6

select?q=author:David\ Duke&defType=lucene
Would render the same results as:
select?q=author:(David OR Duke)&defType=lucene

But
select?q=author:David\ Duke&defType=edismax
Would render the same results as:
select?q=author:"David Duke"&defType=lucene

Thanks a lot, Jack



--
View this message in context: 
http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623p3998899.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.0 - Join performance

2012-08-02 Thread Mikhail Khludnev
Eric,

you can take the last patch from SOLR-3076 (SOLR-3076.patch, attached
16/Jul/12 21:16).

You can also take it already applied from
https://github.com/m-khl/solr-patches/tree/6611 , but the source
code there might be a little bit old.
Regarding a nightly build, it's not so optimistic - I can't attract a
committer to review it.

On Thu, Aug 2, 2012 at 11:51 PM, Eric Khoury  wrote:

>  Wow, great work Mikhail, that's impressive.
> I don't currently have build the dev tree, you wouldn't have a patch for
> the alpha build handy?
> If not, when do you think this'll be available in a nightly build?
> Thanks again,
> Eric.
> > From: mkhlud...@griddynamics.com
> > Date: Thu, 2 Aug 2012 22:38:13 +0400
> > Subject: Re: Solr 4.0 - Join performance
> > To: solr-user@lucene.apache.org
>
> >
> > Hello,
> >
> > You can check my record.
> >
> https://issues.apache.org/jira/browse/SOLR-3076?focusedCommentId=13415644&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13415644
> >
> > I'm still working on precise performance measurement.
> >
> > On Thu, Aug 2, 2012 at 6:45 PM, Eric Khoury 
> wrote:
> >
> > >
> > >
> > >
> > >
> > >
> > >
> > > Hello all,
> > >
> > >
> > >
> > > I’m testing out the new join feature, hitting some perf
> > > issues, as described in Erick’s article (
> > > http://architects.dzone.com/articles/solr-experimenting-join).
> > >
> > > Basically, I’m using 2 objects in solr (this is a simplified
> > > view):
> > >
> > >
> > >
> > > Item
> > >
> > > - Id
> > >
> > > - Name
> > >
> > >
> > >
> > > Grant
> > >
> > > - ItemId
> > >
> > > - AvailabilityStartTime
> > >
> > > - AvailabilityEndTime
> > >
> > >
> > >
> > > Each item can have multiple grants attached to it.
> > >
> > >
> > >
> > > The query I'm using is the following, to find items by
> > > name, filtered by grants availability window:
> > >
> > >
> > >
> > > solr/select?fq=Name:XXX&q={!join
> > > from=ItemId to=Id} AvailabilityStartTime:[* TO NOW] AND
> > > -AvailabilityEndTime:[*
> > > TO NOW]
> > >
> > >
> > >
> > > With a hundred thousand items, this query can take multiple seconds
> > > to perform, due to the large number of ItemIds returned from the join
> > > query.
> > >
> > > Has anyone come up with a better way to use joins for these types of
> > > queries? Are there improvements planned in 4.0 rtm in this area?
> > >
> > >
> > >
> > > Btw, I’ve explored simply adding Start-End times to items, but
> > > the flat data model makes it hard to maintain start-end pairs.
> > >
> > >
> > >
> > > Thanks for the help!
> > >
> > > Eric.
> > >
> > >
> > >
> > >
> >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Tech Lead
> > Grid Dynamics
> >
> > 
> > 
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Mark Miller
Can you do me a favor and try not using the batch add for a run?

Just do the add one doc at a time. (solrServer.add(doc) rather than 
solrServer.add(collection))

I just fixed one issue with it this morning on trunk - it may be the cause of 
this oddity.

I'm also working on some performance issues around that method too (good 
performance without starting thousands of threads).

Until I get all that straightened out (hopefully very soon), I think you will 
have better luck not using the bulk, collection add method.
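In SolrJ 3.x terms (the CommonsHttpSolrServer mentioned in the quoted message below), the suggested change might look like this — a sketch, with a placeholder URL:

```java
import java.util.Collection;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sketch: per-document adds instead of the bulk collection add.
public class OneAtATime {
    static void index(Collection<SolrInputDocument> batch) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        for (SolrInputDocument doc : batch) {
            server.add(doc);   // one doc per request, instead of server.add(batch)
        }
        server.commit();       // still commit once per batch, not per add
    }
}
```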

On Aug 2, 2012, at 2:16 PM, Timothy Potter  wrote:

> Thanks Mark.
> 
> I'm actually using SolrJ 3.4.0, so using CommonsHttpSolrServer:
> 
> Collection batch = ...
> ... build up batch ...
> solrServer.add( batch );
> 
> Basically, I have a custom Pig StoreFunc that sends docs to Solr from
> our Hadoop analytics nodes. The reason I'm not using SolrJ 4.0.0-ALPHA
> is that I couldn't get it to run in my Hadoop environment. There's
> some classpath conflict with the Apache HttpClient. SolrJ 4 depends on
> 4.1.3 but when I run it in my env, I get the following:
> 
> Caused by: java.lang.NoSuchMethodError:
> org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method
> ()V not found
>   at 
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:94)
>   at 
> org.apache.solr.client.solrj.impl.CloudSolrServer.(CloudSolrServer.java:70)
>   ... 16 more
> 
> I spent hours trying to resolve the classpath issue and finally had to
> bail and just used the 3.4 SolrJ client as I'm just at the evaluation
> stage at this point. So it sounds like this could be the cause of my
> problems.
> 
> One other thing ... I do have the _version_ field defined in my
> schema.xml but am not setting it on the client side when indexing.
> Should I be doing that?
> 
> Cheers,
> Tim
> 
> 
> On Thu, Aug 2, 2012 at 11:27 AM, Mark Miller  wrote:
>> 
>> On Aug 2, 2012, at 11:08 AM, Timothy Potter  wrote:
>> 
>>> Just starting to get into SolrCloud using 4.0.0-ALPHA and am very
>>> impressed so far ...
>>> 
>>> I have a 12-shard index with ~104M docs with each shard having
>>> 1-replica (so 24 Solr servers running)
>>> 
>>> Using the Query form on the Admin panel, I issue the MatchAllDocsQuery
>>> (*:*) and each time I send the request the value for numFound in the
>>> result is different. It's always close but not exactly the same as I
>>> would expect? Can anyone shed some light on this issue? I also tried a
>>> real query, such as "#olympics lochte" and same thing - different
>>> numFound each time. The first page of actual docs returned is the same
>>> so maybe I should just ignore the numFound issue?
>>> 
>>> Note that while experiencing this behavior, I am not adding any docs
>>> to the index and all docs have been committed with waitFlush=true and
>>> waitSearcher=true on the commit. Also, not doing soft commits at this
>>> point. In addition, after having committed all 104M docs, I hit the
>>> optimize button on the panel so I have only 1 segment. In other words,
>>> the index is not being updated and has been optimized at this point.
>> 
>> 
>> How are you adding docs? Eg what client and what method in particular (what 
>> is your line of code that actually adds the doc).
>> 
>> You can find the numFound result for each node by passing the param 
>> distrib=false. What does this tell you? Are your replicas in sync with the 
>> leader? What does the count for each shard add up to?
>> 
>> I would not ignore the issue - something must be off. It may somehow be user 
>> error, it may be a bug that has been fixed since the alpha, or it may be 
>> something new.
>> 
>> Are you sure every shard you are issuing the query *from* is active and live 
>> according to ZooKeeper? Eg when you look at the cloud admin view and look at 
>> the cluster visualization, are all the nodes green?
>> 
>> - Mark Miller
>> lucidimagination.com
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 

- Mark Miller
lucidimagination.com













Re: Trending topics?

2012-08-02 Thread Tor Henning Ueland
On Thu, Aug 2, 2012 at 5:34 PM, Chris Dawson  wrote:

> How would I generate a list of trending topics using solr?
>


By putting them in solr.
(Generic question gets a generic answer)

What do you mean? Trending searches, trending data, trending documents,
trending what?


-- 
Regards
Tor Henning Ueland


Re: Solr 4.0 - Join performance

2012-08-02 Thread Mikhail Khludnev
Hello,

You can check my record.
https://issues.apache.org/jira/browse/SOLR-3076?focusedCommentId=13415644&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13415644

I'm still working on precise performance measurement.

On Thu, Aug 2, 2012 at 6:45 PM, Eric Khoury  wrote:

>
>
>
>
>
>
> Hello all,
>
>
>
> I’m testing out the new join feature, hitting some perf
> issues, as described in Erick’s article (
> http://architects.dzone.com/articles/solr-experimenting-join).
>
> Basically, I’m using 2 objects in solr (this is a simplified
> view):
>
>
>
> Item
>
> - Id
>
> - Name
>
>
>
> Grant
>
> - ItemId
>
> - AvailabilityStartTime
>
> - AvailabilityEndTime
>
>
>
> Each item can have multiple grants attached to it.
>
>
>
> The query I'm using is the following, to find items by
> name, filtered by grants availability window:
>
>
>
> solr/select?fq=Name:XXX&q={!join
> from=ItemId to=Id} AvailabilityStartTime:[* TO NOW] AND
> -AvailabilityEndTime:[*
> TO NOW]
>
>
>
> With a hundred thousand items, this query can take multiple seconds
> to perform, due to the large number of ItemIds returned from the join
> query.
>
> Has anyone come up with a better way to use joins for these types of
> queries?  Are there improvements planned in 4.0 rtm in this area?
>
>
>
> Btw, I’ve explored simply adding Start-End times to items, but
> the flat data model makes it hard to maintain start-end pairs.
>
>
>
> Thanks for the help!
>
> Eric.
>
>
>
>




-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Map Complex Datastructure with Solr

2012-08-02 Thread Mikhail Khludnev
Tomas,

If you mean something like
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
you can check proposed Solr integration
https://issues.apache.org/jira/browse/SOLR-3076

Regards

On Thu, Aug 2, 2012 at 9:53 PM, Alexandre Rafalovitch wrote:

> You are not going to get nested entries. So, your sample result is not
> possible. Perhaps you just need to flatten your searchable fields into
> individual article entries and then use a separate DB query to get the
> product information back out of the database.
>
> SOLR is not a database, even a NoSQL one. They added more features in
> that direction for 4.0, but the use cases are limited.
>
> Regards,
>Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Wed, Aug 1, 2012 at 2:14 PM, Thomas Gravel 
> wrote:
> > Hm, ok, I think I have to write my example data and the queries I want
> > to make + the response I expect...
> >
> > Data:
> >
> > {
> > "product_id": "xyz76",
> > "product_name": "tank top",
> > "brand": "adidas",
> > "description":"this is the long description of the product",
> > "short_description":"this is the short description of the
> product",
> > "product_image":"/images/tanktop.jpg",
> > "product_image":"/images/tanktop2.jpg",
> > "article_list": [
> > {
> > "article_number": "TR47",
> > "color": "red",
> > "price": 10.99,
> > "size": "XL",
> > "unit": "1 piece",
> > "inStore": true
> > },
> > {
> > "article_number": "TR48",
> > "color": "blue",
> > "price": 15.99,
> > "size": "XL",
> > "unit": "1 piece",
> > "inStore": false
> > }
> > ]
> > }
> >
> > I want to search:
> > - article_number (i.e with inStore = true)
> > - color
> > - description
> > - short_description
> > - product_name
> >
> > Facets:
> > - brand
> > - color
> > - size
> > - price
> >
> > example query-response
> > {
> >   "responseHeader":{
> > "status":0,
> > "QTime":2,
> > "params":{
> >   "indent":"on",
> >   "start":"0",
> >   "q":"IBProductName:Durch*",
> >   "wt":"json",
> >   "version":"2.2",
> >   "rows":"10"}},
> >   "response":{"numFound":1,"start":0,"docs":[
> >   {
> > "product_id": "xyz76",
> > "product_name": "tank top",
> > "brand": "adidas",
> > "description":"this is the long description of the product",
> > "short_description":"this is the short description of the
> product",
> > "product_image":"/images/tanktop.jpg",
> > "product_image":"/images/tanktop2.jpg",
> > "article_list": [
> > {
> > "color": "red",
> > "price": 10.99,
> > "size": "XL",
> > "unit": "1 piece",
> > "inStore": true
> > },
> > {
> > "color": "blue",
> > "price": 15.99,
> > "size": "XL",
> > "unit": "1 piece",
> > "inStore": false
> > }
> > ]
> >
> > }
> > ]
> >   }}
> >
> >
> > 2012/8/1 Alexandre Rafalovitch :
> >> Sorry, that did not explain the problem, just more info about data
> >> layout. What are you actually trying to get out of SOLR?
> >>
> >> Are you saying you want parent's details repeated in every entry? Are
> >> you saying that you want to be able to find entries and from there,
> >> being able to find specific parent.
> >>
> >> Whatever you do, SOLR will return you a list of flat entries plus some
> >> statistics on occurrences and facets. Given that, what would you like
> >> to see?
> >>
> >> Regards,
> >>Alex.
> >>
> >> Personal blog: http://blog.outerthoughts.com/
> >> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> >> - Time is the quality of nature that keeps events from happening all
> >> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> >> book)
> >>
> >>
> >> On Wed, Aug 1, 2012 at 12:33 PM, Thomas Gravel 
> wrote:
> >>> Thanks for the answer.
> >>>
> >>> I have to explain where the problem is...
> >>>
> >>> you may have at the shop solutions products and articles.
> >>> The product is the parent of all articles...
> >>>
> >>> in json...
> >>>
> >>> {
> >>>"product_name": "tank top",
> >>>"article_list": [
> >>>   

Re: SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Timothy Potter
Sorry, I didn't answer your other questions about shards being
in-sync. Yes - all are green and happy according to the Cloud admin
panel.

Tim

On Thu, Aug 2, 2012 at 12:16 PM, Timothy Potter  wrote:
> Thanks Mark.
>
> I'm actually using SolrJ 3.4.0, so using CommonsHttpSolrServer:
>
> Collection batch = ...
> ... build up batch ...
> solrServer.add( batch );
>
> Basically, I have a custom Pig StoreFunc that sends docs to Solr from
> our Hadoop analytics nodes. The reason I'm not using SolrJ 4.0.0-ALPHA
> is that I couldn't get it to run in my Hadoop environment. There's
> some classpath conflict with the Apache HttpClient. SolrJ 4 depends on
> 4.1.3 but when I run it in my env, I get the following:
>
> Caused by: java.lang.NoSuchMethodError:
> org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method
> <init>()V not found
> at 
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:94)
> at 
> org.apache.solr.client.solrj.impl.CloudSolrServer.<init>(CloudSolrServer.java:70)
> ... 16 more
>
> I spent hours trying to resolve the classpath issue and finally had to
> bail and just used the 3.4 SolrJ client as I'm just at the evaluation
> stage at this point. So it sounds like this could be the cause of my
> problems.
>
> One other thing ... I do have the _version_ field defined in my
> schema.xml but am not setting it on the client side when indexing.
> Should I be doing that?
>
> Cheers,
> Tim
>
>
> On Thu, Aug 2, 2012 at 11:27 AM, Mark Miller  wrote:
>>
>> On Aug 2, 2012, at 11:08 AM, Timothy Potter  wrote:
>>
>>> Just starting to get into SolrCloud using 4.0.0-ALPHA and am very
>>> impressed so far ...
>>>
>>> I have a 12-shard index with ~104M docs with each shard having
>>> 1-replica (so 24 Solr servers running)
>>>
>>> Using the Query form on the Admin panel, I issue the MatchAllDocsQuery
>>> (*:*) and each time I send the request the value for numFound in the
>>> result is different. It's always close but not exactly the same as I
>>> would expect? Can anyone shed some light on this issue? I also tried a
>>> real query, such as "#olympics lochte" and same thing - different
>>> numFound each time. The first page of actual docs returned is the same
>>> so maybe I should just ignore the numFound issue?
>>>
>>> Note that while experiencing this behavior, I am not adding any docs
>>> to the index and all docs have been committed with waitFlush=true and
>>> waitSearcher=true on the commit. Also, not doing soft commits at this
>>> point. In addition, after having committed all 104M docs, I hit the
>>> optimize button on the panel so I have only 1 segment. In other words,
>>> the index is not being updated and has been optimized at this point.
>>
>>
>> How are you adding docs? Eg what client and what method in particular (what 
>> is your line of code that actually adds the doc).
>>
>> You can find the numFound result for each node by passing the param 
>> distrib=false. What does this tell you? Are your replicas in sync with the 
>> leader? What does the count for each shard add up to?
>>
>> I would not ignore the issue - something must be off. It may somehow be user 
>> error, it may be a bug that has been fixed since the alpha, or it may be 
>> something new.
>>
>> Are you sure every shard you are issuing the query *from* is active and live 
>> according to ZooKeeper? Eg when you look at the cloud admin view and look at 
>> the cluster visualization, are all the nodes green?
>>
>> - Mark Miller
>> lucidimagination.com
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>


Re: SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Timothy Potter
Thanks Mark.

I'm actually using SolrJ 3.4.0, so using CommonsHttpSolrServer:

Collection batch = ...
... build up batch ...
solrServer.add( batch );

Basically, I have a custom Pig StoreFunc that sends docs to Solr from
our Hadoop analytics nodes. The reason I'm not using SolrJ 4.0.0-ALPHA
is that I couldn't get it to run in my Hadoop environment. There's
some classpath conflict with the Apache HttpClient. SolrJ 4 depends on
4.1.3 but when I run it in my env, I get the following:

Caused by: java.lang.NoSuchMethodError:
org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method
<init>()V not found
at 
org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:94)
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.<init>(CloudSolrServer.java:70)
... 16 more

I spent hours trying to resolve the classpath issue and finally had to
bail and just used the 3.4 SolrJ client as I'm just at the evaluation
stage at this point. So it sounds like this could be the cause of my
problems.

One other thing ... I do have the _version_ field defined in my
schema.xml but am not setting it on the client side when indexing.
Should I be doing that?

Cheers,
Tim


On Thu, Aug 2, 2012 at 11:27 AM, Mark Miller  wrote:
>
> On Aug 2, 2012, at 11:08 AM, Timothy Potter  wrote:
>
>> Just starting to get into SolrCloud using 4.0.0-ALPHA and am very
>> impressed so far ...
>>
>> I have a 12-shard index with ~104M docs with each shard having
>> 1-replica (so 24 Solr servers running)
>>
>> Using the Query form on the Admin panel, I issue the MatchAllDocsQuery
>> (*:*) and each time I send the request the value for numFound in the
>> result is different. It's always close but not exactly the same as I
>> would expect? Can anyone shed some light on this issue? I also tried a
>> real query, such as "#olympics lochte" and same thing - different
>> numFound each time. The first page of actual docs returned is the same
>> so maybe I should just ignore the numFound issue?
>>
>> Note that while experiencing this behavior, I am not adding any docs
>> to the index and all docs have been committed with waitFlush=true and
>> waitSearcher=true on the commit. Also, not doing soft commits at this
>> point. In addition, after having committed all 104M docs, I hit the
>> optimize button on the panel so I have only 1 segment. In other words,
>> the index is not being updated and has been optimized at this point.
>
>
> How are you adding docs? Eg what client and what method in particular (what 
> is your line of code that actually adds the doc).
>
> You can find the numFound result for each node by passing the param 
> distrib=false. What does this tell you? Are your replicas in sync with the 
> leader? What does the count for each shard add up to?
>
> I would not ignore the issue - something must be off. It may somehow be user 
> error, it may be a bug that has been fixed since the alpha, or it may be 
> something new.
>
> Are you sure every shard you are issuing the query *from* is active and live 
> according to ZooKeeper? Eg when you look at the cloud admin view and look at 
> the cluster visualization, are all the nodes green?
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


RE: Solr upgrade from 1.4 to 3.6

2012-08-02 Thread Manepalli, Kalyan
Chantal,
Thanks for the reply. I will try it out. 

Thanks,
Kalyan Manepalli


-Original Message-
From: Chantal Ackermann [mailto:c.ackerm...@it-agenten.com] 
Sent: Wednesday, August 01, 2012 3:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr upgrade from 1.4 to 3.6

Hi Kalyan,

that is because SolrJ uses "javabin" as the format, which has class version numbers 
in the serialized objects that do not match. Set the format to XML (the "wt" 
parameter) and it will work (maybe JSON would, as well).

Chantal
 

Am 31.07.2012 um 20:50 schrieb Manepalli, Kalyan:

> Hi all,
>We are trying to upgrade our solr instance from 1.4 to 3.6. We 
> use SolrJ API to fetch the data from index. We see that SolrJ 3.6 version is 
> not compatible with index generated with 1.4.
> Is this known issue and is there a workaround for this.
> 
> Thanks,
> Kalyan Manepalli
> 



Re: Map Complex Datastructure with Solr

2012-08-02 Thread Alexandre Rafalovitch
You are not going to get nested entries. So, your sample result is not
possible. Perhaps you just need to flatten your searchable fields into
individual article entries and then use a separate DB query to get the
product information back out of the database.

SOLR is not a database, even a NoSQL one. They added more features in
that direction for 4.0, but the use cases are limited.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Aug 1, 2012 at 2:14 PM, Thomas Gravel  wrote:
> hm ok I think i have to write my example data and the queries I want
> to make + the response I expect...
>
> Data:
>
> {
> "product_id": "xyz76",
> "product_name": "tank top",
> "brand": "adidas",
> "description":"this is the long description of the product",
> "short_description":"this is the short description of the product",
> "product_image":"/images/tanktop.jpg",
> "product_image":"/images/tanktop2.jpg",
> "article_list": [
> {
> "article_number": "TR47",
> "color": "red",
> "price": 10.99,
> "size": "XL",
> "unit": "1 piece",
> "inStore": true
> },
> {
> "article_number": "TR48",
> "color": "blue",
> "price": 15.99,
> "size": "XL",
> "unit": "1 piece",
> "inStore": false
> }
> ]
> }
>
> I want to search:
> - article_number (i.e with inStore = true)
> - color
> - description
> - short_description
> - product_name
>
> Facets:
> - brand
> - color
> - size
> - price
>
> example query-response
> {
>   "responseHeader":{
> "status":0,
> "QTime":2,
> "params":{
>   "indent":"on",
>   "start":"0",
>   "q":"IBProductName:Durch*",
>   "wt":"json",
>   "version":"2.2",
>   "rows":"10"}},
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "product_id": "xyz76",
> "product_name": "tank top",
> "brand": "adidas",
> "description":"this is the long description of the product",
> "short_description":"this is the short description of the product",
> "product_image":"/images/tanktop.jpg",
> "product_image":"/images/tanktop2.jpg",
> "article_list": [
> {
> "color": "red",
> "price": 10.99,
> "size": "XL",
> "unit": "1 piece",
> "inStore": true
> },
> {
> "color": "blue",
> "price": 15.99,
> "size": "XL",
> "unit": "1 piece",
> "inStore": false
> }
> ]
>
> }
> ]
>   }}
>
>
> 2012/8/1 Alexandre Rafalovitch :
>> Sorry, that did not explain the problem, just more info about data
>> layout. What are you actually trying to get out of SOLR?
>>
>> Are you saying you want parent's details repeated in every entry? Are
>> you saying that you want to be able to find entries and from there,
>> being able to find specific parent.
>>
>> Whatever you do, SOLR will return you a list of flat entries plus some
>> statistics on occurrences and facets. Given that, what would you like
>> to see?
>>
>> Regards,
>>Alex.
>>
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Wed, Aug 1, 2012 at 12:33 PM, Thomas Gravel  
>> wrote:
>>> Thanks for the answer.
>>>
>>> I have to explain where the problem is...
>>>
>>> you may have at the shop solutions products and articles.
>>> The product is the parent of all articles...
>>>
>>> in json...
>>>
>>> {
>>>"product_name": "tank top",
>>>"article_list": [
>>>  {
>>>   "color": "red",
>>>   "price": 10.99,
>>>   "size": "XL",
>>>   "inStore": true
>>>  },
>>>  {
>>>   "color": "blue",
>>>   "price": 15.99,
>>>   "size": "XL",
>>>   "inStore": false
>>>  }
>>>]
>>> }
>>>
>>> the problem is not the search (i think, because you can use
>>> copyField), but the searchresults...
>>>
>>> I have read about the possibility to create my own FieldTypes, but I don't know
>>> if this is the answer to my issue

Re: SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Mark Miller

On Aug 2, 2012, at 11:08 AM, Timothy Potter  wrote:

> Just starting to get into SolrCloud using 4.0.0-ALPHA and am very
> impressed so far ...
> 
> I have a 12-shard index with ~104M docs with each shard having
> 1-replica (so 24 Solr servers running)
> 
> Using the Query form on the Admin panel, I issue the MatchAllDocsQuery
> (*:*) and each time I send the request the value for numFound in the
> result is different. It's always close but not exactly the same as I
> would expect? Can anyone shed some light on this issue? I also tried a
> real query, such as "#olympics lochte" and same thing - different
> numFound each time. The first page of actual docs returned is the same
> so maybe I should just ignore the numFound issue?
> 
> Note that while experiencing this behavior, I am not adding any docs
> to the index and all docs have been committed with waitFlush=true and
> waitSearcher=true on the commit. Also, not doing soft commits at this
> point. In addition, after having committed all 104M docs, I hit the
> optimize button on the panel so I have only 1 segment. In other words,
> the index is not being updated and has been optimized at this point.


How are you adding docs? Eg what client and what method in particular (what is 
your line of code that actually adds the doc).

You can find the numFound result for each node by passing the param 
distrib=false. What does this tell you? Are your replicas in sync with the 
leader? What does the count for each shard add up to?
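A quick way to run that per-shard check is to hit each core directly (the host, port, and core path here are placeholders):

```
http://shard1-host:8983/solr/select?q=*:*&rows=0&distrib=false
```

Comparing numFound between the leader and its replica for each shard shows which pair, if any, is out of sync.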

I would not ignore the issue - something must be off. It may somehow be user 
error, it may be a bug that has been fixed since the alpha, or it may be 
something new.

Are you sure every shard you are issuing the query *from* is active and live 
according to ZooKeeper? Eg when you look at the cloud admin view and look at 
the cluster visualization, are all the nodes green?

- Mark Miller
lucidimagination.com













Re: Solr admin stops working

2012-08-02 Thread Brendan Grainger
I assume you're backgrounding solr. Maybe you just need

disown %1


Brendan
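If disown isn't available (or the job was already lost), a common alternative is to detach Solr from the terminal at launch — a sketch, with the log path being an assumption:

```
nohup java -jar start.jar > solr.log 2>&1 &
```

Running start.jar inside a screen or tmux session, or installing Solr under a proper service manager, works as well.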

On Aug 2, 2012, at 1:04 PM, Niall  wrote:

> I've got Solr 3.6 up working with Jetty but the admin page is inaccessible
> and Solr appears to stop working when I terminate my SSH connection to the
> server after running start.jar. Am I missing a trick here: how do I keep it
> running?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-admin-stops-working-tp3998848.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting fields of text_general fieldType

2012-08-02 Thread Anupam Bhattacharya
The approach used to work perfectly.

But recently I realized that it is not working for more than 30 indexed
records.
I am using SOLR 3.5.

Is there another approach to SORT a title field in proper alphabetical
order irrespective of lower case and upper case?

Regards
Anupam

On Thu, May 17, 2012 at 4:32 PM, Ahmet Arslan  wrote:

> > The title sort works in a strange manner because the SOLR
> > server treats
> > title string based on Upper Case or Lower Case String. Thus
> > if we sort in
> > ascending order, first the title with numeric shows up then
> > the titles in
> > alphabetical order which starts with Upper Case & after
> > that the titles
> > starting with Lowercase.
> >
> > The title field is indexed as text_general fieldtype.
> >
> >  <field name="title" type="text_general" indexed="true" stored="true"/>
>
> Please see Otis' response http://search-lucene.com/m/uDxTF1scW0d2
>
> Simply create an additional field named title_sortable with the following
> type
>
>  
>  positionIncrementGap="100">
>   
> 
> 
> 
>   
> 
>
> Populate it via copyField directive :
>
>   
>
> then &sort=title_sortable asc
>
>
>
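The field type and copyField stripped out of the quoted reply were presumably along these lines — a sketch only; the type name and exact filter list are assumptions, mirroring the stock alphaOnlySort type from the example schema:

```xml
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- keep the whole title as a single token, lowercased, so case is ignored when sorting -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_sortable" type="alphaOnlySort" indexed="true" stored="false"/>
<copyField source="title" dest="title_sortable"/>
```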


Solr admin stops working

2012-08-02 Thread Niall
I've got Solr 3.6 up working with Jetty but the admin page is inaccessible
and Solr appears to stop working when I terminate my SSH connection to the
server after running start.jar. Am I missing a trick here: how do I keep it
running?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-admin-stops-working-tp3998848.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: weird shards search problem

2012-08-02 Thread Timothy Potter
I've seen this when I wasn't using "string" type for my document ID
(uniqueKey field) ...
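For reference, the conventional setup keys documents on a string-typed field — a sketch using the stock example-schema names:

```xml
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<uniqueKey>id</uniqueKey>
```

An analyzed or numeric key type has been known to cause problems when distributed search merges per-shard results by ID, which can produce exactly this kind of numFound/document-list mismatch.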

On Thu, Aug 2, 2012 at 10:13 AM, Joey  wrote:
> we have two shards -shard1 and shard2 - each shard has two slaves. And there
> is a VIP and LB in front of each set of the slaves.
>
> The shard request returns a SolrDocumentList object. For the same
> request, SOMETIMES "getNumFound" on this object returns the correct count (say
> 3) but the actual document list inside the object is empty (when we try
> to iterate the list we couldn't find any matching doc).
>
> However, this problem does not happen when there is only one slave per
> shard.
>
> Anyone have any idea what's happening?
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/weired-shards-search-problem-tp3998841.html
> Sent from the Solr - User mailing list archive at Nabble.com.


weird shards search problem

2012-08-02 Thread Joey
we have two shards - shard1 and shard2 - and each shard has two slaves. There
is a VIP and an LB in front of each set of slaves.

The shard request returns a SolrDocumentList object. For the same
request, SOMETIMES "getNumFound" on this object returns the correct count (say
3) but the actual document list inside the object is empty (when we try
to iterate the list we couldn't find any matching doc).

However, this problem does not happen when there is only one slave per
shard.

Anyone have any idea what's happening?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/weired-shards-search-problem-tp3998841.html
Sent from the Solr - User mailing list archive at Nabble.com.


what's the communication protocol between the shards for a shard request?

2012-08-02 Thread Joey
For example, we have two shards: shard1 and shard2. Our shard request always
goes to shard1; wondering what the protocol is when shard1 sends the request
to shard2?

Is it HTTP, in binary format?
We are trying to set up AppDynamics to monitor the shards, but it looks like
AppDynamics could not instrument the request.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-s-the-cummunication-protocol-between-the-shards-for-a-shard-reqeust-tp3998839.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: growth estimates

2012-08-02 Thread Jeff Minelli
Awesome, thanks!

-jeff

On Aug 2, 2012, at 11:32 AM, Rafał Kuć wrote:

> Hello!
> 
> Take a look at
> http://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls
> 
> It should be handy.
> 
> -- 
> Regards,
> Rafał Kuć
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
> 
>> Is there a tool or method to help me calculate the growth of solr
>> disk usage based on the known size of data input into it?
> 
>> Thanks,
> 
>> -jeff
> 



Re: growth estimates

2012-08-02 Thread Rafał Kuć
Hello!

Take a look at
http://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls

It should be handy.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Is there a tool or method to help me calculate the growth of solr
> disk usage based on the known size of data input into it?

> Thanks,

> -jeff



growth estimates

2012-08-02 Thread Jeff Minelli
Is there a tool or method to help me calculate the growth of solr disk usage 
based on the known size of data input into it?

Thanks,

-jeff

SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Timothy Potter
Just starting to get into SolrCloud using 4.0.0-ALPHA and am very
impressed so far ...

I have a 12-shard index with ~104M docs with each shard having
1-replica (so 24 Solr servers running)

Using the Query form on the Admin panel, I issue the MatchAllDocsQuery
(*:*) and each time I send the request the value for numFound in the
result is different. It's always close but not exactly the same as I
would expect? Can anyone shed some light on this issue? I also tried a
real query, such as "#olympics lochte" and same thing - different
numFound each time. The first page of actual docs returned is the same
so maybe I should just ignore the numFound issue?

Note that while experiencing this behavior, I am not adding any docs
to the index and all docs have been committed with waitFlush=true and
waitSearcher=true on the commit. Also, not doing soft commits at this
point. In addition, after having committed all 104M docs, I hit the
optimize button on the panel so I have only 1 segment. In other words,
the index is not being updated and has been optimized at this point.


Re: termFrequncy off and still use fastvector highlighter?

2012-08-02 Thread Tanguy Moal
I think you could use a field without term frequencies for searching;
that will solve your relevancy issues.
You can then have the exact same content in another field (using a
copyField directive in your schema), with term frequencies and positions
turned on, and use that particular field for highlighting.
Searching and highlighting can be totally separated, from my understanding.
You could even use an alternate query using the hl.q parameter so that you
can highlight terms that were not searched for, or have terms searched but
not highlighted.

Hope this helps,

--
Tanguy
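A schema sketch of the split described above (the field and type names here are assumptions):

```xml
<!-- searched field: its type omits term frequencies -->
<field name="title" type="text_no_tf" indexed="true" stored="false"/>

<!-- highlight-only copy: term vectors with positions and offsets,
     which the FastVectorHighlighter requires -->
<field name="title_hl" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<copyField source="title" dest="title_hl"/>
```

Queries would then search title (qf=title) and highlight the copy (hl.fl=title_hl), optionally with a separate hl.q.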

2012/8/2 abhayd 

> So we have some content where document title is like this
>
> "Accessory for iphone, iphone4, iphone 4s"
>
> So these ones come up in the top results for iphone. This could be a content
> authoring issue, but we are looking into preventing such content from
> coming up on top.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/termFrequncy-off-and-still-use-fastvector-highlighter-tp3998590p3998820.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Solr 4.0 - Join performance

2012-08-02 Thread Eric Khoury
Hello all,

I'm testing out the new join feature and hitting some performance issues, as
described in Erick's article
(http://architects.dzone.com/articles/solr-experimenting-join).

Basically, I'm using 2 objects in Solr (this is a simplified view):

Item
- Id
- Name

Grant
- ItemId
- AvailabilityStartTime
- AvailabilityEndTime

Each item can have multiple grants attached to it.

The query I'm using is the following, to find items by name, filtered by the
grants' availability window:

solr/select?fq=Name:XXX&q={!join from=ItemId to=Id} AvailabilityStartTime:[* TO NOW] AND -AvailabilityEndTime:[* TO NOW]

With a hundred thousand items, this query can take multiple seconds to
perform, due to the large number of ItemIds returned from the join query.
Has anyone come up with a better way to use joins for these types of queries?
Are there improvements planned in 4.0 RTM in this area?

By the way, I've explored simply adding start-end times to items, but the
flat data model makes it hard to maintain start-end pairs.

Thanks for the help!

Eric.


Re: termFrequncy off and still use fastvector highlighter?

2012-08-02 Thread abhayd
So we have some content where document title is like this

"Accessory for iphone, iphone4, iphone 4s"

So these ones come up in the top results for iphone. This could be a content
authoring issue, but we are looking into preventing such content from coming up on top.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/termFrequncy-off-and-still-use-fastvector-highlighter-tp3998590p3998820.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Special suggestions requirement

2012-08-02 Thread Michael Della Bitta
In this case, we're storing the overall value length and sorting it on
that, then alphabetically.

Also, how are your queries fashioned? If you're doing a prefix query,
everything that matches it should score the same. If you're only doing
a prefix query, you might need to add a term for exact matches as well
to get them to show up.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game
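A sketch of the length-then-alphabetical sort described above (the field names are assumptions):

```xml
<field name="part" type="string" indexed="true" stored="true"/>
<field name="part_len" type="int" indexed="true" stored="false"/>
```

with part_len populated at index time with the length of the part number, and queries sorted via sort=part_len asc, part asc.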


On Wed, Aug 1, 2012 at 9:58 PM, Lochschmied, Alexander
 wrote:
> Is there a way to offer distinct, alphabetically sorted, fixed length options?
>
> I am trying to suggest part numbers and I'm currently trying to do it with 
> the spellchecker component.
> Let's say "ABCD" was entered and we have indexed part numbers like
> ABCD
> ABCD2000
> ABCD2100
> ABCD2200
> ...
>
> I would like to have 2 characters suggested always, so for "ABCD", it should 
> suggest
> ABCD00
> ABCD20
> ABCD21
> ABCD22
> ...
>
> No smart sorting is needed, just alphabetical sorting. The problem is that
> for example 00 (or ABCD00) may not be suggested currently as it doesn't score 
> high enough. But we are really trying to get all distinct values starting 
> from the smallest (up to a certain number of suggestions).
>
> I was already looking at the custom comparator class option. But this would
> probably not work, as I would need more information to implement it there
> (like at least the currently entered search term, "ABCD" in the example).
>
> Thanks,
> Alexander


Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-02 Thread Robert Muir
On Thu, Aug 2, 2012 at 3:13 AM, Laurent Vaills  wrote:
> Hi everyone,
>
> Is there any chance to get this backported for a 3.6.2?
>

Hello, I personally have no problem with it, but it's really
technically not a bugfix, just an optimization.

It also doesn't solve the actual problem if you have a Tomcat
threadpool configuration recycling threads too fast. There will be
other performance problems.

-- 
lucidimagination.com


Re: termFrequncy off and still use fastvector highlighter?

2012-08-02 Thread Erick Erickson
what do you expect to gain by turning off TF?

This feels a bit like an XY problem

Best
Erick

On Wed, Aug 1, 2012 at 8:43 AM, abhayd  wrote:
> hi
> We would like to turn off TF for a field but we still want to use fast
> vector highlighter.
>
> How would we do that?
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/termFrequncy-off-and-still-use-fastvector-highlighter-tp3998590.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: BitSet field type in solr

2012-08-02 Thread Erick Erickson
There has been talk of a bit fieldType, but I don't think it has
ever been implemented. I don't think you need a custom
FieldType, there already is a BinaryType (although I confess
I haven't used it, so check first).

From there, I think your custom Similarity is the way to go...

Best
Erick

On Tue, Jul 31, 2012 at 1:10 PM, Dazhi Jiao  wrote:
> Hi,
>
> I am new to solr and I apologize if I am not using the right terms in
> my question. I want to use solr to store and search for some chemistry
> data. For each molecule (represented as a string), I can convert it
> into a bitset representation (fixed length). To search for the
> molecule, the similarity is based on bits comparison. Since there will
> be large amount of data, I want to make the search faster so I imagine
> I need to index these bitsets. I want to write a custom FieldType, a
> custom Similarity. The FieldType will automatically convert the text
> representation of the molecule into a BitSet and store/index it. So my
> question is, is this possible, or is there anything that already in
> solr that can store/index bitsets?
>
> Thank you!
>
> Dazhi


Re: split on white space and then EdgeNGramFilterFactory

2012-08-02 Thread Jack Krupansky
Only do the ngram filter at index time. So, add a query-time analyzer to 
that field type but without the ngram filter.


Also, add &debugQuery to your query request to see what Lucene query is 
generated.


And, use the Solr admin analyzer to validate both index-time and query-time 
analysis of your terms.


-- Jack Krupansky
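Putting that together for the asker's field, a sketch of such a type — the name and gram sizes are illustrative:

```xml
<fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
  <!-- index time: split on whitespace, lowercase, then expand into edge n-grams -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <!-- query time: same analysis minus the n-gram expansion -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this, a query for "xpres" matches the indexed gram "xpres" produced from "xpress".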

-Original Message- 
From: Rajani Maski

Sent: Thursday, August 02, 2012 7:26 AM
To: solr-user@lucene.apache.org
Subject: split on white space and then EdgeNGramFilterFactory

Hi,

   I wanted to split on whitespace and then apply
EdgeNGramFilterFactory.

Example : A field in a document has text content : "smart phone, i24
xpress exchange offer, 500 dollars"

smart s sm sma smar smart
phone p ph pho phon phone
i24  i i2 i24
xpress x xp xpr xpre xpres xpress

so on.

If I search on "xpres", I should get this document record matched.

What field type can support this?

I was trying with the one below but was not able to achieve the above
requirement.









Any suggestions?

Thanks,
Rajani 



split on white space and then EdgeNGramFilterFactory

2012-08-02 Thread Rajani Maski
Hi,

   I wanted to split on whitespace and then apply
EdgeNGramFilterFactory.

Example : A field in a document has text content : "smart phone, i24
xpress exchange offer, 500 dollars"

smart s sm sma smar smart
phone p ph pho phon phone
i24  i i2 i24
xpress x xp xpr xpre xpres xpress

so on.

If I search on "xpres", I should get this document record matched.

What field type can support this?

I was trying with the one below but was not able to achieve the above
requirement.









Any suggestions?

Thanks,
Rajani


Re: matching with whole field

2012-08-02 Thread elisabeth benoit
Thank you so much, Franck Brisbart.

It's working!

Best regards,
Elisabeth

2012/8/2 fbrisbart 

> It's a parsing problem.
> You must tell the query parser to consider spaces as real characters.
> This should work (backslashing the spaces):
> fq=ONLY_EXACT_MATCH_FIELD:salon\ de\ coiffure
>
> or you may use something like that :
> fq={!term f=ONLY_EXACT_MATCH_FIELD v=$qq}&qq=salon de coiffure
>
>
> Hope it helps,
> Franck Brisbart
>
>
> On Thursday, 2 August 2012 at 09:56 +0200, elisabeth benoit wrote:
> > Hello,
> >
> > I am using Solr 3.4.
> >
> > I'm trying to define a type that it is possible to match with only if
> > request contains exactly the same words.
> >
> > Let's say I have two different values for ONLY_EXACT_MATCH_FIELD
> >
> > ONLY_EXACT_MATCH_FIELD: salon de coiffure
> > ONLY_EXACT_MATCH_FIELD: salon de coiffure pour femmes
> >
> > I would like to match only with the first ont when requesting Solr with
> > fq=ONLY_EXACT_MATCH_FIELD:(salon de coiffure)
> >
> > As far has I understood, the solution is to do not tokenize on white
> > spaces, and use instead solr.KeywordTokenizerFactory
> >
> >
> > My actual type is defined as follows in schema.xml
> >
> > [schema snippet mangled in the archive; visible fragments: omitNorms="true", positionIncrementGap="100", mapping="mapping-ISOLatin1Accent.txt"]
> > 
> >
> > But matching with fields with more then one word doesn't work. Does
> someone
> > have a clue what I am doing wrong?
> >
> > Thanks,
> > Elisabeth
>
>
>


AW: AW: auto completion search with solr using NGrams in SOLR

2012-08-02 Thread Markus Klose
If you want to search in the two fields "title" and "empname", you have to use 
the (e)dismax query parser:
http://wiki.apache.org/solr/ExtendedDisMax
you need to specify the qf param: qf=title empname

Check your solrconfig.xml to verify which query parser you are using right now.


In your use case you do not need the asterisks in the query.
"q=am" in combination with the edge n-gram filter will find aman, amar, amal

Best regards from Augsburg

Markus Klose
SHI Elektronische Medien GmbH 
 




-Original Message-
From: aniljayanti [mailto:anil.jaya...@gmail.com] 
Sent: Thursday, 2 August 2012 09:34
To: solr-user@lucene.apache.org
Subject: Re: AW: auto completion search with solr using NGrams in SOLR

Hi,

thanks,

im searching with empname filed.  want to search with both "empname" and 
"title". 

below is my changed code.

[schema snippet not preserved in the archive]

SOLR Query :
http://localhost:8080/AC/select/?q=(*am*)&rows=500

1) want to search with "title" and "empname" both. 
2) am i quering correctly with above url ??

but getting result like .

p*am*
s*am*
ra*am*a

but i want the result like 

aman
amar
amal

please help me in this...




--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p3998721.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR 3.4 GeoSpatial Query Returning distance

2012-08-02 Thread Tanguy Moal
Hi,
I've not tested it myself, but I think you can take advantage of Solr
4's pseudo-fields by adding something like:

&fl=*,geodist(),score

I think you could even pass several geodist() calls with different
parameters if you want the distance with respect to several POIs ^-^

SOLR 4 only.
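
For reference, geodist() with no arguments reads the sfield and pt parameters, so a full request could look like this sketch (the field name and coordinates are made up):

```
http://localhost:8983/solr/select?q=*:*&sfield=location&pt=45.15,-93.85&fl=*,geodist(),score&sort=geodist()+asc
```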

--
Tanguy

2012/8/2 Michael Kuhlmann 

> On 02.08.2012 01:52, Anand Henry wrote:
>
>> Hi,
>>
>> In SOLR 3.4, while doing a geo-spatial search, is there a way to retrieve
>> the distance of each document from the specified location?
>>
>
> Not that I know of.
>
> What we did was to read and parse the location field on client side and
> calculate the distance on our own using this library:
>
> http://code.google.com/p/**simplelatlng/
>
> However, it's not as "nice" as getting the distance from Solr, and
> sometimes the distances seem to slightly differ - e.g. when you filter up
> to a distance of 100 km, there are cases where the client library still
> computes 100.8 km or so.
>
> But at least, it's working.
>
> -Kuli
>


Re: Solr TermsComponent: space in term

2012-08-02 Thread aniljayanti
Hi 

I'm working on autocomplete functionality in Solr. Can you suggest the
required configuration in schema.xml and solrconfig.xml for doing
autocomplete in Solr?

thanks in advance,

Anil




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-TermsComponent-space-in-term-tp1898889p3998755.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: matching with whole field

2012-08-02 Thread fbrisbart
It's a parsing problem.
You must tell the query parser to consider spaces as real characters.
This should work (backslashing the spaces):
fq=ONLY_EXACT_MATCH_FIELD:salon\ de\ coiffure

or you may use something like that :
fq={!term f=ONLY_EXACT_MATCH_FIELD v=$qq}&qq=salon de coiffure


Hope it helps,
Franck Brisbart


On Thursday, 2 August 2012 at 09:56 +0200, elisabeth benoit wrote:
> Hello,
> 
> I am using Solr 3.4.
> 
> I'm trying to define a type that it is possible to match with only if
> request contains exactly the same words.
> 
> Let's say I have two different values for ONLY_EXACT_MATCH_FIELD
> 
> ONLY_EXACT_MATCH_FIELD: salon de coiffure
> ONLY_EXACT_MATCH_FIELD: salon de coiffure pour femmes
> 
> I would like to match only with the first ont when requesting Solr with
> fq=ONLY_EXACT_MATCH_FIELD:(salon de coiffure)
> 
> As far has I understood, the solution is to do not tokenize on white
> spaces, and use instead solr.KeywordTokenizerFactory
> 
> 
> My actual type is defined as follows in schema.xml
> 
> [schema snippet mangled in the archive; visible fragments: omitNorms="true", positionIncrementGap="100", mapping="mapping-ISOLatin1Accent.txt"]
> 
> 
> But matching with fields with more then one word doesn't work. Does someone
> have a clue what I am doing wrong?
> 
> Thanks,
> Elisabeth




Re: matching with whole field

2012-08-02 Thread elisabeth benoit
Hello Chantal,

Thanks for your answer.

In fact, my analyzer contains the same tokenizer chain for "query". I just
removed it in my email for readability (though maybe not for clarity). And
I did check with the admin interface, and it says there is a match. But
with a real query to Solr, it doesn't match.

I once read on the mailing list that one should not always trust the
admin interface for analysis...

I don't think this should interfere, but my default request handler (the one
used by fq, I guess) is not edismax.


If you have more clues, I'd be glad to read.

Thanks again,
Elisabeth



2012/8/2 Chantal Ackermann 

> Hi Elisabeth,
>
> try adding the same tokenizer chain for "query", as well, or simply remove
> the type="index" from the analyzer element.
>
> Your chain is analyzing the input of the indexer and removing diacritics
> and lowercasing. With your current setup, the input to the search is not
> analyzed likewise so inputs that are not lowercased or contain diacritics
> will not match.
>
> You might want to use the analysis frontend in the Admin UI to see how
> input to the indexer and the searcher is transformed and matched.
>
> Cheers,
> Chantal
>
> On 02.08.2012 at 09:56, elisabeth benoit wrote:
>
> > Hello,
> >
> > I am using Solr 3.4.
> >
> > I'm trying to define a type that it is possible to match with only if
> > request contains exactly the same words.
> >
> > Let's say I have two different values for ONLY_EXACT_MATCH_FIELD
> >
> > ONLY_EXACT_MATCH_FIELD: salon de coiffure
> > ONLY_EXACT_MATCH_FIELD: salon de coiffure pour femmes
> >
> > I would like to match only with the first ont when requesting Solr with
> > fq=ONLY_EXACT_MATCH_FIELD:(salon de coiffure)
> >
> > As far has I understood, the solution is to do not tokenize on white
> > spaces, and use instead solr.KeywordTokenizerFactory
> >
> >
> > My actual type is defined as follows in schema.xml
> >
> > [schema snippet mangled in the archive; visible fragments: omitNorms="true", positionIncrementGap="100", mapping="mapping-ISOLatin1Accent.txt"]
> >
> > But matching with fields with more then one word doesn't work. Does
> someone
> > have a clue what I am doing wrong?
> >
> > Thanks,
> > Elisabeth
>
>


Re: matching with whole field

2012-08-02 Thread Chantal Ackermann
Hi Elisabeth,

try adding the same tokenizer chain for "query", as well, or simply remove the 
type="index" from the analyzer element.

Your chain is analyzing the input of the indexer and removing diacritics and 
lowercasing. With your current setup, the input to the search is not analyzed 
likewise so inputs that are not lowercased or contain diacritics will not match.

You might want to use the analysis frontend in the Admin UI to see how input to 
the indexer and the searcher is transformed and matched.

Cheers,
Chantal

On 02.08.2012 at 09:56, elisabeth benoit wrote:

> Hello,
> 
> I am using Solr 3.4.
> 
> I'm trying to define a type that it is possible to match with only if
> request contains exactly the same words.
> 
> Let's say I have two different values for ONLY_EXACT_MATCH_FIELD
> 
> ONLY_EXACT_MATCH_FIELD: salon de coiffure
> ONLY_EXACT_MATCH_FIELD: salon de coiffure pour femmes
> 
> I would like to match only with the first ont when requesting Solr with
> fq=ONLY_EXACT_MATCH_FIELD:(salon de coiffure)
> 
> As far has I understood, the solution is to do not tokenize on white
> spaces, and use instead solr.KeywordTokenizerFactory
> 
> 
> My actual type is defined as follows in schema.xml
> 
> [schema snippet mangled in the archive; visible fragments: omitNorms="true", positionIncrementGap="100", mapping="mapping-ISOLatin1Accent.txt"]
>
> 
> But matching with fields with more then one word doesn't work. Does someone
> have a clue what I am doing wrong?
> 
> Thanks,
> Elisabeth



matching with whole field

2012-08-02 Thread elisabeth benoit
Hello,

I am using Solr 3.4.

I'm trying to define a type that matches only if the
request contains exactly the same words.

Let's say I have two different values for ONLY_EXACT_MATCH_FIELD

ONLY_EXACT_MATCH_FIELD: salon de coiffure
ONLY_EXACT_MATCH_FIELD: salon de coiffure pour femmes

I would like to match only with the first one when requesting Solr with
fq=ONLY_EXACT_MATCH_FIELD:(salon de coiffure)

As far as I understand, the solution is not to tokenize on white
space, and instead use solr.KeywordTokenizerFactory


My actual type is defined as follows in schema.xml

[schema snippet not preserved in the archive]

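Since the schema snippet did not survive the archive, here is a sketch of a field type matching the description — the type name is invented, and the char filter and lowercase filter are assumptions based on fragments visible elsewhere in the thread:

```xml
<!-- Sketch only: the type name is invented -->
<fieldtype name="text_exact" class="solr.TextField" omitNorms="true" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <!-- keep the whole field value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```
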
But matching with fields with more than one word doesn't work. Does someone
have a clue what I am doing wrong?

Thanks,
Elisabeth


Re: AW: auto completion search with solr using NGrams in SOLR

2012-08-02 Thread aniljayanti
Hi,

thanks,

I'm searching with the empname field. I want to search with both "empname" and
"title". 

Below is my changed code.

[schema snippet not preserved in the archive]

SOLR Query :
http://localhost:8080/AC/select/?q=(*am*)&rows=500

1) I want to search with both "title" and "empname". 
2) Am I querying correctly with the above URL?

but I am getting results like:

p*am*
s*am*
ra*am*a

but I want results like:

aman
amar
amal

please help me in this...




--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p3998721.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-02 Thread Laurent Vaills
Hi everyone,

Is there any chance to get this backported to 3.6.2?

Regards,
Laurent

2012/8/2 Simon Willnauer 

> On Thu, Aug 2, 2012 at 7:53 AM, roz dev  wrote:
> > Thanks Robert for these inputs.
> >
> > Since we do not really need the Snowball analyzer for this field, we would not use
> > it for now. If this still does not address our issue, we would tweak
> thread
> > pool as per eks dev suggestion - I am bit hesitant to do this change yet
> as
> > we would be reducing thread pool which can adversely impact our
> throughput
> >
> > If Snowball Filter is being optimized for Solr 4 beta then it would be
> > great for us. If you have already filed a JIRA for this then please let
> me
> > know and I would like to follow it
>
> AFAIK Robert already created an issue here:
> https://issues.apache.org/jira/browse/LUCENE-4279
> and it seems fixed. Given the massive commit last night its already
> committed and backported so it will be in 4.0-BETA.
>
> simon
> >
> > Thanks again
> > Saroj
> >
> >
> >
> >
> >
> > On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir  wrote:
> >
> >> On Tue, Jul 31, 2012 at 2:34 PM, roz dev  wrote:
> >> > Hi All
> >> >
> >> > I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
> >> that
> >> > when we are indexing lots of data with 16 concurrent threads, Heap
> grows
> >> > continuously. It remains high and ultimately most of the stuff ends up
> >> > being moved to Old Gen. Eventually, Old Gen also fills up and we start
> >> > getting into excessive GC problem.
> >>
> >> Hi: I don't claim to know anything about how tomcat manages threads,
> >> but really you shouldnt have all these objects.
> >>
> >> In general snowball stemmers should be reused per-thread-per-field.
> >> But if you have a lot of fields*threads, especially if there really is
> >> high thread churn on tomcat, then this could be bad with snowball:
> >> see eks dev's comment on
> https://issues.apache.org/jira/browse/LUCENE-3841
> >>
> >> I think it would be useful to see if you can tune tomcat's threadpool
> >> as he describes.
> >>
> >> separately: Snowball stemmers are currently really ram-expensive for
> >> stupid reasons.
> >> each one creates a ton of Among objects, e.g. an EnglishStemmer today
> >> is about 8KB.
> >>
> >> I'll regenerate these and open a JIRA issue: as the snowball code
> >> generator in their svn was improved
> >> recently and each one now takes about 64 bytes instead (the Among's
> >> are static and reused).
> >>
> >> Still this wont really "solve your problem", because the analysis
> >> chain could have other heavy parts
> >> in initialization, but it seems good to fix.
> >>
> >> As a workaround until then you can also just use the "good old
> >> PorterStemmer" (PorterStemFilterFactory in solr).
> >> Its not exactly the same as using Snowball(English) but its pretty
> >> close and also much faster.
> >>
> >> --
> >> lucidimagination.com
> >>
>
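
Robert's suggested workaround — swapping the Snowball stemmer for the classic Porter stemmer — would be a one-line change in the schema's analyzer chain; the chain around it here is illustrative, not from the thread:

```xml
<!-- Illustrative chain; only the stemmer line is the point -->
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- was: <filter class="solr.SnowballPorterFilterFactory" language="English"/> -->
  <filter class="solr.PorterStemFilterFactory"/>
</analyzer>
```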


Re: SOLR 3.4 GeoSpatial Query Returning distance

2012-08-02 Thread Michael Kuhlmann

On 02.08.2012 01:52, Anand Henry wrote:

Hi,

In SOLR 3.4, while doing a geo-spatial search, is there a way to retrieve
the distance of each document from the specified location?


Not that I know of.

What we did was to read and parse the location field on client side and 
calculate the distance on our own using this library:


http://code.google.com/p/simplelatlng/

However, it's not as "nice" as getting the distance from Solr, and 
sometimes the distances seem to slightly differ - e.g. when you filter 
up to a distance of 100 km, there are cases where the client library 
still computes 100.8 km or so.


But at least, it's working.

-Kuli