solr.WordDelimiterFilterFactory query time

2012-04-29 Thread abhayd
Hi,

I am using solr.WordDelimiterFilterFactory for a text_en field during query
time.

My document title is: blackberry torch 9810
The query torch9810 works fine:
it splits the alphanumeric token and gets me the document.

But when the query is blackberry9810, it splits into blackberry 9810, but I
don't get the document I mentioned above.
If I change the query to blackberry 9810 (two words), I get the document.

Can anyone explain what I'm doing wrong? When I query blackberry9810 I would
like to get the same results as blackberry 9810.
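
For reference, my query-time analyzer for text_en is set up roughly like
this (a sketch from memory; the exact attribute values may differ):

   <fieldType name="text_en" class="solr.TextField">
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1"
               catenateWords="0" catenateNumbers="0" catenateAll="0"
               splitOnCaseChange="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>

Could it be that the split tokens turn into a phrase query, so that
blackberry and 9810 must be adjacent in the index? They are not adjacent in
my title (torch sits between them), while torch and 9810 are.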

Thanks,
Abhay



Re: Using Customized sorting in Solr

2012-04-29 Thread solr user
Hi,

Any suggestions?

Am I trying to do too much with Solr? Is there another search engine that
should be used here?

I am looking into the Solr codebase and planning to modify QueryComponent.
Would this be the right approach?

Regards,

Shivam

On Fri, Apr 27, 2012 at 10:48 AM, solr user  wrote:

> Jan,
>
> Thanks for the response,
>
> I thought of using it, but it would be suboptimal in the scenario I have. I
> guess I have to explain the scenario better, so let me try again:
>
> 1. I have importance-based buckets in the system, implemented using a field
> named bucket_count with integer values 0, 1, 2, 3, and I have to show
> results in order of bucket_count, i.e. results from the 0th bucket at the
> top, then results from the 1st bucket, and so on. That is done with an
> ascending sort on this field.
> 2. Now *within these buckets* I need to ensure that the 1st listing of
> every advertiser comes at the top, then the 2nd listing from every
> advertiser, and so on.
>
> Now if I go with grouping on advertiserId and use group.offset, then I
> probably also need to do additive filtering on bucket_count. To explain it
> better, the pseudo-algorithm would be as follows (see the sketch after the
> steps):
>
> 1. Query Solr with group.offset 0 and bucket_count 0.
> 2. If step 1 returns more than zero results, increase the group offset and
> repeat step 1.
> 3. Otherwise, increase bucket_count, reset the group offset to zero, and
> start again from step 1.
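>
> In SolrJ, this loop would look roughly like the sketch below (the server
> URL and field names are assumptions, and exception handling is omitted):
>
>   import org.apache.solr.client.solrj.*;
>   import org.apache.solr.client.solrj.impl.HttpSolrServer;
>   import org.apache.solr.client.solrj.response.QueryResponse;
>
>   SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
>   final int maxBucket = 3;  // importance buckets are 0..3
>   int bucket = 0, offset = 0;
>   while (bucket <= maxBucket) {
>     SolrQuery q = new SolrQuery("*:*");
>     q.addFilterQuery("bucket_count:" + bucket);  // step 1: one bucket at a time
>     q.set("group", true);
>     q.set("group.field", "advertiserId");
>     q.set("group.limit", 1);
>     q.set("group.offset", offset);               // nth listing per advertiser
>     q.set("group.main", true);
>     QueryResponse rsp = server.query(q);
>     if (!rsp.getResults().isEmpty()) {
>       offset++;    // step 2: more results, take the next listing per advertiser
>     } else {
>       bucket++;    // step 3: bucket exhausted, move to the next bucket
>       offset = 0;
>     }
>   }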
>
> With this logic, in the worst case I need to query Solr (number of
> importance buckets) * (max number of listings by an advertiser) times,
> which could be a very high number of Solr queries for a single user query.
> Please suggest if I can do this in a more optimal way. I am also open to
> making modifications in the Solr/Lucene code if needed.
>
> Regards,
> BC Rathore
>
>
>
> On Fri, Apr 27, 2012 at 4:09 AM, Jan Høydahl wrote:
>
>> Hi,
>>
>> How about trying grouping with paging?
>> First you do
>> group=true&group.field=advertiserId&group.limit=1&group.offset=0&group.main=true&sort=something&group.sort=how-much-paid desc
>>
>> That gives you one listing per advertiser, sorted the way you like.
>> Then to grab the next batch of ads, you go group.offset=1 etc etc.
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>>
>> On 26. apr. 2012, at 08:10, solr user wrote:
>>
>> > Hi,
>> >
>> > We are planning to move the search of one of our listing-based portals
>> > from the Sphinx search server to the Solr/Lucene search server. But we
>> > are facing a challenge in porting the customized sorting used in our
>> > portal. We only have the last 60 days of data live. The algorithm is as
>> > follows:
>> >
>> >   1.  Put all listings into 54 buckets (date buckets covering 60 days),
>> >   i.e. buckets of 7 days, 1 day, 1 day, ...
>> >   2.  For each date bucket we make 2 buckets (a paid and a free bucket).
>> >   3.  For each paid/free bucket, cycle the advertisers on a uniqueness
>> >   basis, i.e. inside a bucket the ordering should be the 1st listing of
>> >   each advertiser, the 2nd listing of each advertiser, and so on. In
>> >   other words, within a *sub-bucket* the second listing of an advertiser
>> >   will be displayed only after the first listing of all advertisers has
>> >   been displayed.
>> >
>> > To take care of points 1 and 2 we have created a field named
>> > bucket_index at the time of indexing the data and get the results sorted
>> > by this field, but we are not able to find a way to create a sort field
>> > at index time, or to think of a sort function, for point 3. Please
>> > suggest if there is a way to do so in Solr.
>> >
>> > TIA,
>> >
>> > BC Rathore
>>
>>
>


Re: Scaling Solr - Suggestions !!

2012-04-29 Thread Sujatha Arun
The reason I have used different webapps instead of a single one for the
cores is that, while prototyping, I discovered that when one core's index is
corrupt, the entire webapp does not start up. The same must be true of "too
many open files" etc.; that is to say, if there is an issue with any one
core [schema/index], the entire webapp does not start up.

Thanks for your suggestion.


Regards
Sujatha

On Sat, Apr 28, 2012 at 6:49 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Just my opinion, but I'm not sure I see the value in deploying the cores
> to different webapps in a single container on a single machine to avoid
> a single point of failure... You still have a single point of failure at
> the process level down to the hardware, which when you think about it,
> is mostly everything. But perhaps you're at least using more than one
> container.
>
> It sounds to me that the easiest route to scalability for you would be
> to add more machines. Unless your cores are particularly complex or your
> traffic is heavy, a 3GB core should be no match for a single machine.
> And the traffic problem can be solved by replication and load balancing.
>
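> For reference, a minimal master/slave replication setup in solrconfig.xml
> might look like this sketch (the handler and parameter names are standard,
> but the values here are placeholders, not recommendations):
>
>   <!-- on the master -->
>   <requestHandler name="/replication" class="solr.ReplicationHandler">
>     <lst name="master">
>       <str name="replicateAfter">commit</str>
>       <str name="confFiles">schema.xml,stopwords.txt</str>
>     </lst>
>   </requestHandler>
>
>   <!-- on each slave -->
>   <requestHandler name="/replication" class="solr.ReplicationHandler">
>     <lst name="slave">
>       <str name="masterUrl">http://master:8983/solr/replication</str>
>       <str name="pollInterval">00:00:60</str>
>     </lst>
>   </requestHandler>
>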
> Michael
>
> On Sat, 2012-04-28 at 13:24 +0530, Sujatha Arun wrote:
> > Hello,
> >
> > *Background*: For each of our customers, we create 3 Solr webapps with
> > different search schemas, serving different search requirements, and we
> > have about 70 customers. So we have about 210 webapps currently.
> >
> > *Hardware*: Single server, one JVM, heap memory 19GB, total RAM 32GB,
> > permgen initially 1GB, now increased to 2GB.
> >
> > *Solr Indexes*: Most are on the order of a few MB, with about 2 big
> > indexes of about 3GB each.
> >
> > *Scaling Step 1*: We saw the permgen value go up to nearly 850 MB when
> > we created so many webapps, hence we are now moving to Solr cores, and we
> > are going to have about 50 cores per webapp, bringing the number of
> > webapps to about 5. We want to distribute the cores across multiple
> > webapps to avoid a single point of failure.
> >
> >
> > *Requirement* :
> >
> >
> >    -  We need to scale horizontally only the cores whose index sizes
> >       are big.
> >    -  We also require permission-based search for each webapp. Would Solr
> >       NRT fit our needs, where we can index the permissions into the
> >       documents? That would mean frequent addition and deletion of
> >       permissions on documents across cores.
> >    -  We also require automatic failover.
> >
> > What technology would be an ideal fit, given Solr Cloud, Katta, Solandra,
> > Lily, ElasticSearch, etc. [preferably open source]? [We would be required
> > to maintain many webapps with multicores.] And what about the commercial
> > offerings, given our use case?
> >
> > Thanks.
> >
> > Regards,
> > Sujatha
>
>
>


ConcurrentUpdateSolrServer and unable to override default http settings

2012-04-29 Thread Gopal Patwa
In the SolrJ client trunk build for 4.0, the ConcurrentUpdateSolrServer
class does not allow overriding default HTTP settings
like HttpConnectionParams.setConnectionTimeout,
HttpConnectionParams.setSoTimeout, and DefaultMaxConnectionsPerHost.


The internal HttpSolrServer is not accessible from the
ConcurrentUpdateSolrServer class. I know we can pass in an HttpClient, but
since most of the time you just need to override the default HTTP settings,
it would be nice if ConcurrentUpdateSolrServer exposed its HttpSolrServer
through a getter method.

Otherwise, anyone who needs to override the default HTTP settings has to
pass in an HttpClient.
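
For example, a rough sketch against the 4.0 trunk API (the URL, timeouts,
and queue settings here are illustrative):

  import org.apache.http.impl.client.DefaultHttpClient;
  import org.apache.http.params.HttpConnectionParams;
  import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;

  DefaultHttpClient httpClient = new DefaultHttpClient();
  // override the defaults before handing the client to SolrJ
  HttpConnectionParams.setConnectionTimeout(httpClient.getParams(), 5000);
  HttpConnectionParams.setSoTimeout(httpClient.getParams(), 30000);

  // queue size 100, 4 sender threads
  ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
      "http://localhost:8983/solr", httpClient, 100, 4);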


-Gopal Patwa


Re: Question on Facet counts by grouped results

2012-04-29 Thread Sohail Aboobaker
I had copied the full example directory. After copying, I replaced its
schema.xml with my old 3.5 schema.xml, and I faced this error. After Eric's
email, I copied stopwords_en.txt into the conf directory of my copy. It
works fine after that.

After seeing your email, it seems that the better approach would be to add
lang/ in front of the stopwords_en references in my copied schema.xml,
because lang/ did not exist in the 3.5 directory layout.
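
For example, each reference would change roughly like this (a sketch; the
surrounding analyzer definition is omitted):

  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="lang/stopwords_en.txt"/>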

Regards
Sohail


Re: change index/store at indexing time

2012-04-29 Thread Vazquez, Maria (STM)
Thanks for your response.
That's what I need: changes at indexing time. Dynamic fields are not what I
need because the field name is the same; I just need to change whether it is
indexed/stored based on some logic.
This is so easily achieved with the Lucene API that I was sure there was a
way to do the same in Solr.


On Apr 28, 2012, at 22:34, "Jeevanandam"  wrote:

> Maria,
> 
> Thanks for the detailed explanation.
> As per schema.xml, stored or indexed should be defined at design time. Per
> my understanding, defining it at runtime is not feasible.
> BTW, you can have the multiValued="true" attribute on dynamic fields too.
> 
> - Jeevanandam
> 
> On 29-04-2012 2:06 am, Vazquez, Maria (STM) wrote:
>> Thanks Jeevanandam.
>> That still doesn't have the same behavior as Lucene since multiple
>> fields with different names have to be created.
>> What I want is exactly this (a multi-valued field):
>> 
>> document.add(new Field("geoids", geoId, Field.Store.YES,
>> Field.Index.NOT_ANALYZED_NO_NORMS));
>> 
>> document.add(new Field("geoids", geoId, Field.Store.NO,
>> Field.Index.NOT_ANALYZED_NO_NORMS));
>> 
>> In Lucene I can save geoids first as stored and in the next line as
>> not stored, and it will do exactly that. I want to duplicate this
>> behavior in Solr, but I can't do it having only one field in the schema
>> called geoids that I can manipulate at index time as to whether to store
>> it or not depending on a condition.
>> 
>> Thanks again for the help; hope this explanation makes it clearer
>> what I'm trying to do.
>> 
>> Maria
>> 
>> On Apr 28, 2012, at 11:49 AM, "Jeevanandam"
>> mailto:je...@myjeeva.com>> wrote:
>> 
>> Maria,
>> 
>> For your need please define unique pattern using dynamic field in schema.xml
>> 
>> Please have a look http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
>> 
>> Hope that helps!
>> 
>> -Jeevanandam
>> 
>> Technology keeps you connected!
>> 
>> On Apr 28, 2012, at 10:33 PM, "Vazquez, Maria (STM)"
>> mailto:maria.vazq...@dexone.com>> wrote:
>> 
>> I can call a script for the logic part, but what I want to figure out
>> is how to save the same field sometimes as stored and indexed,
>> sometimes as stored but not indexed, etc. From a transformer or a script I
>> didn't see anything that lets me modify that at indexing time.
>> Thanks a lot,
>> Maria
>> 
>> 
>> On Apr 27, 2012, at 18:38, "Bill Bell"
>> mailto:billnb...@gmail.com>> wrote:
>> 
>> Yes you can. Just use a script that is called for each row.
>> 
>> Bill Bell
>> Sent from mobile
>> 
>> 
>> On Apr 27, 2012, at 6:38 PM, "Vazquez, Maria (STM)"
>> mailto:maria.vazq...@dexone.com>> wrote:
>> 
>> Hi,
>> I'm migrating a project from Lucene 2.9 to Solr 3.4.
>> There is a special case in the code that indexes the same field in
>> two different ways, which is completely legal in Lucene directly but I
>> don't know how to duplicate this same behavior in Solr:
>> 
>> if (isFirstGeo) {
>>   // first geo ID for the document: stored and indexed
>>   document.add(new Field("geoids", geoId, Field.Store.YES,
>>       Field.Index.NOT_ANALYZED_NO_NORMS));
>>   isFirstGeo = false;
>> } else {
>>   if (countProducts < 100)
>>     // indexed but not stored
>>     document.add(new Field("geoids", geoId, Field.Store.NO,
>>         Field.Index.NOT_ANALYZED_NO_NORMS));
>>   else
>>     // stored but not indexed
>>     document.add(new Field("geoids", geoId, Field.Store.YES,
>>         Field.Index.NO));
>> }
>> 
>> Is there any way to do this in Solr in a Transformer? I'm using the
>> DIH to index, and I can't see a way to do this other than having three
>> fields in the schema like geoids_store_index, geoids_nostore_index,
>> and geoids_store_noindex, roughly as sketched below.
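>> 
>> A sketch of those three fields for schema.xml (the field type and
>> attributes here are assumptions):
>> 
>>   <field name="geoids_store_index"   type="string" indexed="true"  stored="true"  multiValued="true"/>
>>   <field name="geoids_nostore_index" type="string" indexed="true"  stored="false" multiValued="true"/>
>>   <field name="geoids_store_noindex" type="string" indexed="false" stored="true"  multiValued="true"/>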
>> 
>> Thanks a lot in advance.
>> Maria
> 


RE: Unsubscribe does not appear to be working

2012-04-29 Thread Kevin Bootz
Thanks all. Second unsubscribe confirmation email sent to the ezmlm app. 
Perhaps it will take this time...
"

Hi! This is the ezmlm program. I'm managing the solr-user@lucene.apache.org 
mailing list.

I'm working for my owner, who can be reached at 
solr-user-ow...@lucene.apache.org.

To confirm that you would like

   myemail

removed from the solr-user mailing list, please send a short reply to this 
address:

   solr-user-uc.1335712063.gfmnpbjnkpamcooicane-myem...@lucene.apache.org

...
"


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Friday, April 27, 2012 12:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Unsubscribe does not appear to be working


: There is no such thing as a 'solr forum' or a 'solr forum account.'
: 
: If you are subscribed to this list, an email to the unsubscribe
: address will unsubscribe you. If some intermediary or third party is
: forwarding email from this list to you, no one here can help you.

And more specifically:

* sending email to solr-user-help@lucene will generate an automated reply with 
details about how to unsubscribe and even how to tell what address you are 
subscribed with.

* if the automated system isn't working for you, please send all of the 
important details (who you are, what you've tried, what automated responses 
you've gotten) to the solr-user-owner@lucene alias so the moderators can try 
to help you.



-Hoss


Re: change index/store at indexing time

2012-04-29 Thread Jeevanandam Madanagopal
Maria -

Thanks for the detailed explanation.
As per schema.xml, stored or indexed should be defined at schema design
time; as per my understanding, defining it at runtime is not feasible.
BTW, you can have the multiValued="true" attribute on dynamic fields too.

- Jeevanandam

On Apr 29, 2012, at 1:06 AM, Vazquez, Maria (STM) wrote:

> Thanks Jeevanandam.
> That still doesn't have the same behavior as Lucene since multiple fields 
> with different names have to be created.
> What I want is exactly this (a multi-valued field):
> 
> document.add(new Field("geoids", geoId, Field.Store.YES, 
> Field.Index.NOT_ANALYZED_NO_NORMS));
> 
> document.add(new Field("geoids", geoId, Field.Store.NO, 
> Field.Index.NOT_ANALYZED_NO_NORMS));
> 
> In Lucene I can save geoids first as stored and in the next line as not 
> stored, and it will do exactly that. I want to duplicate this behavior in 
> Solr, but I can't do it having only one field in the schema called geoids 
> that I can manipulate at index time as to whether to store it or not 
> depending on a condition.
> 
> Thanks again for the help; hope this explanation makes it clearer what 
> I'm trying to do.
> 
> Maria
> 
> On Apr 28, 2012, at 11:49 AM, "Jeevanandam" 
> mailto:je...@myjeeva.com>> wrote:
> 
> Maria,
> 
> For your need please define unique pattern using dynamic field in schema.xml
> 
> Please have a look http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
> 
> Hope that helps!
> 
> -Jeevanandam
> 
> Technology keeps you connected!
> 
> On Apr 28, 2012, at 10:33 PM, "Vazquez, Maria (STM)" 
> mailto:maria.vazq...@dexone.com>> wrote:
> 
> I can call a script for the logic part, but what I want to figure out is how 
> to save the same field sometimes as stored and indexed, sometimes as stored 
> but not indexed, etc. From a transformer or a script I didn't see anything 
> that lets me modify that at indexing time.
> Thanks a lot,
> Maria
> 
> 
> On Apr 27, 2012, at 18:38, "Bill Bell" 
> mailto:billnb...@gmail.com>> wrote:
> 
> Yes you can. Just use a script that is called for each row.
> 
> Bill Bell
> Sent from mobile
> 
> 
> On Apr 27, 2012, at 6:38 PM, "Vazquez, Maria (STM)" 
> mailto:maria.vazq...@dexone.com>> wrote:
> 
> Hi,
> I'm migrating a project from Lucene 2.9 to Solr 3.4.
> There is a special case in the code that indexes the same field in two 
> different ways, which is completely legal in Lucene directly but I don't know 
> how to duplicate this same behavior in Solr:
> 
> if (isFirstGeo) {
>   document.add(new Field("geoids", geoId, Field.Store.YES, 
>       Field.Index.NOT_ANALYZED_NO_NORMS));
>   isFirstGeo = false;
> } else {
>   if (countProducts < 100)
>     document.add(new Field("geoids", geoId, Field.Store.NO, 
>         Field.Index.NOT_ANALYZED_NO_NORMS));
>   else
>     document.add(new Field("geoids", geoId, Field.Store.YES,
>         Field.Index.NO));
> }
> 
> Is there any way to do this in Solr in a Transformer? I'm using the DIH to 
> index, and I can't see a way to do this other than having three fields in the 
> schema like geoids_store_index, geoids_nostore_index, and 
> geoids_store_noindex.
> 
> Thanks a lot in advance.
> Maria
> 
> 
> 



Re: should slave replication be turned off / on during master clean and re-index?

2012-04-29 Thread Shawn Heisey

On 4/27/2012 8:33 PM, geeky2 wrote:

Well, in this case when I say "clean" (on the master), I mean selecting
the "Full Import with Cleaning" button from the DataImportHandler
Development Console page in Solr. At the top of the page, I have the check
boxes selected for verbose and clean (*but I don't have the commit checkbox
selected*).

By doing the above process, doesn't this issue a deletion query and then
start the import?

And as a follow-up: when is the commit actually done?


Here is the relevant section from my solrconfig.xml file on the master:

    <autoCommit>
      <maxTime>60000</maxTime>
      <maxDocs>1000</maxDocs>
    </autoCommit>


With commit turned off on the import, the *import* will not do a commit 
at any time, so something else has to do the commit or you will never 
see the new index.


In your case, you are relying on autocommit.  Because I don't use 
autocommit, I can't say for sure that the following is right, but I 
believe that it is:  With your settings during a full import, your index 
will go from having everything in it to having 1000 documents or less 
within one minute of the import starting.


If that is indeed what happens (and you should definitely test to make 
sure) and you have replication active, your slaves would have a reduced 
index that would slowly build back up as the import progressed on the 
master.  I am pretty sure that's not what you want, so it is a good idea 
to disable replication until the full import is complete.
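
If it helps, replication polling can be turned off and on per slave over
HTTP, something like this (host, port, and path are illustrative):

http://slave:8983/solr/replication?command=disablepoll
http://slave:8983/solr/replication?command=enablepoll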


There is another option, one that would be a good idea if you make 
additions/deletions to your index on an interval that is smaller than 
the time it takes for a full-import:  Maintain a live core and a build 
core on your master server.  Build a new index in the build core while 
simultaneously keeping the live core up to date.  When the build is 
complete, update it to be current and then swap the live core and build 
core.  If replication is set up correctly, the slaves should replicate 
the new index as soon as the cores are swapped.
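
The swap itself can be done with the CoreAdmin API, roughly like this (the
core names "live" and "build" are assumptions):

http://master:8983/solr/admin/cores?action=SWAP&core=live&other=build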


Thanks,
Shawn