Re: How to select one entity at a time?

2008-09-30 Thread con

Of course I agree.
But while performing a search, if I want to search only the data from the
USER table, how can I achieve it?

Suppose I have a user named bob in both the USER and MANAGER tables. So when I
perform http://localhost:8983/solr/dataimport?command=full-import , all the
USER and MANAGER values will get indexed.
And when I do a search like
http://localhost:8983/solr/select/?q=bob&version=2.2&start=0&rows=10&indent=on&wt=json
it will return all the values indexed from both the USER and MANAGER tables.
But I want only the data indexed from either the USER table or the MANAGER
table at a time, based on the end user's choice. How can I achieve this?
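
A common way to achieve this (a sketch with illustrative names; the
TemplateTransformer ships with the DataImportHandler) is to tag each
entity's documents with a constant source field in data-config.xml:

  <entity name="user" transformer="TemplateTransformer"
          query="select ... from USER">
    <field column="source" template="user"/>
  </entity>

with a matching indexed "source" field declared in schema.xml, and then
restrict searches with a filter query:

  http://localhost:8983/solr/select/?q=bob&fq=source:user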

Thanks for your reply 
con 


Noble Paul നോബിള്‍ नोब्ळ् wrote:
> 
> The entity and the select query have no relationship.
> The entity comes into the picture when you do a dataimport.
> 
> eg:
> http://localhost:8983/solr/dataimport?command=full-import&entity=user
> 
> This is an indexing operation.
> 
> On Wed, Oct 1, 2008 at 11:26 AM, con <[EMAIL PROTECTED]> wrote:
>>
>> Hi guys,
>> In the URL, http://localhost:8983/solr/select/?q=
>> :bob&version=2.2&start=0&rows=10&indent=on&wt=json
>>
>> q=: applies to a field and not to an entity. So if I have 3 entities
>> like:
>>
>> [data-config.xml with three entities, e.g. 'user', stripped by the mail
>> archive]
>>
>> I cannot invoke the entity, 'user', just like the above URL. I went
>> through the possible arguments but didn't find a way to invoke an
>> entity. Is there a way to do this?
>> regards
>> con
>>
>>
>>
>>
>>
>>
>> con wrote:
>>>
>>> Thanks, everybody.
>>> I have gone through the wiki and some other docs. Actually I have a
>>> tight schedule and have to look into various other things along with this.
>>> Currently I am looking into rebuilding Solr by writing a wrapper class.
>>> I will update you with more meaningful questions soon.
>>> thanks and regards.
>>> con
>>>
>>>
>>> Norberto Meijome-6 wrote:

 On Fri, 26 Sep 2008 02:35:18 -0700 (PDT)
 con <[EMAIL PROTECTED]> wrote:

> What you meant is correct. Please excuse me; I am new to Solr.
> :-(

 Con, have a read here :

 http://www.ibm.com/developerworks/java/library/j-solr1/

 It helped me pick up the basics a while back. It refers to 1.2, but the
 core concepts are relevant to 1.3 too.

 b
 _
 {Beto|Norberto|Numard} Meijome

 Hildebrant's Principle:
 If you don't know where you are going,
 any road will get you there.

 I speak for myself, not my employer. Contents may be hot. Slippery when
 wet. Reading disclaimers makes you go blind. Writing them is worse. You
 have been Warned.


>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/How-to-select-one-entity-at-a-time--tp19668759p19754869.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: 
http://www.nabble.com/How-to-select-one-entity-at-a-time--tp19668759p19755437.html
Sent from the Solr - User mailing list archive at Nabble.com.



Is it possible to index websites with Solr?

2008-09-30 Thread RaghavPrabhu

Hi all,

  I want to enable search functionality on my website. Can I use Solr to
index the website? Is there an option in Solr for this? Please let me know
as soon as possible.

Thanks in advance
Prabhu.K
-- 
View this message in context: 
http://www.nabble.com/Does-Solr-Indexing-Websites-possible--tp19755329p19755329.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to select one entity at a time?

2008-09-30 Thread Noble Paul നോബിള്‍ नोब्ळ्
The entity and the select query have no relationship.
The entity comes into the picture when you do a dataimport.

eg:
http://localhost:8983/solr/dataimport?command=full-import&entity=user

This is an indexing operation.

On Wed, Oct 1, 2008 at 11:26 AM, con <[EMAIL PROTECTED]> wrote:
>
> Hi guys,
> In the URL, http://localhost:8983/solr/select/?q=
> :bob&version=2.2&start=0&rows=10&indent=on&wt=json
>
> q=: applies to a field and not to an entity. So if I have 3 entities
> like:
>
> [data-config.xml with three entities, e.g. 'user', stripped by the mail
> archive]
>
> I cannot invoke the entity, 'user', just like the above URL. I went through
> the possible arguments but didn't find a way to invoke an entity. Is there
> a way to do this?
> regards
> con
>
>
>
>
>
>
> con wrote:
>>
>> Thanks, everybody.
>> I have gone through the wiki and some other docs. Actually I have a tight
>> schedule and have to look into various other things along with this.
>> Currently I am looking into rebuilding Solr by writing a wrapper class.
>> I will update you with more meaningful questions soon.
>> thanks and regards.
>> con
>>
>>
>> Norberto Meijome-6 wrote:
>>>
>>> On Fri, 26 Sep 2008 02:35:18 -0700 (PDT)
>>> con <[EMAIL PROTECTED]> wrote:
>>>
 What you meant is correct. Please excuse me; I am new to Solr.
 :-(
>>>
>>> Con, have a read here :
>>>
>>> http://www.ibm.com/developerworks/java/library/j-solr1/
>>>
>>> It helped me pick up the basics a while back. It refers to 1.2, but the
>>> core concepts are relevant to 1.3 too.
>>>
>>> b
>>> _
>>> {Beto|Norberto|Numard} Meijome
>>>
>>> Hildebrant's Principle:
>>> If you don't know where you are going,
>>> any road will get you there.
>>>
>>> I speak for myself, not my employer. Contents may be hot. Slippery when
>>> wet. Reading disclaimers makes you go blind. Writing them is worse. You
>>> have been Warned.
>>>
>>>
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/How-to-select-one-entity-at-a-time--tp19668759p19754869.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul


Re: How to select one entity at a time?

2008-09-30 Thread con

Hi guys,
In the URL, http://localhost:8983/solr/select/?q=
:bob&version=2.2&start=0&rows=10&indent=on&wt=json 

q=: applies to a field and not to an entity. So if I have 3 entities
like:

[data-config.xml with three entities, e.g. 'user', stripped by the mail
archive]

I cannot invoke the entity, 'user', just like the above URL. I went through
the possible arguments but didn't find a way to invoke an entity. Is there
a way to do this?
regards
con






con wrote:
> 
> Thanks, everybody.
> I have gone through the wiki and some other docs. Actually I have a tight
> schedule and have to look into various other things along with this.
> Currently I am looking into rebuilding Solr by writing a wrapper class.
> I will update you with more meaningful questions soon.
> thanks and regards.
> con
> 
> 
> Norberto Meijome-6 wrote:
>> 
>> On Fri, 26 Sep 2008 02:35:18 -0700 (PDT)
>> con <[EMAIL PROTECTED]> wrote:
>> 
>>> What you meant is correct. Please excuse me; I am new to Solr.
>>> :-(
>> 
>> Con, have a read here :
>> 
>> http://www.ibm.com/developerworks/java/library/j-solr1/
>> 
>> It helped me pick up the basics a while back. It refers to 1.2, but the
>> core concepts are relevant to 1.3 too.
>> 
>> b
>> _
>> {Beto|Norberto|Numard} Meijome
>> 
>> Hildebrant's Principle:
>> If you don't know where you are going,
>> any road will get you there.
>> 
>> I speak for myself, not my employer. Contents may be hot. Slippery when
>> wet. Reading disclaimers makes you go blind. Writing them is worse. You
>> have been Warned.
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/How-to-select-one-entity-at-a-time--tp19668759p19754869.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Indexing Large Files with Large DataImport: Problems

2008-09-30 Thread Noble Paul നോബിള്‍ नोब्ळ्
This patch is created against 1.3 (it may apply on trunk also).
--Noble

On Wed, Oct 1, 2008 at 9:56 AM, Noble Paul നോബിള്‍ नोब्ळ्
<[EMAIL PROTECTED]> wrote:
> I guess it is a threading problem. I can give you a patch; you can raise a bug.
> --Noble
>
> On Wed, Oct 1, 2008 at 2:11 AM, KyleMorrison <[EMAIL PROTECTED]> wrote:
>>
>> As a follow-up: I continued tweaking the data-config.xml, and have been
>> able to make the commit fail with as few as 3 fields in the sdc.xml, with
>> only one multivalued field. Even stranger, some fields work and some do
>> not. For instance, in my dc.xml:
>>
>> <field column="taxon"
>> xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Taxonomy/Lineage/Taxon"
>> />
>> .
>> .
>> .
>> <field column="genpept"
>> xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/GenPept"
>> />
>>
>> and in the schema.xml:
>> <field name="taxon" ... multiValued="true" />
>> .
>> .
>> .
>> <field name="genpept" ... multiValued="true" />
>> but taxon works and genpept does not. What could possibly account for this
>> discrepancy? Again, the error logs from the server are exactly that seen in
>> the first post.
>>
>> What is going on?
>>
>>
>> KyleMorrison wrote:
>>>
>>> Yes, this is the most recent version of Solr, with stream="true" and with
>>> stopwords, lowercase and removeDuplicate filters applied to all
>>> multivalued fields. Would the filters possibly be causing this? I will
>>> not use them and see what happens.
>>>
>>> Kyle
>>>
>>>
>>> Shalin Shekhar Mangar wrote:

 Hmm, strange.

 This is Solr 1.3.0, right? Do you have any transformers applied to these
 multi-valued fields? Do you have stream="true" in the entity?

 On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]>
 wrote:

>
> I apologize for spamming this mailing list with my problems, but I'm at
> my
> wits end. I'll get right to the point.
>
> I have an xml file which is ~1GB which I wish to index. If that is
> successful, I will move to a larger file of closer to 20GB. However,
> when I
> run my data-config(let's call it dc.xml) over it, the import only
> manages
> to
> get about 27 rows, out of roughly 200K. The exact same
> data-config(dc.xml)
> works perfectly on smaller data files of the same type.
>
> This data-config is quite large, maybe 250 fields. When I run a smaller
> data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works
> perfectly. The only conclusion I can draw from this is that the
> data-config
> method just doesn't scale well.
>
> When the dc.xml fails, the server logs spit out:
>
> Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
> status=0
> QTime=95
> Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter
> doFullImport
> INFO: Starting Full Import
> Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2
> deleteAll
> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
> Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter
> doFullImport
> SEVERE: Full Import failed
> java.util.ConcurrentModificationException
>at
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>at java.util.AbstractList$Itr.next(AbstractList.java:343)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>at
>
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>at
>
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>at
>
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
> Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
> status=0
> QTime=77
> Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter
> doFullImport
> INFO: Starting Full Import
> Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2
> deleteAll
> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
> Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter
> doFullImport
> SEVERE: Full Import failed
> java.util.ConcurrentModificationException
>at
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>at java.util.Abstr

Re: Indexing Large Files with Large DataImport: Problems

2008-09-30 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess it is a threading problem. I can give you a patch; you can raise a bug.
--Noble

On Wed, Oct 1, 2008 at 2:11 AM, KyleMorrison <[EMAIL PROTECTED]> wrote:
>
> As a follow-up: I continued tweaking the data-config.xml, and have been
> able to make the commit fail with as few as 3 fields in the sdc.xml, with
> only one multivalued field. Even stranger, some fields work and some do
> not. For instance, in my dc.xml:
>
> <field column="taxon"
> xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Taxonomy/Lineage/Taxon"
> />
> .
> .
> .
> <field column="genpept"
> xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/GenPept"
> />
>
> and in the schema.xml:
> <field name="taxon" ... multiValued="true" />
> .
> .
> .
> <field name="genpept" ... multiValued="true" />
> but taxon works and genpept does not. What could possibly account for this
> discrepancy? Again, the error logs from the server are exactly that seen in
> the first post.
>
> What is going on?
>
>
> KyleMorrison wrote:
>>
>> Yes, this is the most recent version of Solr, with stream="true" and with
>> stopwords, lowercase and removeDuplicate filters applied to all
>> multivalued fields. Would the filters possibly be causing this? I will
>> not use them and see what happens.
>>
>> Kyle
>>
>>
>> Shalin Shekhar Mangar wrote:
>>>
>>> Hmm, strange.
>>>
>>> This is Solr 1.3.0, right? Do you have any transformers applied to these
>>> multi-valued fields? Do you have stream="true" in the entity?
>>>
>>> On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]>
>>> wrote:
>>>

 I apologize for spamming this mailing list with my problems, but I'm at
 my
 wits end. I'll get right to the point.

 I have an xml file which is ~1GB which I wish to index. If that is
 successful, I will move to a larger file of closer to 20GB. However,
 when I
 run my data-config(let's call it dc.xml) over it, the import only
 manages
 to
 get about 27 rows, out of roughly 200K. The exact same
 data-config(dc.xml)
 works perfectly on smaller data files of the same type.

 This data-config is quite large, maybe 250 fields. When I run a smaller
 data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works
 perfectly. The only conclusion I can draw from this is that the
 data-config
 method just doesn't scale well.

 When the dc.xml fails, the server logs spit out:

 Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/dataimport params={command=full-import}
 status=0
 QTime=95
 Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter
 doFullImport
 INFO: Starting Full Import
 Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2
 deleteAll
 INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
 Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter
 doFullImport
 SEVERE: Full Import failed
 java.util.ConcurrentModificationException
at
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at

 org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
at

 org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
at

 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
at

 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
at

 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at

 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at

 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
 Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/dataimport params={command=full-import}
 status=0
 QTime=77
 Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter
 doFullImport
 INFO: Starting Full Import
 Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2
 deleteAll
 INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
 Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter
 doFullImport
 SEVERE: Full Import failed
 java.util.ConcurrentModificationException
at
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at

 org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
at

 org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
at

 org.apache.solr.

Re: Question about facet.prefix usage

2008-09-30 Thread Simon Hu

Not really. facet.query filters the result set. Here we need to filter the
facet counts by multiple facet prefixes. facet.query would work only if the
faceted field were not a multi-valued field. 
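
(For reference, a single-prefix request looks like
...&facet=true&facet.field=content&facet.prefix=abc; the facet.prefix
parameter accepts only one value per field, which is the limitation being
discussed here.)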



Erik Hatcher wrote:
> 
> If I'm not mistaken, doesn't facet.query accomplish what you want?
> 
>   Erik
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Question-about-facet.prefix-usage-tp15836501p19753290.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Are facet searches slower on large indexes?

2008-09-30 Thread Chris Hostetter

The time factor has more to do with the number of distinct values in the 
field being faceted on than it does with the number of documents.  With 1 
million documents there are probably a lot more indexed terms in the 
"contents" field than there are with only 1000 documents.

Because the index is inverted, there is no efficient way for Solr's faceting 
code to know just which terms are in the 37 docs that match your query -- it 
has to check them all.  The good news is that if you can make your 
filterCache big enough, it won't matter which 37 (or 37,000) documents 
match your next query where you facet on the contents field -- the facet 
counts should compute much faster.
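
For reference, the filterCache is sized in solrconfig.xml; a sketch with 
illustrative values:

  <filterCache
    class="solr.LRUCache"
    size="16384"
    initialSize="4096"
    autowarmCount="4096"/>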

For fields where Solr can tell it will have just one value, it can do some 
optimizations to use the FieldCache instead of iterating over every term 
in the field you're faceting on, but that wouldn't apply to your "contents" 
field.

: I'm doing a facet search like the following.  The content field schema is
: [content field definition stripped by the mail archive]
: 
: /solr/select?q=dirt
: 
field:www.example.com&facet=true&facet.field=content&facet.limit=-1&facet.mincount=1
: 
: If I run this on a server with a total of 1000 pages, which contain the
: pages for www.example.com, it returns in about 1 second, and gives me
: 37 docs, and quite a few facet values.
: 
: If I run this same search on a server with over 1,000,000 pages in
: total, including the pages that are in the first example, it returns
: in about 2 minutes! still giving me 37 docs and the same number of
: facet values.
: 
: Seems to me the search should have been constrained to
: field:www.example.com in both cases, so perhaps shouldn't be much
: different in time to execute.
: 
: Is there any more information on facet searching that will explain
: what's going on?

-Hoss



Re: Discarding undefined fields in query

2008-09-30 Thread Yonik Seeley
On Tue, Sep 30, 2008 at 2:42 PM, Jérôme Etévé <[EMAIL PROTECTED]> wrote:
> But still I have an error from the webapp when I try to query my
> schema with non existing fields in my query ( like foo:bar ).
>
> I'm wondering if the query q is parsed in a very simple way somewhere
> else (and independently from any customized QParserPlugin) and checked
> against the schema.

It should not be.  Are you sure your QParser is being used?
Does the error contain a stack trace that can pinpoint where it's coming from?

-Yonik


Re: Searching Question

2008-09-30 Thread Otis Gospodnetic
I hit ctrl-S by mistake.  This is the method you are after:

http://lucene.apache.org/java/2_3_2/api/core/org/apache/lucene/search/DefaultSimilarity.html#tf(float)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Otis Gospodnetic <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, September 30, 2008 4:40:08 PM
> Subject: Re: Searching Question
> 
> The easiest thing is to look at Lucene javadoc and look for Similarity and 
> DefaultSimilarity classes.  Then have a peek at Lucene contrib to get some 
> other 
> examples of custom Similarity.  You'll just need to override one method, for 
> example:
> 
> 
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
> > From: Jake Conk 
> > To: solr-user@lucene.apache.org
> > Sent: Tuesday, September 30, 2008 3:11:01 PM
> > Subject: Re: Searching Question
> > 
> > How would I write a custom Similarity factor that overrides the TF
> > function? Is there some documentation on that somewhere?
> > 
> > On Sat, Sep 27, 2008 at 5:14 AM, Grant Ingersoll wrote:
> > >
> > > On Sep 26, 2008, at 2:10 PM, Otis Gospodnetic wrote:
> > >
> > >> It might be easiest to store the thread ID and the number of replies in
> > >> the thread in each post Document in Solr.
> > >
> > > Yeah, but that would mean updating every document in a thread every time a
> > > new reply is added.
> > >
> > > I still keep going back to the solution as putting all the replies in a
> > > single document, and then using a custom Similarity factor that overrides
> > > the TF function and/or the length normalization.  Still, this suffers from
> > > having to update the document for every new reply.
> > >
> > > Let's take a step back...
> > >
> > > Can I ask why you want the scoring this way?  What have you seen in your
> > > results that leads you to believe it is the correct way?  Note, I'm not
> > > trying to convince you it's wrong, I just want to better understand what's
> > > going on.
> > >
> > >
> > >>
> > >>
> > >> Otherwise it sounds like you'll have to combine some search results or
> > >> data post-search.
> > >>
> > >> Otis
> > >> --
> > >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >>
> > >>
> > >>
> > >> - Original Message 
> > >>>
> > >>> From: Jake Conk 
> > >>> To: solr-user@lucene.apache.org
> > >>> Sent: Friday, September 26, 2008 1:50:37 PM
> > >>> Subject: Re: Searching Question
> > >>>
> > >>> Grant,
> > >>>
> > >>> Each post is its own document but I can merge them all into a single
> > >>> document under one  thread if that will allow me to do what I want.
> > >>> The number of replies is stored both in Solr and the DB.
> > >>>
> > >>> Thanks,
> > >>>
> > >>> - JC
> > >>>
> > >>> On Fri, Sep 26, 2008 at 5:24 AM, Grant Ingersoll wrote:
> > 
> >  Is a thread and all of its posts a single document?  In other words,
> >  how
> >  are you modeling your posts as Solr documents?  Also, where are you
> >  keeping
> >  track of the number of replies?  Is that in Solr or in a DB?
> > 
> >  -Grant
> > 
> >  On Sep 25, 2008, at 8:51 PM, Jake Conk wrote:
> > 
> > > Hello,
> > >
> > > We are using Solr for our new forums search feature. If possible when
> > > searching for the word "Halo" we would like threads that contain the
> > > word "Halo" the most with the least amount of posts in that thread to
> > > have a higher score.
> > >
> > > For instance, if we have a thread with 10 posts and the word "Halo"
> > > shows up 5 times then that should have a lower score than a thread
> > > that has the word "Halo" 3 times within its posts and has 5 replies.
> > > Basically the thread that shows the search string most frequently
> > > amongst the number of posts in the thread should be the one with the
> > > highest score.
> > >
> > > Is something like this possible?
> > >
> > > Thanks,
> > >
> > >
> > >



Re: Indexing Large Files with Large DataImport: Problems

2008-09-30 Thread KyleMorrison

As a follow-up: I continued tweaking the data-config.xml, and have been able
to make the commit fail with as few as 3 fields in the sdc.xml, with only
one multivalued field. Even stranger, some fields work and some do not.
For instance, in my dc.xml:

<field column="taxon"
xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Taxonomy/Lineage/Taxon"
/>
.
.
.
<field column="genpept"
xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/GenPept"
/>

and in the schema.xml:
<field name="taxon" ... multiValued="true" />
.
.
.
<field name="genpept" ... multiValued="true" />
but taxon works and genpept does not. What could possibly account for this
discrepancy? Again, the error logs from the server are exactly that seen in
the first post.

What is going on?


KyleMorrison wrote:
> 
> Yes, this is the most recent version of Solr, with stream="true" and with
> stopwords, lowercase and removeDuplicate filters applied to all
> multivalued fields. Would the filters possibly be causing this? I will
> not use them and see what happens.
> 
> Kyle
> 
> 
> Shalin Shekhar Mangar wrote:
>> 
>> Hmm, strange.
>> 
>> This is Solr 1.3.0, right? Do you have any transformers applied to these
>> multi-valued fields? Do you have stream="true" in the entity?
>> 
>> On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]>
>> wrote:
>> 
>>>
>>> I apologize for spamming this mailing list with my problems, but I'm at
>>> my
>>> wits end. I'll get right to the point.
>>>
>>> I have an xml file which is ~1GB which I wish to index. If that is
>>> successful, I will move to a larger file of closer to 20GB. However,
>>> when I
>>> run my data-config(let's call it dc.xml) over it, the import only
>>> manages
>>> to
>>> get about 27 rows, out of roughly 200K. The exact same
>>> data-config(dc.xml)
>>> works perfectly on smaller data files of the same type.
>>>
>>> This data-config is quite large, maybe 250 fields. When I run a smaller
>>> data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works
>>> perfectly. The only conclusion I can draw from this is that the
>>> data-config
>>> method just doesn't scale well.
>>>
>>> When the dc.xml fails, the server logs spit out:
>>>
>>> Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
>>> status=0
>>> QTime=95
>>> Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter
>>> doFullImport
>>> INFO: Starting Full Import
>>> Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2
>>> deleteAll
>>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>>> Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter
>>> doFullImport
>>> SEVERE: Full Import failed
>>> java.util.ConcurrentModificationException
>>>at
>>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>>at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>>at
>>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>> Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
>>> status=0
>>> QTime=77
>>> Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter
>>> doFullImport
>>> INFO: Starting Full Import
>>> Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2
>>> deleteAll
>>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>>> Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter
>>> doFullImport
>>> SEVERE: Full Import failed
>>> java.util.ConcurrentModificationException
>>>at
>>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>>at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>>at
>>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DataImporter$1

Re: Searching Question

2008-09-30 Thread Otis Gospodnetic
The easiest thing is to look at Lucene javadoc and look for Similarity and 
DefaultSimilarity classes.  Then have a peek at Lucene contrib to get some 
other examples of custom Similarity.  You'll just need to override one method, 
for example:
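
The example never made it into this message (see the ctrl-S follow-up
above); a minimal sketch of a flat-tf override against Lucene 2.3's
DefaultSimilarity:

  import org.apache.lucene.search.DefaultSimilarity;

  // Count a term once per document instead of rewarding repetition.
  public class FlatTfSimilarity extends DefaultSimilarity {
      public float tf(float freq) {
          // DefaultSimilarity returns sqrt(freq); a constant flattens it
          return freq > 0 ? 1.0f : 0.0f;
      }
  }

It would then be wired in near the bottom of schema.xml with
<similarity class="FlatTfSimilarity"/>.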


 --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Jake Conk <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, September 30, 2008 3:11:01 PM
> Subject: Re: Searching Question
> 
> How would I write a custom Similarity factor that overrides the TF
> function? Is there some documentation on that somewhere?
> 
> On Sat, Sep 27, 2008 at 5:14 AM, Grant Ingersoll wrote:
> >
> > On Sep 26, 2008, at 2:10 PM, Otis Gospodnetic wrote:
> >
> >> It might be easiest to store the thread ID and the number of replies in
> >> the thread in each post Document in Solr.
> >
> > Yeah, but that would mean updating every document in a thread every time a
> > new reply is added.
> >
> > I still keep going back to the solution as putting all the replies in a
> > single document, and then using a custom Similarity factor that overrides
> > the TF function and/or the length normalization.  Still, this suffers from
> > having to update the document for every new reply.
> >
> > Let's take a step back...
> >
> > Can I ask why you want the scoring this way?  What have you seen in your
> > results that leads you to believe it is the correct way?  Note, I'm not
> > trying to convince you it's wrong, I just want to better understand what's
> > going on.
> >
> >
> >>
> >>
> >> Otherwise it sounds like you'll have to combine some search results or
> >> data post-search.
> >>
> >> Otis
> >> --
> >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>
> >>
> >>
> >> - Original Message 
> >>>
> >>> From: Jake Conk 
> >>> To: solr-user@lucene.apache.org
> >>> Sent: Friday, September 26, 2008 1:50:37 PM
> >>> Subject: Re: Searching Question
> >>>
> >>> Grant,
> >>>
> >>> Each post is its own document but I can merge them all into a single
> >>> document under one  thread if that will allow me to do what I want.
> >>> The number of replies is stored both in Solr and the DB.
> >>>
> >>> Thanks,
> >>>
> >>> - JC
> >>>
> >>> On Fri, Sep 26, 2008 at 5:24 AM, Grant Ingersoll wrote:
> 
>  Is a thread and all of its posts a single document?  In other words,
>  how
>  are you modeling your posts as Solr documents?  Also, where are you
>  keeping
>  track of the number of replies?  Is that in Solr or in a DB?
> 
>  -Grant
> 
>  On Sep 25, 2008, at 8:51 PM, Jake Conk wrote:
> 
> > Hello,
> >
> > We are using Solr for our new forums search feature. If possible when
> > searching for the word "Halo" we would like threads that contain the
> > word "Halo" the most with the least amount of posts in that thread to
> > have a higher score.
> >
> > For instance, if we have a thread with 10 posts and the word "Halo"
> > shows up 5 times then that should have a lower score than a thread
> > that has the word "Halo" 3 times within its posts and has 5 replies.
> > Basically the thread that shows the search string most frequently
> > amongst the number of posts in the thread should be the one with the
> > highest score.
> >
> > Is something like this possible?
> >
> > Thanks,
> >
> >
> >



Re: Integrating external stemmer in Solr and pre-processing text

2008-09-30 Thread Jaco
Hi,

The suggested approach with a TokenFilter extending the BufferedTokenStream
class works fine, and performance is OK - the external stemmer is now invoked
only once for the complete search text. Also, from a functional point of
view, the approach is useful because it allows other filtering (e.g. the
WordDelimiterFilter with its various useful options) to be done before
stemming takes place.

Code is roughly like this for the process() function of the custom Filter
class:

protected Token process(Token token) throws IOException {
    StringBuilder stringBuilder = new StringBuilder();
    Token nextToken;
    Integer tokenPos = 0;
    Map<Integer, Token> tokenMap = new LinkedHashMap<Integer, Token>();

    // buffer every token, remembering each one's position so it can be
    // matched back up after the external stemmer has run
    stringBuilder.append(token.term()).append(' ');
    tokenMap.put(tokenPos++, token);
    nextToken = read();

    while (nextToken != null)
    {
        stringBuilder.append(nextToken.term()).append(' ');
        tokenMap.put(tokenPos++, nextToken);

        nextToken = read();
    }

    // one call to the external stemmer for the whole buffered text;
    // it returns exactly one stemmed word per input word
    String   inputText    = stringBuilder.toString();
    String   stemmedText  = stemText(inputText);
    String[] stemmedWords = stemmedText.split("\\s");

    // write the stemmed term back into each buffered token, in order
    for (Map.Entry<Integer, Token> entry : tokenMap.entrySet())
    {
        Integer pos = entry.getKey();
        Token   tok = entry.getValue();

        tok.setTermBuffer(stemmedWords[pos]);
        write(tok);
    }

    return null;
}

This will need some work and additional error checking, and I'll probably
put a maximum on the number of tokens to be processed in one go, to make
sure things don't get too big in memory.

Thanks for helping out!

Bye,

Jaco.



2008/9/26 Jaco <[EMAIL PROTECTED]>

> Thanks for these suggestions, will try it in the coming days and post my
> findings in this thread.
>
> Bye,
>
>
> Jaco.
>
> 2008/9/26 Grant Ingersoll <[EMAIL PROTECTED]>
>
>>
>> On Sep 26, 2008, at 12:05 PM, Jaco wrote:
>>
>>  Hi Grant,
>>>
>>> In reply to your questions:
>>>
>>> 1. Are you having to restart/initialize the stemmer every time for your
>>> "slow" approach?  Does that really need to happen?
>>>
>>> It is invoking a COM object in Windows. The object is instantiated once
>>> for a token stream, and then invoked once for each token. Each invocation
>>> always has an overhead; not much to do about that (sigh...)
>>>
>>> 2. Can the stemmer return something other than a String?  Say a String
>>> array
>>> of all the stemmed words?  Or maybe even some type of object that tells
>>> you
>>> the original word and the stemmed word?
>>>
>>> The stemmer can only return a String. But, I do know that the returned
>>> string always has exactly the same number of words as the input string.
>>> So
>>> logically, it would be possible to :
>>> a) first calculate the position/start/end of each token in the input
>>> string
>>> (usual tokenization by Whitespace), resulting in token list 1
>>> b) then invoke the stemmer, and tokenize that result by Whitespace,
>>> resulting in token list 2
>>> c) 'merge' the token values of token list 2 into token list 1, which is
>>> possible because each token's position is the same in both lists...
>>> d) return that 'merged' token list 2 for further processing
>>>
>>> Would this work in Solr?
>>>
>>
>> I think so, assuming your stemmer tokenizes on whitespace as well.
>>
>>
>>>
>>> I can do some Java coding to achieve that from logical point of view, but
>>> I
>>> wouldn't know how to structure this flow into the MyTokenizerFactory, so
>>> some hints to achieve that would be great!
>>>
>>
>>
>> One thought:
>> Don't create an all in one Tokenizer.  Instead, keep the Whitespace
>> Tokenizer as is.  Then, create a TokenFilter that buffers the whole document
>> into a memory (via the next() implementation) and also creates, using
>> StringBuilder, a string containing the whole text.  Once you've read it all
>> in, then send the string to your stemmer, parse it back out and associate it
>> back to your token buffer.  If you are guaranteed position, you could even
>> keep a (linked) hash, such that it is really quick to look up tokens after
>> stemming.
>>
>> Pseudocode looks something like:
>>
>> while (token.next != null)
>>   tokenMap.put(token.position, token)
>>   stringBuilder.append(' ').append(token.text)
>>
>> stemmedText = comObj.stem(stringBuilder.toString())
>> correlateStemmedText(stemmedText, tokenMap)
>>
>> spit out the tokens one by one...
>>
>>
>> I think this approach should be fast (but maybe not as fast as your all in
>> one tokenizer) and will provide the correct position and offsets.  You do
>> have to be careful w/ really big documents, as that map can be big.  You
>> also want to be careful about map reuse, token reuse, etc.
>>
>> I believe there are a couple of buffering TokenFilters in Solr that you
>> could examine for inspiration.  I think the RemoveDuplicatesTokenFilter (or
>> whatever it's called) does buffer

Re: Indexing Large Files with Large DataImport: Problems

2008-09-30 Thread KyleMorrison

Yes, this is the most recent version of Solr, with stream="true" and with
stopwords, lowercase and removeDuplicate filters applied to all multivalued
fields. Would the filters possibly be causing this? I will not use them and
see what happens.

Kyle


Shalin Shekhar Mangar wrote:
> 
> Hmm, strange.
> 
> This is Solr 1.3.0, right? Do you have any transformers applied to these
> multi-valued fields? Do you have stream="true" in the entity?
> 
> On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]> wrote:
> 
>>
>> I apologize for spamming this mailing list with my problems, but I'm at
>> my
>> wits end. I'll get right to the point.
>>
>> I have an xml file which is ~1GB which I wish to index. If that is
>> successful, I will move to a larger file of closer to 20GB. However, when
>> I
>> run my data-config(let's call it dc.xml) over it, the import only manages
>> to
>> get about 27 rows, out of roughly 200K. The exact same
>> data-config(dc.xml)
>> works perfectly on smaller data files of the same type.
>>
>> This data-config is quite large, maybe 250 fields. When I run a smaller
>> data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works
>> perfectly. The only conclusion I can draw from this is that the
>> data-config
>> method just doesn't scale well.
>>
>> When the dc.xml fails, the server logs spit out:
>>
>> Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
>> status=0
>> QTime=95
>> Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter
>> doFullImport
>> INFO: Starting Full Import
>> Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2
>> deleteAll
>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>> Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter
>> doFullImport
>> SEVERE: Full Import failed
>> java.util.ConcurrentModificationException
>>at
>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>at
>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>at
>>
>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>at
>>
>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>at
>>
>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>> Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
>> status=0
>> QTime=77
>> Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter
>> doFullImport
>> INFO: Starting Full Import
>> Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2
>> deleteAll
>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>> Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter
>> doFullImport
>> SEVERE: Full Import failed
>> java.util.ConcurrentModificationException
>>at
>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>at
>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>at
>>
>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>at
>>
>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>at
>>
>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>
>> This mass of exceptions DOES NOT occur when I perform the same
>> full-import
>> with sdc.xml. As far as I can tell, the only difference between the two
>> files is the amount of fields they contain.
>>
>> Any guidance or information would be greatly appreciated.
>> Kyle
>>
>>
>> PS The schema.xml in use specifies almost all fields as multivalued, and
>> has
>> a copyfield for almost every field. I can fix this if it is causing my
>> problem, but I would prefer not to.
>> --
>> View this message in context:
>> http://www.nabble.com/Indexing-Large-Files-with-Large-DataImport%3A-Problems-tp19746831p1974683

Re: Indexing Large Files with Large DataImport: Problems

2008-09-30 Thread Shalin Shekhar Mangar
Hmm, strange.

This is Solr 1.3.0, right? Do you have any transformers applied to these
multi-valued fields? Do you have stream="true" in the entity?

On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]> wrote:

>
> I apologize for spamming this mailing list with my problems, but I'm at my
> wits end. I'll get right to the point.
>
> I have an xml file which is ~1GB which I wish to index. If that is
> successful, I will move to a larger file of closer to 20GB. However, when I
> run my data-config(let's call it dc.xml) over it, the import only manages
> to
> get about 27 rows, out of roughly 200K. The exact same data-config(dc.xml)
> works perfectly on smaller data files of the same type.
>
> This data-config is quite large, maybe 250 fields. When I run a smaller
> data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works
> perfectly. The only conclusion I can draw from this is that the data-config
> method just doesn't scale well.
>
> When the dc.xml fails, the server logs spit out:
>
> Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
> status=0
> QTime=95
> Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter
> doFullImport
> INFO: Starting Full Import
> Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2
> deleteAll
> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
> Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter
> doFullImport
> SEVERE: Full Import failed
> java.util.ConcurrentModificationException
>at
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>at java.util.AbstractList$Itr.next(AbstractList.java:343)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>at
>
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>at
>
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>at
>
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
> Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
> status=0
> QTime=77
> Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter
> doFullImport
> INFO: Starting Full Import
> Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2
> deleteAll
> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
> Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter
> doFullImport
> SEVERE: Full Import failed
> java.util.ConcurrentModificationException
>at
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>at java.util.AbstractList$Itr.next(AbstractList.java:343)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>at
>
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>at
>
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>at
>
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>
> This mass of exceptions DOES NOT occur when I perform the same full-import
> with sdc.xml. As far as I can tell, the only difference between the two
> files is the amount of fields they contain.
>
> Any guidance or information would be greatly appreciated.
> Kyle
>
>
> PS The schema.xml in use specifies almost all fields as multivalued, and
> has
> a copyfield for almost every field. I can fix this if it is causing my
> problem, but I would prefer not to.
> --
> View this message in context:
> http://www.nabble.com/Indexing-Large-Files-with-Large-DataImport%3A-Problems-tp19746831p19746831.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Searching Question

2008-09-30 Thread Jake Conk
How would I write a custom Similarity factor that overrides the TF
function? Is there some documentation on that somewhere?

On Sat, Sep 27, 2008 at 5:14 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> On Sep 26, 2008, at 2:10 PM, Otis Gospodnetic wrote:
>
>> It might be easiest to store the thread ID and the number of replies in
>> the thread in each post Document in Solr.
>
> Yeah, but that would mean updating every document in a thread every time a
> new reply is added.
>
> I still keep going back to the solution as putting all the replies in a
> single document, and then using a custom Similarity factor that overrides
> the TF function and/or the length normalization.  Still, this suffers from
> having to update the document for every new reply.
>
> Let's take a step back...
>
> Can I ask why you want the scoring this way?  What have you seen in your
> results that leads you to believe it is the correct way?  Note, I'm not
> trying to convince you it's wrong, I just want to better understand what's
> going on.
>
>
>>
>>
>> Otherwise it sounds like you'll have to combine some search results or
>> data post-search.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> - Original Message 
>>>
>>> From: Jake Conk <[EMAIL PROTECTED]>
>>> To: solr-user@lucene.apache.org
>>> Sent: Friday, September 26, 2008 1:50:37 PM
>>> Subject: Re: Searching Question
>>>
>>> Grant,
>>>
>>> Each post is its own document but I can merge them all into a single
>>> document under one  thread if that will allow me to do what I want.
>>> The number of replies is stored both in Solr and the DB.
>>>
>>> Thanks,
>>>
>>> - JC
>>>
>>> On Fri, Sep 26, 2008 at 5:24 AM, Grant Ingersoll wrote:

 Is a thread and all of its posts a single document?  In other words,
 how
 are you modeling your posts as Solr documents?  Also, where are you
 keeping
 track of the number of replies?  Is that in Solr or in a DB?

 -Grant

 On Sep 25, 2008, at 8:51 PM, Jake Conk wrote:

> Hello,
>
> We are using Solr for our new forums search feature. If possible when
> searching for the word "Halo" we would like threads that contain the
> word "Halo" the most with the least amount of posts in that thread to
> have a higher score.
>
> For instance, if we have a thread with 10 posts and the word "Halo"
> shows up 5 times then that should have a lower score than a thread
> that has the word "Halo" 3 times within its posts and has 5 replies.
> Basically the thread that shows the search string most frequently
> amongst the number of posts in the thread should be the one with the
> highest score.
>
> Is something like this possible?
>
> Thanks,
>
>
>


Re: Monitoring solr stats with munin?

2008-09-30 Thread Chris Hostetter
: > has anyone had the need and maybe already written a munin plugin to graph
: > some informations from e.g. admin/stats.jsp ?

: Something like that, though I havn't seen anything available publicly yet. Its

Anything exposed via stats.jsp should also be available via JMX (if you 
enable JMX) ... and a Google search seems to think there is a JMX plugin for 
munin (even though I don't really understand what munin is)

http://muninexchange.projects.linpro.no/?search&cid=38&pid=29
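
For reference, Solr 1.3 can expose the same statistics over JMX by adding a
bare

  <jmx/>

element to solrconfig.xml and starting the servlet container with the usual
com.sun.management.jmxremote system properties.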

-Hoss



Re: Calculated Unique Key Field

2008-09-30 Thread Shalin Shekhar Mangar
On Wed, Oct 1, 2008 at 12:08 AM, Jim Murphy <[EMAIL PROTECTED]> wrote:

>
> Question1: Is this the best place to do this?


This sounds like a job for
http://wiki.apache.org/solr/UpdateRequestProcessor
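
A minimal sketch of that approach (the class name, field names, and the
commons-codec hash are illustrative, not from the original post):

  import java.io.IOException;

  import org.apache.commons.codec.digest.DigestUtils;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.request.SolrQueryResponse;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  public class GuidUpdateProcessorFactory extends UpdateRequestProcessorFactory {
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
        SolrQueryResponse rsp, UpdateRequestProcessor next) {
      return new UpdateRequestProcessor(next) {
        public void processAdd(AddUpdateCommand cmd) throws IOException {
          SolrInputDocument doc = cmd.solrDoc;
          // hash the identity fields; "url" and "title" stand in for
          // whatever actually defines document identity here
          String guid = DigestUtils.md5Hex("" + doc.getFieldValue("url")
              + doc.getFieldValue("title"));
          doc.setField("post_guid", guid);
          super.processAdd(cmd);
        }
      };
    }
  }

Registered via an updateRequestProcessorChain in solrconfig.xml, this keeps
the hashing out of the UpdateHandler subclass, and the key only needs to be
set on the SolrInputDocument because the Lucene Document is built afterwards.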


-- 
Regards,
Shalin Shekhar Mangar.


Re: Applying Stop words for Field Type String

2008-09-30 Thread Chris Hostetter

: Question : Is it possible to do the same for String type or not, since the

StrField doesn't support an analyzer like TextField does, but if you 
define "string" to be a TextField using KeywordTokenizer it will preserve 
the whole value as a single token and you can then use the 
StopFilterFactory to throw out values which are stop words.
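
A sketch of such a type (names illustrative):

  <fieldType name="stoppedString" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt"
              ignoreCase="true"/>
    </analyzer>
  </fieldType>

Because the KeywordTokenizer emits the whole value as one token, the stop
filter only discards a value that is, in its entirety, a stop word.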

The stored values for a TextField and a StrField are returned to clients in 
exactly the same way.

-Hoss



Re: Dismax , "query phrases"

2008-09-30 Thread Chris Hostetter

: That's why I was wondering how Dismax breaks it all apart. It makes sense...I
: suppose what I'd like to have is a way to tell dismax which fields NOT to
: tokenize the input for. For these fields, it would pass the full q instead of
: each part of it. Does this make sense? would it be useful at all? 

the *goal* makes sense, but the implementation would be ... problematic.

you have to remember the DisMax parser's whole way of working is to make 
each "chunk" of input match against any qf field, and find the highest 
scoring field for each chunk, with this input...

q = some phrase  & qf = a b c

...you get...

( (a:some | b:some | c:some) (a:phrase | b:phrase | c:phrase) )

...even if dismax could tell that "c" was a field that should only support 
exact matches, how would it fit c:"some phrase" into that structure?

I've already kinda forgotten how this thread started ... but would it make 
sense to just use your "exact" fields in the pf, and have inexact versions 
of them in the qf?  then docs that match your input exactly should score 
at the top, but less exact matches will also still match.
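
A sketch of that arrangement (field names illustrative):

  qf=title body
  pf=title_exact^4 body_exact^2

where the _exact variants use a stricter analysis; pf builds implicit
phrase queries over the whole input, so exact matches float to the top
without excluding looser ones.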



-Hoss



Re: Calculated Unique Key Field

2008-09-30 Thread Jim Murphy

It may not be all that relevant but our Update handler extends from
DirectUpdateHandler2.
-- 
View this message in context: 
http://www.nabble.com/Calculated-Unique-Key-Field-tp19747955p19748032.html
Sent from the Solr - User mailing list archive at Nabble.com.



Discarding undefined fields in query

2008-09-30 Thread Jérôme Etévé
Hi All,

  I wrote a customized query parser which discards non-schema fields
from the query (I'm using the schema field names from
req.getSchema().getFields().keySet()).
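
(A sketch of that check, with illustrative names:

  Set<String> known = req.getSchema().getFields().keySet();
  if (!known.contains(fieldName)) {
      // drop the clause instead of letting it reach the schema lookup
  }

Note that getFields() only covers explicitly declared fields; dynamic
fields are matched separately.)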

This parser works fine in unit tests.

But still I have an error from the webapp when I try to query my
schema with non existing fields in my query ( like foo:bar ).

I'm wondering if the query q is parsed in a very simple way somewhere
else (and independently from any customized QParserPlugin) and checked
against the schema.

Is there an option to modify this behaviour so undefined fields in a
query could simply be discarded instead of throwing an error?

Cheers !

Jerome.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Calculated Unique Key Field

2008-09-30 Thread Jim Murphy

My unique key field is an MD5 hash of several other fields that represent
identity of documents in my index.  We've been calculating this externally
and setting the key value in documents but have found recurring bugs as the
number and variety of inserting consumers has grown...

So I wanted to move to calculating these at "add" time.  We already have our
own UpdateHandler, extending from DirectUpdateHandler, so I extended its
addDoc method to do the hashing and field setting.  

Here are the implementation highlights:

String postGuid = 

// set the value - overwrite if already present
{
  SolrInputField postGuidField = doc.getField(POST_GUID_NAME);
  if (postGuidField != null)
  {
postGuidField.setValue(postGuid, DEFAULT_BOOST);
  }
  else
  {
doc.addField(POST_GUID_NAME, postGuid);
  }
}

{

  // add guid field to the lucene doc too - huh. 
  Document lucDoc = cmd.getLuceneDocument(schema);

  Field aiPostGuidField = lucDoc.getField(POST_GUID_NAME);
  if (aiPostGuidField != null)
  {
aiPostGuidField.setValue(postGuid);
  }
  else
  {
SchemaField aiPostGuidSchemaField = schema.getField(POST_GUID_NAME);
Field postGuidField = aiPostGuidSchemaField.createField(postGuid,
DEFAULT_BOOST);
lucDoc.add(postGuidField);
  }
}


Question1: Is this the best place to do this?
Question2: Is there a way around adding it to both the SolrDocument and the
Lucene Document?

Thoughts?

Best regards,

Jim
-- 
View this message in context: 
http://www.nabble.com/Calculated-Unique-Key-Field-tp19747955p19747955.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: French synonyms & Online synonyms

2008-09-30 Thread Pierre Auslaender
True, synonyms can be grouped in cliques based on the strength of their 
"resemblance" given a specific context.


But what I'm indexing is the text content of TV programs produced by a 
public television broadcaster, so the context is very large and
non-specific. What I want is to find "automobile" for "car", "motorcycle" 
for "bike", "pub" for "restaurant", "woman" for "lady", and the like.


There actually are free on-line resources for most European languages 
(of course, English included), check these out:

http://dico.isc.cnrs.fr/dico_html/en/index.html
http://www.crisco.unicaen.fr/alexandria2.html

Would you mind commenting on the following plan for a special synonym 
analyzer?

1/ We would start with an empty synonyms file.
2/ For each indexing request, the analyzer looks up the file for 
synonyms. If it finds synonyms, it proceeds normally.
3/ Otherwise, it checks an online resource for synonyms, updates the 
synonyms file, and proceeds.


If you think this is workable, there are two problems left: which terms 
to look up for online synonyms, and how to select the "synonymity" clique.


For the first issue, I would definitely only search for synonyms of 
nouns, verbs and adjectives, so some stemming is required initially.
For the second issue, I'd have a cut-off value for the strength of 
"resemblance", if this information is available, and/or use the 
frequency of the synonyms in the SOLR index as a measure.


Building the synonyms file that way would make the system quicker over 
time, and for a specific domain (chemistry, biology, sports, etc) the 
process would be auto-adaptive - perhaps with some human help from time 
to time.


Thanks,
Pierre

Walter Underwood wrote:

Synonyms are domain-specific, so general-purpose lists are not very useful.

Ultraseek shipped a British-American synonym list as an example, but even
that wasn't very general. One of our customers was a chemical company and
was very surprised when the search "rocket fuel" suggested "arugula",
even though "rocket" is a perfectly good synonym for "arugula".

wunder

On 9/30/08 10:14 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

  

Pierre,

1) I don't know, but a good place to check and see what previous answers to
this question were is markmail.org
2) I don't think there is such a thing, but I also don't think there are sites
that make this data freely available (answer to 1?)

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 


From: Pierre Auslaender <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, September 30, 2008 11:28:40 AM
Subject: French synonyms & Online synonyms

Hello,

I'm sure these questions have been raised a million times; I'll try once
more:

1/ Is there any general-purpose, free, French synonyms file out there?

2/ Is there a Solr or Lucene analyser class that could tap an on-line
resource for synonyms at index-time? And by the same token, maintain and
complete a synonyms text file?

Thanks for the great work on SOLR and for the liveliness of this list.

Pierre


Re: commit not fired

2008-09-30 Thread Chris Hostetter

: When I check my commit.log, nothing has run

commit.log is only updated by the bin/commit script ... not by Solr 
itself.  you'll see Solr log commits in whatever logs are kept by your 
servlet container.
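
for reference, a commit sent straight to Solr over HTTP (which the servlet
container will then log) looks something like this, adjusting host/port to
your deployment:

  curl http://localhost:8983/solr/update -H 'Content-type:text/xml' \
       --data-binary '<commit/>'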

: My snapshooter too: but no log in snapshooter.log
: 
: 
:   ./data/solr/book/logs/snapshooter

I believe Shalin or Bill already commented on this in another thread ... 
those paths really don't look right.




-Hoss


Re: Indexing Large Files with Large DataImport: Problems

2008-09-30 Thread Mark Miller


Exception indicates a threading bug, not a scaling issue...

I'm sure the issue will be illuminated soon, though.
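
For anyone unfamiliar with it: a ConcurrentModificationException is thrown
when a collection is structurally modified while an iterator is walking it,
whether from another thread or from the loop body itself. A minimal,
standalone illustration (deliberately unrelated to Solr's actual code):

import java.util.*;

public class CmeDemo {
    public static void main(String[] args) {
        List<String> fields = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        for (String f : fields) {      // the for-each iterator tracks modCount
            if (f.equals("a")) {
                fields.remove(f);      // structural change mid-iteration...
            }                          // ...so the iterator's next() throws
        }
    }
}

The stack trace in Kyle's message shows exactly that pattern inside
DocBuilder.addFieldValue().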

KyleMorrison wrote:

I apologize for spamming this mailing list with my problems, but I'm at my
wits' end. I'll get right to the point.

I have an XML file of ~1GB which I wish to index. If that is
successful, I will move to a larger file of closer to 20GB. However, when I
run my data-config (let's call it dc.xml) over it, the import only manages
to get about 27 rows out of roughly 200K. The exact same data-config
(dc.xml) works perfectly on smaller data files of the same type.

This data-config is quite large, maybe 250 fields. When I run a smaller
data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works
perfectly. The only conclusion I can draw from this is that the data-config
method just doesn't scale well.

When the dc.xml fails, the server logs spit out:

Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0
QTime=95
Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
SEVERE: Full Import failed
java.util.ConcurrentModificationException
at
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at
org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
at
org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0
QTime=77
Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
SEVERE: Full Import failed
java.util.ConcurrentModificationException
at
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at
org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
at
org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)

This mass of exceptions DOES NOT occur when I perform the same full-import
with sdc.xml. As far as I can tell, the only difference between the two
files is the number of fields they contain.

Any guidance or information would be greatly appreciated.
Kyle


PS The schema.xml in use specifies almost all fields as multivalued, and has
a copyField for almost every field. I can fix this if it is causing my
problem, but I would prefer not to.
  




Indexing Large Files with Large DataImport: Problems

2008-09-30 Thread KyleMorrison

I apologize for spamming this mailing list with my problems, but I'm at my
wits' end. I'll get right to the point.

I have an XML file of ~1GB which I wish to index. If that is
successful, I will move to a larger file of closer to 20GB. However, when I
run my data-config (let's call it dc.xml) over it, the import only manages
to get about 27 rows out of roughly 200K. The exact same data-config
(dc.xml) works perfectly on smaller data files of the same type.

This data-config is quite large, maybe 250 fields. When I run a smaller
data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works
perfectly. The only conclusion I can draw from this is that the data-config
method just doesn't scale well.

When the dc.xml fails, the server logs spit out:

Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0
QTime=95
Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
SEVERE: Full Import failed
java.util.ConcurrentModificationException
at
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at
org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
at
org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0
QTime=77
Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
SEVERE: Full Import failed
java.util.ConcurrentModificationException
at
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at
org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
at
org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)

This mass of exceptions DOES NOT occur when I perform the same full-import
with sdc.xml. As far as I can tell, the only difference between the two
files is the number of fields they contain.

Any guidance or information would be greatly appreciated.
Kyle


PS The schema.xml in use specifies almost all fields as multivalued, and has
a copyField for almost every field. I can fix this if it is causing my
problem, but I would prefer not to.
-- 
View this message in context: 
http://www.nabble.com/Indexing-Large-Files-with-Large-DataImport%3A-Problems-tp19746831p19746831.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: French synonyms & Online synonyms

2008-09-30 Thread Walter Underwood
Synonyms are domain-specific, so general-purpose lists are not very useful.

Ultraseek shipped a British-American synonym list as an example, but even
that wasn't very general. One of our customers was a chemical company and
was very surprised when the search "rocket fuel" suggested "arugula",
even though "rocket" is a perfectly good synonym for "arugula".

wunder

On 9/30/08 10:14 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

> Pierre,
> 
> 1) I don't know, but a good place to check and see what previous answers to
> this question were is markmail.org
> 2) I don't think there is such a thing, but I also don't think there are sites
> that make this data freely available (answer to 1?)
> 
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
>> From: Pierre Auslaender <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, September 30, 2008 11:28:40 AM
>> Subject: French synonyms & Online synonyms
>> 
>> Hello,
>> 
>> I'm sure these questions have been raised a million times; I'll try once
>> more:
>> 
>> 1/ Is there any general-purpose, free, French synonyms file out there?
>> 
>> 2/ Is there a Solr or Lucene analyser class that could tap an on-line
>> resource for synonyms at index-time? And by the same token, maintain and
>> complete a synonyms text file?
>> 
>> Thanks for the great work on SOLR and for the liveliness of this list.
>> 
>> Pierre
> 



Re: French synonyms & Online synonyms

2008-09-30 Thread Otis Gospodnetic
Pierre,

1) I don't know, but a good place to check and see what previous answers to 
this question were is markmail.org
2) I don't think there is such a thing, but I also don't think there are sites 
that make this data freely available (answer to 1?)

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Pierre Auslaender <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, September 30, 2008 11:28:40 AM
> Subject: French synonyms & Online synonyms
> 
> Hello,
> 
> I'm sure these questions have been raised a million times; I'll try once
> more:
> 
> 1/ Is there any general-purpose, free, French synonyms file out there?
> 
> 2/ Is there a Solr or Lucene analyser class that could tap an on-line 
> resource for synonyms at index-time? And by the same token, maintain and
> complete a synonyms text file?
> 
> Thanks for the great work on SOLR and for the liveliness of this list.
> 
> Pierre



French synonyms & Online synonyms

2008-09-30 Thread Pierre Auslaender

Hello,

I'm sure these questions have been raised a million times; I'll try once
more:


1/ Is there any general-purpose, free, French synonyms file out there?

2/ Is there a Solr or Lucene analyser class that could tap an on-line 
resource for synonyms at index-time? And by the same token, maintain and
complete a synonyms text file?


Thanks for the great work on SOLR and for the liveliness of this list.

Pierre


Re: Howto concatenate tokens at index time (without spaces)

2008-09-30 Thread Otis Gospodnetic
I haven't used the German analyzer (either Snowball or the one we have in 
Lucene's contrib), but have you checked if that does the trick of keeping words 
together?  Or maybe the compound tokenizer has this option? (check Lucene JIRA, 
not sure now where the compound tokenizer went)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Batzenmann <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, September 30, 2008 7:28:53 AM
> Subject: Howto concatenate tokens at index time (without spaces)
> 
> 
> Hi,
> 
> I'm looking for a way to create a fieldtype which will, apart from the
> whitespace-tokenized tokens, also store concatenated versions of the tokens.
> 
> The ShingleFilter does something very similar but keeps spaces in between
> words. In German, a shoe (Schuh) you wear in your 'spare time' (Freizeit) is
> actually a "Freizeitschuh" and not a "Freizeit Schuh".
> The WordDelimiterFilterFactory could be incorporated for this as well if the
> space character could be configured as a delimiter character - but these
> can't be configured at all, or am I wrong?
> 
> Synonyms are in my opinion not the solution for this, as it is imho
> absolutely not necessary to persist any data for this requirement.
> 
> cheers, Axel
> -- 
> View this message in context: 
> http://www.nabble.com/Howto-concatenate-tokens-at-index-time-%28without-spaces%29-tp19740271p19740271.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Indexing Multiple Fields with the Same Name

2008-09-30 Thread KyleMorrison

That was indeed the error; I apologize for wasting your time. Thank you very
much for the help.
Kyle



Shalin Shekhar Mangar wrote:
> 
> Is that a mis-spelling?
> 
> mulitValued="true"
> 
> On Thu, Sep 25, 2008 at 12:12 AM, KyleMorrison <[EMAIL PROTECTED]> wrote:
> 
>>
>> I'm trying to index fields as such:
>>6100966
>>375010
>>2338917
>>1943701
>>1357528
>>3301821
>>2450046
>>8940112
>>6251457
>>293
>>6262769
>>2693214
>>2839489
>>6283093
>>2666401
>>6343085
>>1721838
>>6377309
>>3882429
>>6302075
>>
>> And in the xml schema we see
>>   <field name="PMID" ... stored="false"
>> mulitValued="true"/>
>>
>> However, when I search for entries in PMID, the only one that ever gets
>> stored is the last one in the list. For instance, q=PMID:6302075 returns
>> a
>> document, whereas q=PMID:3882429 does not. Shouldn't the data import
>> handler
>> take care of this, or am I misunderstanding the function of
>> mulitValued="true"?
>>
>> Kyle
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Indexing-Multiple-Fields-with-the-Same-Name-tp19655285p19655285.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Indexing-Multiple-Fields-with-the-Same-Name-tp19655285p19743517.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellcheck: buildOnOptimize?

2008-09-30 Thread Jason Rennie
On Fri, Sep 26, 2008 at 9:33 AM, Shalin Shekhar Mangar <
[EMAIL PROTECTED]> wrote:

> Jason, can you please open a jira issue to add this feature?
>

Done.

https://issues.apache.org/jira/browse/SOLR-795

Jason


spellcheck: substitutions, but no inserts or deletes

2008-09-30 Thread Jason Rennie
I've been testing the SpellCheckComponent for use on StyleFeeder.  It seems
to do a great job of suggesting character substitutions, but I haven't seen
any deletion/insertion suggestions.  I've tried decreasing the "accuracy"
parameter to 0.5.  Some queries I've tried are:

bluea: suggests "blues" (should be "blue")
yello: no suggestions (should be "yellow")
candyz: suggests "candyâ" (should be "candy")
chane: no suggestions (should be "chanel")

It looks to me like it is only willing to make character substitutions and
is unwilling to insert/delete characters.  Does anyone know why it might be
behaving this way?  I'm certain that the "should be" words appear fairly
frequently in the field I used for spellcheck indexing.  And, I reindexed
the documents after setting up the spellchecker.
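
For reference, the edit distance a spellchecker typically works with counts
insertions and deletions as well as substitutions, so "bluea" -> "blue"
should cost just one edit. A minimal Levenshtein sketch, independent of
Lucene's actual implementation:

public class Levenshtein {
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;  // delete all of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;  // insert all of b
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,         // deletion
                        d[i][j - 1] + 1),        // insertion
                        d[i - 1][j - 1] + sub);  // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("bluea", "blue"));  // prints 1
    }
}

A checker that only ever substitutes is not computing this full distance.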

Not sure if this would help to debug, but I noticed that words appear with
different frequency in the spellcheck index file (.cfs in the spellcheck
dir).  I.e. here's what I get for a few variants on "blue":

[EMAIL PROTECTED] spellchecker]$ strings _2y.cfs | grep ^blue$|wc
 46  46 230
[EMAIL PROTECTED] spellchecker]$ strings _2y.cfs | grep ^bluea$|wc
  0   0   0
[EMAIL PROTECTED] spellchecker]$ strings _2y.cfs | grep ^blues$|wc
  3   3  18

All the "should be" words appear 10+ times.  The misspellings appear 0 or 1
times.

Any help is appreciated.  Thanks,

Jason


Howto concatenate tokens at index time (without spaces)

2008-09-30 Thread Batzenmann

Hi,

I'm looking for a way to create a fieldtype which will, apart from the
whitespace-tokenized tokens, also store concatenated versions of the tokens.

The ShingleFilter does something very similar but keeps spaces in between
words. In German, a shoe (Schuh) you wear in your 'spare time' (Freizeit) is
actually a "Freizeitschuh" and not a "Freizeit Schuh".
The WordDelimiterFilterFactory could be incorporated for this as well if the
space character could be configured as a delimiter character - but these
can't be configured at all, or am I wrong?
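
To make it concrete, this is roughly the behaviour I'm after, sketched with
plain strings for brevity (a real version would be a Lucene TokenFilter):

import java.util.*;

public class CompoundConcat {
    // keep the original tokens and add each adjacent pair joined WITHOUT a
    // space, e.g. ["Freizeit", "Schuh"] also yields "Freizeitschuh"
    static List<String> withJoinedPairs(List<String> tokens) {
        List<String> out = new ArrayList<String>(tokens);
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withJoinedPairs(Arrays.asList("Freizeit", "Schuh")));
        // prints [Freizeit, Schuh, Freizeitschuh]
    }
}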

Synonyms are in my opinion not the solution for this, as it is imho
absolutely not necessary to persist any data for this requirement.

cheers, Axel
-- 
View this message in context: 
http://www.nabble.com/Howto-concatenate-tokens-at-index-time-%28without-spaces%29-tp19740271p19740271.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Running Solr1.3 with multicore support

2008-09-30 Thread RaghavPrabhu

Hi Saurabh Bhutyani,

  Does it show the two core links on your Solr admin home page, like

 Admin core0
 Admin core1

 If not, the problem is that you are upgrading Solr from 1.2 to 1.3. Better to
stop the server, delete all the folders in the
%Tomcat_Home%\work\Catalina\localhost location, and restart it. Hope it will
work.
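
For example (a hedged sketch, assuming a standard Windows Tomcat layout;
Tomcat recreates the work directory on restart):

  rmdir /s /q "%Tomcat_Home%\work\Catalina\localhost"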



Regards
Prabhu.K

-- 
View this message in context: 
http://www.nabble.com/Running-Solr1.3-with-multicore-support-tp19722268p19739928.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question about facet.prefix usage

2008-09-30 Thread Erik Hatcher

If I'm not mistaken, doesn't facet.query accomplish what you want?
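
For instance (a hedged sketch, untested against your data), each
access-controlled prefix can become its own facet.query:

  http://SOLRSERVER:8080/solr/select?q=docType:gen&facet=true
      &facet.query=market_category:charlotte_2007*
      &facet.query=market_category:sanfrancisco_2007*

Each facet.query comes back with its own count, so ORing prefixes is just a
matter of which queries you send.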

Erik


On Sep 29, 2008, at 5:43 PM, Simon Hu wrote:



I also need the exact same feature. I was not able to find an easy
solution and ended up modifying class SimpleFacets to make it accept an
array of facet prefixes per field. If you are interested, I can email you
the modified SimpleFacets.java.

-Simon


steve berry-2 wrote:



Question: Is it possible to pass complex queries to facet.prefix?
For example, instead of facet.prefix:foo I want facet.prefix:foo OR
facet.prefix:bar

My application is for browsing business records that fall into
categories. The user is only allowed to see businesses falling into
categories which they have access to.

I have a series of documents dumped into the following basic structure,
which I was hoping would help me deal with this:


   123
   Business Corp.
   28255-0001
   .
   charlotte_2006 Banks
   charlotte_2007 Banks
   sanfrancisco_2006 Banks
   sanfrancisco_2007 Banks
   ... (lots more market_category entries) ...


   124
   Factory Corp.
   28205-0001
   .
   charlotte_2006 Banks
   charlotte_2007 Banks
   austin_2006 Banks
   austin_2007 Banks
   ... (lots more market_category entries) ...

.

The multivalued market_category fields are flattened relational data
attributed to that business, and I want to use those values for faceted
navigation /but/ I want the facets to be restricted depending on what
products the user has access to. For example, a user may have access to
sanfrancisco_2007 and sanfrancisco_2006 data but nothing else.

So I've created a request using facet.prefix that looks something like
this:
http://SOLRSERVER:8080/solr/select?q.op=AND&q=docType:gen&facet.field=market_category&facet.prefix=charlotte_2007

This ends up producing perfectly suitable facet results that look like
this:
..

   
   
1
1
1
1
1
1
0

.


Bingo! facet.prefix does exactly what I want it to.

Now I want to go a step further and pass a compound statement to the
facet.prefix along the lines of "facet.prefix:charlotte_2007 OR
sanfrancisco_2007" or "facet.prefix:charlotte_2007 OR charlotte_2006" to
return more complex facet sets. As far as I can tell from looking at the
docs, this won't work.

Is this possible using the existing facet.prefix functionality? Anyone
have a better idea of how I should accomplish this?

Thanks,
steve berry
American City Business Journals






--
View this message in context: 
http://www.nabble.com/Question-about-facet.prefix-usage-tp15836501p19732310.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Extending Solr with custom filter

2008-09-30 Thread Jarek Zgoda
On 2008-09-12, at 17:58, Andrzej Bialecki wrote:



ok .. that?

I recommend using Stempelator (or Morfologik) for Polish stemming
and lemmatization. It provides a superset of Stempel features: in
addition to the algorithmic stemming it provides dictionary-based
stemming, and these two methods nicely complement each other.



I'm not familiar enough with Java to do anything more complicated than
writing some wrapping factory class. Stempel seems to have such classes
to wrap, but I did not find any Lucene analyzer that uses Morfologik.
Or am I completely wrong, and should it be plugged into Solr in a
completely different way?
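
The kind of wrapper I mean would be something like this (a hedged sketch;
PolishStemFilter stands in for whatever TokenFilter class the stemming
library actually provides):

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class PolishStemFilterFactory extends BaseTokenFilterFactory {
    public TokenStream create(TokenStream input) {
        // wrap the incoming token stream with the third-party stemming filter
        return new PolishStemFilter(input);
    }
}

It would then be referenced from schema.xml inside the field type's
analyzer definition.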


--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
[EMAIL PROTECTED]