Re: Using DIH's special commands....Help needed

2009-10-19 Thread Noble Paul നോബിള്‍ नोब्ळ्
The accepted logLevel values are:
error, debug, warn, trace, info

2009/10/18 Noble Paul നോബിള്‍  नोब्ळ् :
> On Sun, Oct 18, 2009 at 4:16 AM, Lance Norskog  wrote:
>> I had this problem also, but I was using the Jetty example. I fail at
>> logging configurations about 90% of the time, so I assumed it was my
>> fault.
> did you set the logLevel attribute also in the entity? if you set
> logLevel="severe" it should definitely be printed
>>
>> 2009/10/17 Noble Paul നോബിള്‍  नोब्ळ् :
>>> It is strange that LogTransformer did not log the data.
>>>
>>> On Fri, Oct 16, 2009 at 5:54 PM, William Pierce  
>>> wrote:
 Folks:

 Continuing my saga with DIH and use of its special commands. I have verified
 that the script functionality is indeed working. I also verified that
 '$skipRow' is working. But I don't think that '$deleteDocById' is working.

 My script now looks as follows:

 
        
  

 The theory is that rows whose 'IndexingStatus' value is 4 should be deleted
 from the Solr index. Just to be sure that the javascript syntax was correct and
 checked out, I intentionally overwrote a field called 'Col1' in my schema
 with the primary key of the document to be deleted.

 On a clean and empty index, I import 47 rows from my dummy db. Everything
 checks out correctly since IndexingStatus for each row is 1. There are no
 rows to delete. I then go into the db and set one row with
 IndexingStatus = 4. When I execute the dataimport, I find that all 47
 documents are imported correctly. However, for the row for which
 'IndexingStatus' was set to 4, the Col1 value is set correctly by the
 script transformer to be the primary key value for that row/document.
 However, I should not be seeing that document, since '$deleteDocById'
 should have deleted it from Solr.

 Could this be a bug in solr?  Or, am I misunderstanding how $deleteDocById
 works?

 By the way, Noble, I tried to set the LogTransformer, and add logging per
 your suggestion.  That did not work either.  I set logLevel="debug", and
 also turned on solr logging in the admin console to be the max value
 (finest) and still no output.

 Thanks,

 - Bill



 --
 From: "Noble Paul ???  ??" 
 Sent: Thursday, October 15, 2009 10:05 PM
 To: 
 Subject: Re: Using DIH's special commands....Help needed

> use  LogTransformer to see if the value is indeed set
>
>          logTemplate="${post}"
>         query=" select  Id, a, b, c, IndexingStatus from  prod_table
> where (IndexingStatus = 1 or IndexingStatus = 4) ">
>
> this should print out the entire row after the transformations
>
>
>
> On Fri, Oct 16, 2009 at 3:04 AM, William Pierce 
> wrote:
>>
>> Thanks for your reply!  I tried your suggestion.  No luck.  I have
>> verified
>> that I have version  1.6.0_05-b13 of java installed.  I am running with
>> the
>> nightly bits of October 7.  I am pretty much out of ideas at the present
>> time....I'd appreciate any tips/pointers.
>>
>> Thanks,
>>
>> - Bill
>>
>> --
>> From: "Shalin Shekhar Mangar" 
>> Sent: Thursday, October 15, 2009 1:42 PM
>> To: 
>> Subject: Re: Using DIH's special commands....Help needed
>>
>>> On Fri, Oct 16, 2009 at 12:46 AM, William Pierce
>>> wrote:
>>>
 Thanks for your help.  Here is my DIH config file....I'd appreciate any
 help/pointers you may give me.  No matter what I do the documents are
 not
 getting deleted from the index.  My db has rows whose 'IndexingStatus'
 field
 has values of either 1 (which means add it to solr), or 4 (which means
 delete the document with the primary key from SOLR index).  I have two
 transformers running.  Not sure what I am doing wrong.

 
  

Is Relational Mapping (foreign key) possible in solr ??

2009-10-19 Thread ashokcz

Hi, I browsed through the Solr docs and user forums, and what I infer is that we
can't use Solr to store relational
mappings (foreign keys).

But I just want to know if there is any chance of doing the same.

I have two tables: a User table (with 100,000 entries) and a Project table
(with 200 entries).
User table columns: userid, name, country, location, etc.
Project table columns: project name, description, business unit,
project type.
Here User Location, Country, Project Name, Project Business Unit, and
Project Type are faceted.
A user can be mapped to multiple projects.
In Solr I store the details like this:
[
{
userId:1234;
userName:ABC;
Country:US;
Location:NY;
Project Name:Project1,Project2;
Project Description:Project1,Project2;
Project  business unit:unit1,unit2;
Project type:Type1,Type2
}
]

With this structure I can get faceted details about both user data and
project data.

But here I face two problems.

1. A project can be mapped to many users, say 10,000 users. So if I change a
project name, I end
up re-indexing 10,000 records, which is very time-consuming work.

2. For fields like Project Description I could not find any proper delimiter.
For the other fields a comma (,) is

okay, but being a description I could not use any specific delimiter. This field is
not faceted, but in search results I still need to pull it out and show the
project details in tabular format, and I use the delimiter to split it. For
other project fields like Project Name and Type I could do that, but not for
this Project Description field.

So what I am asking is: is there any way of storing the data as relational records,
e.g. a field in the user details called projectId whose data would be
"1,2", referring to the project records' primary keys in Solr, while still
preserving the faceted approach?

To my knowledge my guess is it can't be done.
Am I correct?
If so, how can we achieve a solution to my problem?
If someone could share some ideas it would be useful.
-- 
View this message in context: 
http://www.nabble.com/Is-Relational-Mapping-%28foreign-key%29-possible-in-solrtp25955068p25955068.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Is Relational Mapping (foreign key) possible in solr ??

2009-10-19 Thread Jérôme Etévé
Hi,

 here's what you could do:

* Use multivalued fields instead of comma-separated values, so you
won't need a separator.
* Store project identifiers in the user index (a SolrJ sketch of both
points follows after this message).

Denormalised project information in a user entry will inevitably require
re-indexing a lot of user entries when the project info changes.

* You could have a mixed index with user and project entries in the
same index, so if you search for a name, you'd find users and projects
matching that name.

Jerome.
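A minimal SolrJ sketch of the first two suggestions (field names are illustrative,
and "projectId" would need to be declared multiValued="true" in schema.xml):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexUserWithProjects {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument user = new SolrInputDocument();
        user.addField("userId", "1234");
        user.addField("userName", "ABC");
        user.addField("country", "US");
        user.addField("location", "NY");
        // One value per project in a multivalued field, instead of a
        // comma-separated string, so no delimiter is needed.
        user.addField("projectId", "1");
        user.addField("projectId", "2");

        server.add(user);
        server.commit();
    }
}

Faceting on projectId then works as usual, and renaming a project only requires
updating the (small) project records, not every user document.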

2009/10/19 ashokcz :
>
> Hi i browsed through the solr docs and user forums and what i infer is we
> cant use solr to store Relational
> Mapping(foreign key) in solr .
>
> but just want to know if any chances of doing the same.
>
> I have two tables User table (with 1,00,000 entries )  and project table
> with (200 entries ).
> User table columns : userid , name ,country , location , etc.
> Project tables Columns : project name , description , business unit ,
> project type .
> Here User Location , Country , Project  Name , Project  business unit ,
> project type are faceted
> A user can be mapped to multiple projects.
> In solr i store the details like this 
> [
> {
> userId:1234;
> userName:ABC;
> Country:US;
> Location:NY;
> Project Name:Project1,Project2;
> Project Description:Project1,Project2;
> Project  business unit:unit1,unit2;
> Project type:Type1,Type2
> }
> ]
>
> With this structure i could get faceted details about both user data and
> project data .
>
> But here i face 2 Problems .
>
> 1.A project can be mapped to many users say 10,000 Users . So if i change a
> project name then i end
> up indexing 10,000 Records which is a very time consuming work.
>
> 2.for Fields like Project Description i could not find any proper delimiter
> . for other fields comma (,) is
>
> okay but being description i could not use any specific delimiter .This is
> not faceted but still in search results i need to take this out and show the
> project details in tabular format. and i use delimiter to split it .For
> other  project fields like Project Name and Type i could do it but not for
> this Project Description field
>
> So i expect is there any way of storing the data as relational records like
> in user details where we will have field called project Id and data will be
> 1,2 which refers to project records primary key in solr and still preserve
> the faceted approach.
>
> As for my knowledge my guess is it cant be done ???
> Am i correct ???
> If so then how we can achieve the solutions to my problem??
> Pls if someone could share some ideas it will be useful.
> --
> View this message in context: 
> http://www.nabble.com/Is-Relational-Mapping-%28foreign-key%29-possible-in-solrtp25955068p25955068.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Terms truncation

2009-10-19 Thread Paul Forsyth

Hi,

I'm using the terms component for an autosuggest feature and it works  
well but i've hit an issue with truncation:


Take the following query:

http://localhost:8983/solr/terms?terms.fl=meta_name_t&terms.prefix=switch

This is the response:




[terms response XML stripped by the mailing-list archive; it listed the
truncated term 'switchov' with its count rather than 'switchover']




In this case the word 'switchov' is returned where I expected
'switchover'. The word 'switchov' doesn't exist by itself.


I'm puzzled by the truncation. The handlers are all standard for 1.4.

Are other factors affecting the response? I couldn't see an  
appropriate option for the query to adjust the length of the returned  
string...


Thanks in advance,

Paul Forsyth




Re: Terms truncation

2009-10-19 Thread Grant Ingersoll


On Oct 19, 2009, at 6:23 AM, Paul Forsyth wrote:


Hi,

I'm using the terms component for an autosuggest feature and it  
works well but i've hit an issue with truncation:


Take the following query:

http://localhost:8983/solr/terms?terms.fl=meta_name_t&terms.prefix=switch

This is the response:








In this case the word 'switchov' is returned where i expected  
'switchover'. The word 'switchov' doesn't exist by itself


I'm guessing you are asking for terms on a field that is stemmed.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Concatening two fields

2009-10-19 Thread sophSophie

Hello,

firstly sorry for my english :)

Since last Friday I have been trying to define in schema.xml a new field that is the
concatenation of two other fields.

So in schema.xml I have these fields:

 
   

field3




In my .csv file the data is stored like this:

field1 ; field2
toto ; titi

In my mind field3 should store the string "toto titi".

When I make the query "toto titi" I want Solr to return the correct result,
but Solr returns nothing.

Please could you help me find what is incorrect?

Thanks in advance
-- 
View this message in context: 
http://www.nabble.com/Concatening-two-fields-tp25956649p25956649.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using DIH's special commands....Help needed

2009-10-19 Thread William Pierce

Lance, Noble:

I set logLevel="debug" in my dihconfig.xml at the entity level.   Got no 
output!   I then gave up digging into this further because I was pressed for 
time to dig into how to increase the speed of importing into solr with 
dih...


Cheers,

- Bill
--
From: "Noble Paul നോബിള്‍  नोब्ळ्" 
Sent: Monday, October 19, 2009 1:05 AM
To: 
Subject: Re: Using DIH's special commands....Help needed


The accepted logLevel values are:
error, debug, warn, trace, info

2009/10/18 Noble Paul നോബിള്‍  नोब्ळ् :

On Sun, Oct 18, 2009 at 4:16 AM, Lance Norskog  wrote:

I had this problem also, but I was using the Jetty example. I fail at
logging configurations about 90% of the time, so I assumed it was my
fault.

did you set the logLevel attribute also in the entity? if you set
logLevel="severe" it should definitely be printed


2009/10/17 Noble Paul നോബിള്‍  नोब्ळ् :

It is strange that LogTransformer did not log the data.

On Fri, Oct 16, 2009 at 5:54 PM, William Pierce  
wrote:

Folks:

Continuing my saga with DIH and use of its special commands.  I have
verified that the script functionality is indeed working.I also 
verified
that '$skipRow' is working.But I don't think that '$deleteDocById' 
is

working.

My script now looks as follows:


   
 

The theory is that rows whose 'IndexingStatus' value is 4 should be 
deleted
from solr index.  Just to be sure that javascript syntax was correct 
and
checked out,  I intentionally overwrite a field called 'Col1' in my 
schema

with primary key of the document to be deleted.

On a clean and empty index, I import 47 rows from my dummy db. 
Everything
checks out correctly since IndexingStatus for each row is 1.  There 
are no

rows to delete.I then go into the db and set one row with the
IndexingStatus = 4.   When I execute the dataimport,  I find that all 
47

documents are imported correctly.   However,  for the row for which
'IndexingStatus' was set to 4,  the Col1 value is set correctly by the
script transformer to be the primary key value for that row/document.
However,  I should not be seeing that document  since the 
'$deleteDocById

should have deleted this from solr.

Could this be a bug in solr?  Or, am I misunderstanding how 
$deleteDocById

works?

By the way, Noble, I tried to set the LogTransformer, and add logging 
per
your suggestion.  That did not work either.  I set logLevel="debug", 
and

also turned on solr logging in the admin console to be the max value
(finest) and still no output.

Thanks,

- Bill



--
From: "Noble Paul ???  ??" 
Sent: Thursday, October 15, 2009 10:05 PM
To: 
Subject: Re: Using DIH's special commands....Help needed


use  LogTransformer to see if the value is indeed set



this should print out the entire row after the transformations



On Fri, Oct 16, 2009 at 3:04 AM, William Pierce 


wrote:


Thanks for your reply!  I tried your suggestion.  No luck.  I have
verified
that I have version  1.6.0_05-b13 of java installed.  I am running 
with

the
nightly bits of October 7.  I am pretty much out of ideas at the 
present

time....I'd appreciate any tips/pointers.

Thanks,

- Bill

--
From: "Shalin Shekhar Mangar" 
Sent: Thursday, October 15, 2009 1:42 PM
To: 
Subject: Re: Using DIH's special commands....Help needed


On Fri, Oct 16, 2009 at 12:46 AM, William Pierce
wrote:

Thanks for your help.  Here is my DIH config file....I'd 
appreciate any
help/pointers you may give me.  No matter what I do the documents 
are

not
getting deleted from the index.  My db has rows whose 
'IndexingStatus'

field
has values of either 1 (which means add it to solr), or 4 (which 
means
delete the document with the primary key from SOLR index).  I have 
two

transformers running.  Not sure what I am doing wrong.


 

 
 
 query=" select  Id, a, b, c, IndexingStatus from 
prod_table

where (IndexingStatus = 1 or IndexingStatus = 4) ">
 
 
 
 
 



One thing I'd try is to use '4' for comparison rather than the 
number 4

(the
type would depend on the sql type). Also, for javascript 
transformers to
work, you must be running Java 6 (the DIH ScriptTransformer requires JDK 1.6).

MoreLikeThis support Dismax parameters

2009-10-19 Thread Nick Spacek
From what I've read/found, MoreLikeThis doesn't support the dismax
parameters that are available in the StandardRequestHandler (such as bq). Is
it possible that we might get support for those parameters some time? What
are the issues with MLT Handler inheriting from the StandardRequestHandler
instead of RequestHandlerBase?

Nick Spacek


Re: Terms truncation

2009-10-19 Thread Paul Forsyth

Thanks Grant,

I'm still a bit of a newbie with Solr :)

I was able to add a new non-stemming field along with a copyfield, and  
that seems to have done the trick :)


Until I tried this I didn't quite realise what copyfields did...

Thanks again,

Paul

On 19 Oct 2009, at 11:23, Paul Forsyth wrote:


Hi,

I'm using the terms component for an autosuggest feature and it  
works well but i've hit an issue with truncation:


Take the following query:

http://localhost:8983/solr/terms?terms.fl=meta_name_t&terms.prefix=switch

This is the response:








In this case the word 'switchov' is returned where i expected  
'switchover'. The word 'switchov' doesn't exist by itself


I'm puzzled with the truncation. The handles are all standard for 1.4.

Are other factors affecting the response? I couldn't see an  
appropriate option for the query to adjust the length of the  
returned string...


Thanks in advance,

Paul Forsyth








Boost with wildcard.

2009-10-19 Thread Jay Ess
The boost (index time) does not work when I am searching for a word with
a wildcard appended to the end.

I stumbled onto this "feature" and it's pretty much a showstopper for me.
I am implementing a live-search feature where I always have a wildcard
on the last word that is currently being typed by the user.

Will this be fixed anytime soon, or does anyone have a workaround?

Example:
"playstation*" gives a result with unboosted items, but "playstation"
gives the correct one.




Re: how can I use debugQuery if I have extended QParserPlugin?

2009-10-19 Thread gdeconto

awesome.  Thanks for figuring this out guys



wojtekpia wrote:
> 
> Good catch. I was testing on a nightly build from mid-July. I just tested
> on a similar deployment with nightly code from Oct 5th and everything
> seems to work. 
> 
> My mid-July deployment breaks on sints, integers, sdouble, doubles, slongs
> and longs. My more recent deployment works with tints, sints, integers,
> tdoubles, sdoubles, doubles, tlongs, slongs, and longs. (I don't have any
> floats in my schema so I didn't test those). Sounds like another reason to
> upgrade to 1.4.
> 
> Wojtek
> 

-- 
View this message in context: 
http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25959707.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Concatening two fields

2009-10-19 Thread Grant Ingersoll


On Oct 19, 2009, at 7:21 AM, sophSophie wrote:



Hello,

firstly sorry for my english :)

Since last Friday I have been trying to define in schema.xml a new field that is the
concatenation of two other fields.

So in schema.xml I have these fields:


  

field3




In my .csv file the data is stored like this:

field1 ; field2
toto ; titi

In my mind field3 should store the string "toto titi".

When I make the query "toto titi" I want solr to return the correct  
result

but Solr returns nothing.


How is Field3 defined?  And is that a phrase query or are you just  
using quotes for separation/emphasis?


-Grant 
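For reference, one common workaround when the schema itself cannot build the
concatenation is to concatenate on the client before indexing. This is not the
copyField mechanism Sophie describes, just an alternative approach; a minimal
SolrJ sketch, using the field names from the thread and assuming the schema
declares field1, field2 and field3 (plus any required uniqueKey):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ConcatenateBeforeIndexing {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Values as they appear in the CSV from the thread.
        String field1 = "toto";
        String field2 = "titi";

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("field1", field1);
        doc.addField("field2", field2);
        // Build field3 explicitly so a phrase query like "toto titi" can match it.
        doc.addField("field3", field1 + " " + field2);

        server.add(doc);
        server.commit();
    }
}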


Solr commits before documents are added

2009-10-19 Thread Ranganathan, Sharmila
Hi,

My application indexes a huge number of documents (millions). Below
is a snapshot of my code, where I add all documents to Solr and then,
at the end, issue the commit command. I use Solrj. I find that the last few
documents are not committed to Solr. Is this because adding documents
to Solr took a long time and the commit command was reached even before it
finished adding documents? Is there a way to ensure that Solr waits
for all documents to be added and then commits? Please advise me on how to
solve this issue.

 

for (SolrInputDocument doc : docs) {
    solrServer.add(doc);   // Add document to Solr
}

solrServer.commit();       // Commit to Solr

 

 

Thanks,

Sharmila



RE: Solr commits before documents are added

2009-10-19 Thread Feak, Todd
A few questions to help the troubleshooting.

Solr version #?

Is there just 1 commit through Solrj for the millions of documents? 

Or do you do it on a regular interval (every 100k documents for example) and 
then one at the end to be sure?

How are you observing that the last few didn't make it in? Are you looking at a 
slave or master?

-Todd
-Original Message-
From: Ranganathan, Sharmila [mailto:sranganat...@library.rochester.edu] 
Sent: Monday, October 19, 2009 9:19 AM
To: solr-user@lucene.apache.org
Subject: Solr commits before documents are added

Hi,

My application indexes huge number of documents(like in millions). Below
is the snapshot of my code where I add all documents to Solr, and then
at last issue commit command. I use Solrj. I find that last few
documents are not  committed to Solr. Is this because adding documents
to Solr took longer time and it reached commit command even before it
finished adding documents? Is there are way to ensure that solr waits
for all documents to be added and then commits? Please advise me how to
solve this issue.

 

For loop

solrServer.add(doc);   // Add document to Solr

End for loop

solrServer.commit();  // Commit to Solr

 

 

Thanks,

Sharmila




Shards param accepts spaces between commas?

2009-10-19 Thread Jason Rutherglen
It seems like no, and should be an easy change.  I'm putting newlines
after the commas so the large shards list doesn't scroll off the
screen.


Filter query optimization

2009-10-19 Thread Jason Rutherglen
If a filter query matches nothing, then no additional query should be
performed and no results returned?  I don't think we have this today?


Wordnet dictionary integration with Solr - help

2009-10-19 Thread Nilya

I have been trying to integrate the WordNet dictionary with Solr. I used the link
below to generate indexes using the Prolog package from WordNet:
http://chencer.com/techno/java/lucene/wordnet.html

And here are the changes I made in Solr:

Schema.xml changes:   
 
  word 
  dict 
  solr.IndexBasedSpellChecker 
  word 
  UTF-8 
  ./syn_index 
  ./spellchekerFile1 
 

 

But with the above changes the WordNet dictionary doesn't seem to be working.

1. Does anybody know what's wrong in my configuration? Is any other change
required in the solrconfig?
2. Is there any other way to import WordNet data into Solr and use it?
3. If there is another way to import WordNet as simple text, then I can as
well use it in my existing (default) synonym dictionary.

Appreciate your help on answering this. 

Thanks. 

-- 
View this message in context: 
http://www.nabble.com/Wordnet-dictionary-integration-with-Solr---help-tp25963682p25963682.html
Sent from the Solr - User mailing list archive at Nabble.com.



ArrayIndexOutOfBoundsException during indexing

2009-10-19 Thread Aaron McKee
I was wondering if anyone might have any insight on the following 
problem. I'm using the latest Solr code from SVN and indexing around 17m 
XML records via DIH. With perfect replicability, the following exception 
is thrown on the same aggregate file (#236, and each XML file has ~50k 
records), although not necessarily the same exact record. Oddly, it 
doesn't appear to be due to anything in the file - if I change the 
ordering or just index the file alone, it works fine.


java.lang.ArrayIndexOutOfBoundsException: -65536
   at 
org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:479)
   at 
org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:502)
   at 
org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:130)
   at 
org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:467)
   at 
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
   at 
org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
   at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
   at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
   at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
   at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
   at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
   at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
   at 
org.apache.solr.ask_geo.update.GeoUpdateProcessor.processAdd(GeoUpdateProcessor.java:75)
   at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
   at 
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
   at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
   at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
   at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
   at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
   at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)


The related Lucene code is a bit thick and I'm having a hard time 
figuring out what could be going on here. I've added a bit of debug 
output to some of the intermediary classes and it looks like the 
exception is generally being thrown while processing one of my dynamic 
fields (type=tdouble, indexed=t, stored=f). The GeoUpdateProcessor code 
referenced above is my own, but essentially is the same as the LocalSolr 
update processor; it just contains a few lines of code that calculates a 
double value from two document fields and then stores that value in one 
of these dynamic fields. It hasn't caused any previous problems, only 
interacts with the underlying framework via cmd.getSolrInputDocument(), 
doc.getFieldValue(string), doc.addField(string, double), and 
next.processAdd(cmd),  and I've generated a number of indexes with it in 
the past, so I don't -think- that's a likely culprit. I've tried a run 
without the update processor and the problem seemed to go away (it made 
it past the above file, at least), but then this changes so many other 
factors that I don't know how much that really tells me (reduces field 
count by ~13 fields, eliminates all dynamic fields, etc.).


The only other thing worth mentioning is that I've replaced the Solr 
trunk Lucene jars with my own compiled versions, based off 2.9.0. The 
only thing different versus the 'stable' release is that it includes a 
few additional libraries (no core or contrib classes were modified). I 
haven't heard of any check-ins between 2.9.0 and 2.9.1-dev that should 
affect this...


Has anyone else run into a problem like this before?

Thanks,
Aaron



Re: Filter query optimization

2009-10-19 Thread Yonik Seeley
On Mon, Oct 19, 2009 at 2:55 PM, Jason Rutherglen
 wrote:
> If a filter query matches nothing, then no additional query should be
> performed and no results returned?  I don't think we have this today?

No, but this is a fast operation anyway (In Solr 1.4 at least).

Another thing to watch out for is to not try this with filters that
you don't know the size of (or else you may force a popcount on a
BitDocSet that would not otherwise have been needed).

It could also potentially complicate warming queries - need to be
careful that the combination of filters you are warming with matches
something, or it would cause the fieldCache entries to not be
populated.

-Yonik
http://www.lucidimagination.com


Re: ArrayIndexOutOfBoundsException during indexing

2009-10-19 Thread Yonik Seeley
Thanks for the report Aaron, this definitely looks like a Lucene bug,
and I've opened
https://issues.apache.org/jira/browse/LUCENE-1995
Can you follow up there (I asked about your index settings).

-Yonik
http://www.lucidimagination.com



On Mon, Oct 19, 2009 at 3:04 PM, Aaron McKee  wrote:
> I was wondering if anyone might have any insight on the following problem.
> I'm using the latest Solr code from SVN and indexing around 17m XML records
> via DIH. With perfect replicability, the following exception is thrown on
> the same aggregate file (#236, and each XML file has ~50k records), although
> not necessarily the same exact record. Oddly, it doesn't appear to be due to
> anything in the file - if I change the ordering or just index the file
> alone, it works fine.
>
> java.lang.ArrayIndexOutOfBoundsException: -65536
>       at
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:479)
>       at
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:502)
>       at
> org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:130)
>       at
> org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:467)
>       at
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
>       at
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
>       at
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
>       at
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
>       at
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
>       at
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
>       at
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
>       at
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>       at
> org.apache.solr.ask_geo.update.GeoUpdateProcessor.processAdd(GeoUpdateProcessor.java:75)
>       at
> org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
>       at
> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
>       at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392)
>       at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
>       at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
>       at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
>       at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
>       at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
>       at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
>
> The related Lucene code is a bit thick and I'm having a hard time figuring
> out what could be going on here. I've added a bit of debug output to some of
> the intermediary classes and it looks like the exception is generally being
> thrown while processing one of my dynamic fields (type=tdouble, indexed=t,
> stored=f). The GeoUpdateProcessor code referenced above is my own, but
> essentially is the same as the LocalSolr update processor; it just contains
> a few lines of code that calculates a double value from two document fields
> and then stores that value in one of these dynamic fields. It hasn't caused
> any previous problems, only interacts with the underlying framework via
> cmd.geSolrInputDocument(), doc.getFieldValue(string), doc.addField(string,
> double), and next.processAdd(cmd),  and I've generated a number of indexes
> with it in the past, so I don't -think- that's a likely culprit. I've tried
> a run without the update processor and the problem seemed to go away (it
> made it past the above file, at least), but then this changes so many other
> factors that I don't know how much that really tells me (reduces field count
> by ~13 fields, eliminates all dynamic fields, etc.).
>
> The only other thing worth mentioning is that I've replaced the Solr trunk
> Lucene jars with my own compiled versions, based off 2.9.0. The only thing
> different versus the 'stable' release is that it includes a few additional
> libraries (no core or contrib classes were modified). I haven't heard of any
> check-ins between 2.9.0 and 2.9.1-dev that should affect this...
>
> Has anyone else run into a problem like this before?
>
> Thanks,
> Aaron
>
>


Re: stats page slow in latest nightly

2009-10-19 Thread Chris Hostetter

: I won't have access to the code until monday, but i'm pretty sure this 
: should be a fairly trivial change (just un-set the estimator on the 
: CacheEntry objects)

done, see notes in SOLR-1292




-Hoss



RE: Solr commits before documents are added

2009-10-19 Thread SharmilaR

Solr version is 1.3.
I am indexing a total of 1.4 million documents. Yes, I commit (waitFlush="true"
waitSearcher="true") every 100k documents and then once at the end.
I have a counter next to the addDoc(SolrDocument) statement to keep track of the
number of documents added. When I query Solr after the commit, the total number
of documents returned does not match the number of documents added. This
happens only when I index millions of documents and not when I index, say,
500 documents. In this case, I know it's the last 20 documents which are not
committed, because each document has a field 'RECORD_ID' which is assigned a
sequential number (in Java code). When I query Solr using the Solr admin
interface, the documents with the last 20 RECORD_IDs are missing (for example,
the last id is 999,980 instead of 1,000,000).

- Sharmila


Feak, Todd wrote:
> 
> A few questions to help the troubleshooting.
> 
> Solr version #?
> 
> Is there just 1 commit through Solrj for the millions of documents? 
> 
> Or do you do it on a regular interval (every 100k documents for example)
> and then one at the end to be sure?
> 
> How are you observing that the last few didn't make it in? Are you looking
> at a slave or master?
> 
> -Todd
> 
> 
-Original Message-
From: Ranganathan, Sharmila [mailto:sranganat...@library.rochester.edu] 
Sent: Monday, October 19, 2009 9:19 AM
To: solr-user@lucene.apache.org
Subject: Solr commits before documents are added

Hi,

My application indexes huge number of documents(like in millions). Below
is the snapshot of my code where I add all documents to Solr, and then
at last issue commit command. I use Solrj. I find that last few
documents are not  committed to Solr. Is this because adding documents
to Solr took longer time and it reached commit command even before it
finished adding documents? Is there are way to ensure that solr waits
for all documents to be added and then commits? Please advise me how to
solve this issue.

 

For loop

solrServer.add(doc);   // Add document to Solr

End for loop

solrServer.commit();  // Commit to Solr

 

 

Thanks,

Sharmila





-- 
View this message in context: 
http://www.nabble.com/Solr-commits-before-documents-are-added-tp25961191p25964770.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Filter query optimization

2009-10-19 Thread Jason Rutherglen
Yonik,

> this is a fast operation anyway

Can you elaborate on why this is a fast operation?

Basically there's a distributed query with a filter, where on a
number of the servers the filter query doesn't match anything.
However, I'm seeing load on those servers (where nothing
matches), so I'm assuming the filter is generated (and cached),
which is fine, and then the user query is being performed on a
filter where no documents match. I could be misinterpreting the
data; however, I want to find out about this use case regardless,
as it will likely crop up again for us.

-J

On Mon, Oct 19, 2009 at 12:07 PM, Yonik Seeley
 wrote:
> On Mon, Oct 19, 2009 at 2:55 PM, Jason Rutherglen
>  wrote:
>> If a filter query matches nothing, then no additional query should be
>> performed and no results returned?  I don't think we have this today?
>
> No, but this is a fast operation anyway (In Solr 1.4 at least).
>
> Another thing to watch out for is to not try this with filters that
> you don't know the size of (or else you may force a popcount on a
> BitDocSet that would not otherwise have been needed).
>
> It could also potentially complicate warming queries - need to be
> careful that the combination of filters you are warming with matches
> something, or it would cause the fieldCache entries to not be
> populated.
>
> -Yonik
> http://www.lucidimagination.com
>


Version 0.9.3 of the PECL extension for solr has just been released

2009-10-19 Thread Israel Ekpo
Version 0.9.3 of the PECL extension for solr has just been released.

Some of the methods have been updated and more get* methods have been added
to the Query builder classes.

The user level documentation was also updated to make the installation
instructions a lot clearer.

The latest documentation and source code are available from the project home
page

http://pecl.php.net/package/solr

-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Filter query optimization

2009-10-19 Thread Yonik Seeley
On Mon, Oct 19, 2009 at 4:45 PM, Jason Rutherglen
 wrote:
> Yonik,
>
>> this is a fast operation anyway
>
> Can you elaborate on why this is a fast operation?

The scorers will never really be used.
The query will be weighted and scorers will be created, but the filter
will be checked first and return NO_MORE_DOCS.

-Yonik
http://www.lucidimagination.com
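For reference, a rough sketch of the short-circuit Jason is describing, e.g. inside
a custom SearchComponent. This is not built-in Solr behaviour; it assumes Solr 1.4's
SolrIndexSearcher.getDocSet(Query) and DocSet.size() APIs, and note Yonik's caveat
above that asking for the size may force a popcount on a BitDocSet:

import java.io.IOException;
import org.apache.lucene.search.Query;
import org.apache.solr.search.DocSet;
import org.apache.solr.search.SolrIndexSearcher;

public class FilterShortCircuit {
    // Returns true if the main query can be skipped because the filter matches nothing.
    public static boolean filterMatchesNothing(SolrIndexSearcher searcher, Query filterQuery)
            throws IOException {
        DocSet filterMatches = searcher.getDocSet(filterQuery); // goes through the filterCache
        return filterMatches.size() == 0;                       // may force a popcount (see above)
    }
}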

> Basically there's a distributed query with a filter, where on a
> number of the servers, the filter query isn't matching anything,
> however I'm seeing load on those servers (where nothing
> matches), so I'm assuming the filter is generated (and cached)
> which is fine, then the user query is being performed on a
> filter where no documents match. I could misinterpreting the
> data, however, I want to find out about this use case regardless
> as it likely will crop up again for us.
>
> -J
>
> On Mon, Oct 19, 2009 at 12:07 PM, Yonik Seeley
>  wrote:
>> On Mon, Oct 19, 2009 at 2:55 PM, Jason Rutherglen
>>  wrote:
>>> If a filter query matches nothing, then no additional query should be
>>> performed and no results returned?  I don't think we have this today?
>>
>> No, but this is a fast operation anyway (In Solr 1.4 at least).
>>
>> Another thing to watch out for is to not try this with filters that
>> you don't know the size of (or else you may force a popcount on a
>> BitDocSet that would not otherwise have been needed).
>>
>> It could also potentially complicate warming queries - need to be
>> careful that the combination of filters you are warming with matches
>> something, or it would cause the fieldCache entries to not be
>> populated.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>


Core/shard preference

2009-10-19 Thread markwaddle

I have a small core performing deltas quickly (core00), and a large core
performing deltas slowly (core01), both on the same set of documents. The
delta core is cleaned nightly. As you can imagine, at times there are two
versions of a document, one in each core. When I execute a query that
matches this document, sometimes it will come from the delta core and sometimes
it will come from the large core. It almost seems random. Here is my
query:

http://porsche:8181/worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP

When the delta documents from core00 are returned as desired the access logs
show:

10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select
HTTP/1.1 200 293 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select
HTTP/1.1 200 506 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select
HTTP/1.1 200 1151 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select
HTTP/1.1 200 2597 1
10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET
/worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP
HTTP/1.1 200 11881 9

When the documents are returned from core01 the access logs show:
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select
HTTP/1.1 200 289 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select
HTTP/1.1 200 506 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select
HTTP/1.1 200 3390 1
10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET
/worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP
HTTP/1.1 200 11873 9

Any ideas on why there is a difference in the requests made? Is there a way
I can tell Solr to prefer the documents in core00?

Mark
-- 
View this message in context: 
http://www.nabble.com/Core-shard-preference-tp25966791p25966791.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Boost with wildcard.

2009-10-19 Thread AHMET ARSLAN
> The boost (index time) does not work
> when i am searching for a word with a wildcard appended to
> the end.
> I stumbled on to this "feature" and its pretty much a show
> stopper for me.
> I am implementing a live search feature where i always have
> an wildcard in the last word that is currently being written
> by the user.
> Will this be fixed anytime soon or does anyone have a
> workaround?
> 
> Example :
> "playstation*" gives an result with unboosted items but
> "playstation" gives the correct one.


Javadoc of SolrQueryParser says:

 
 * This class also deviates from the Lucene QueryParser by using 
 * ConstantScore versions of RangeQuery and PrefixQuery to prevent 
 * TooManyClauses exceptions.
 *  

If you want to disable this behavior you can override the
protected Query getPrefixQuery(String field, String termStr)
method of SolrQueryParser. But for this to work you also need to write a class
that extends QParserPlugin and uses your new SolrQueryParser. You need to define
your new QParserPlugin in solrconfig.xml as described here [1].

[1] http://wiki.apache.org/solr/SolrPlugins#QParserPlugin

But prefix queries can easily cause TooManyClauses if the prefix is
short, like a*, since the query is going to OR together all terms starting with a.
Performance can be a killer in such cases. I think that's why Solr uses the
ConstantScore versions. If you want, you can increase this limit (default is 1024)
via <maxBooleanClauses>1024</maxBooleanClauses> in solrconfig.xml.

I do not know how to solve your problem without writing custom code.

Hope this helps.
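A minimal sketch of that idea (the class name is illustrative; it assumes Solr 1.4's
SolrQueryParser(QParser, String) constructor and Lucene 2.9's MultiTermQuery rewrite
methods):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.search.QParser;
import org.apache.solr.search.SolrQueryParser;

public class ScoringPrefixQueryParser extends SolrQueryParser {

    public ScoringPrefixQueryParser(QParser parser, String defaultField) {
        super(parser, defaultField);
    }

    @Override
    protected Query getPrefixQuery(String field, String termStr) {
        // Use a scoring BooleanQuery rewrite instead of the constant-score version,
        // so index-time boosts show up for prefix matches. Beware of TooManyClauses
        // for short prefixes unless maxBooleanClauses is raised in solrconfig.xml.
        PrefixQuery pq = new PrefixQuery(new Term(field, termStr));
        pq.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
        return pq;
    }
}

This parser would then be created and returned by your custom QParserPlugin,
registered in solrconfig.xml as in [1].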


  


Re: Core/shard preference

2009-10-19 Thread Lance Norskog
Distributed Search is designed only for disjoint cores.

The document list from each core is returned sorted by the relevance
score. The distributed searcher merges these sorted lists. Solr does
not implement "distributed IDF", which essentially means distributed
coordinated scoring. All scoring happens inside each core, relative to
that core's contents. The resulting score numbers are not coordinated
with each other, and you will get random results.

There is no way to say "use this core's results" because the searches
are not compared all at once. Only the page of results fetched is
compared, so there's no way to suppress a result in the second page if
it was already found in the first.

On Mon, Oct 19, 2009 at 3:30 PM, markwaddle  wrote:
>
> I have a small core performing deltas quickly (core00), and a large core
> performing deltas slowly (core01), both on the same set of documents. The
> delta core is cleaned nightly. As you can imagine, at times there are two
> versions of a document, one in each core. When I execute a query that
> matches this document, sometimes it will come from the delta core, and some
> times it will come from the large core. It almost seems random. Here is my
> query:
>
> http://porsche:8181/worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP
>
> When the delta documents from core00 are returned as desired the access logs
> show:
>
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select
> HTTP/1.1 200 293 1
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select
> HTTP/1.1 200 506 1
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select
> HTTP/1.1 200 1151 1
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select
> HTTP/1.1 200 2597 1
> 10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET
> /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP
> HTTP/1.1 200 11881 9
>
> When the documents are returned from core01 the access logs show:
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select
> HTTP/1.1 200 289 1
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select
> HTTP/1.1 200 506 1
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select
> HTTP/1.1 200 3390 1
> 10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET
> /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP
> HTTP/1.1 200 11873 9
>
> Any ideas on why there is a difference in the requests made? Is there a way
> I can tell Solr to prefer the documents in core00?
>
> Mark
> --
> View this message in context: 
> http://www.nabble.com/Core-shard-preference-tp25966791p25966791.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Solr commits before documents are added

2009-10-19 Thread Lance Norskog
commit(waitFlush="true", waitSearcher="true")  waits for the entire
operation and when it finishes, all 1 million documents should be
searchable.

Please try this same test with Solr 1.4 and post your results. To make
it easier, here is the first release candidate:

http://people.apache.org/~gsingers/solr/1.4.0-RC/


On Mon, Oct 19, 2009 at 1:06 PM, SharmilaR
 wrote:
>
> Solr version is 1.3
> I am indexing total of 1.4 million documents. Yes, I commit(waitFlush="true"
> waitSearcher="true") every 100k documents and then one at the end.
> I have a counter next to addDoc(SolrDocument) statement to keep track of
> number of documents added. When I query Solr after commit,  the total number
> of documents returned does not match the number of documents added. This
> happens only when I index millions of documents and not when I index like
> 500 documents. In this case, I know its the last 20 documents which are not
> committed because each document has a field 'RECORD_ID' which is assigned
> sequential number(in java code). When I query Solr using Solr admin
> interface, the documents with last 20 RECORD_ID are missing.(example the
> last id is 999,980 instead of 1,000,000)
>
> - Sharmila
>
>
> Feak, Todd wrote:
>>
>> A few questions to help the troubleshooting.
>>
>> Solr version #?
>>
>> Is there just 1 commit through Solrj for the millions of documents?
>>
>> Or do you do it on a regular interval (every 100k documents for example)
>> and then one at the end to be sure?
>>
>> How are you observing that the last few didn't make it in? Are you looking
>> at a slave or master?
>>
>> -Todd
>>
>>
> -Original Message-
> From: Ranganathan, Sharmila [mailto:sranganat...@library.rochester.edu]
> Sent: Monday, October 19, 2009 9:19 AM
> To: solr-user@lucene.apache.org
> Subject: Solr commits before documents are added
>
> Hi,
>
> My application indexes huge number of documents(like in millions). Below
> is the snapshot of my code where I add all documents to Solr, and then
> at last issue commit command. I use Solrj. I find that last few
> documents are not  committed to Solr. Is this because adding documents
> to Solr took longer time and it reached commit command even before it
> finished adding documents? Is there are way to ensure that solr waits
> for all documents to be added and then commits? Please advise me how to
> solve this issue.
>
>
>
> For loop
>
>                solrServer.add(doc);   // Add document to Solr
>
> End for loop
>
> solrServer.commit();          // Commit to Solr
>
>
>
>
>
> Thanks,
>
> Sharmila
>
>
>
>
>
> --
> View this message in context: 
> http://www.nabble.com/Solr-commits-before-documents-are-added-tp25961191p25964770.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Index Corruption (possibly during commit)

2009-10-19 Thread Simon Wistow
We have an indexing script which has been running for a couple of weeks
now without problems. It indexes documents and then periodically commits
(which is a tad redundant, I suppose), both via the HTTP interface.

All documents are indexed to a master and a slave rsyncs them off using 
the standard 1.3.0 replication.

Recently the indexing script got into problems when the commit was 
taking longer than the request timeout. I killed the script, did a 
commit by hand (using 
bin/commit) and then started to index again and it still wouldn't 
commit. We then tried to go to the stats page and got the error

org.apache.lucene.index.CorruptIndexException:
doc counts differ for segment _mib: fieldsReader shows 1 but segmentInfo
shows 718
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:470)
    at 

This is a stock 1.3.0 running off tomcat 6.0.20 with 

java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)

Linux solr.local 2.6.18-128.1.10.el5 #1 SMP Thu May 7 10:35:59 EDT 
2009 x86_64 x86_64 x86_64 GNU/Linux

Plenty of RAM and disk space (usage is 31% - 353G used from 534G)

CheckIndex says

Opening index @ index/

Segments file=segments_c8z numSegments=28 version=FORMAT_HAS_PROX 
[Lucene 2.4]

Checking only these segments: _mib:
  22 of 28: name=_mib docCount=718
compound=false
hasProx=true
numFiles=9
size (MB)=0.029
has deletions [delFileName=_mib_1.del]
test: open reader.FAILED
WARNING: fixIndex() would remove reference to this segment; full 
exception:
org.apache.lucene.index.CorruptIndexException: doc counts differ for 
segment _mib: fieldsReader shows 1 but segmentInfo shows 718
at 
org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:282)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:591)
at 
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:491)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)

WARNING: 1 broken segments (containing 718 documents) detected
WARNING: would write new segments file, and 718 documents would be lost, 
if -fix were specified


 
Any ideas? We can restore from back ups and back fill but really we'd 
love to know what caused this so we can avoid a repetition.

Simon




Re: Filter query optimization

2009-10-19 Thread Jason Rutherglen
Ok, thanks, new Lucene 2.9 features.

On Mon, Oct 19, 2009 at 2:33 PM, Yonik Seeley
 wrote:
> On Mon, Oct 19, 2009 at 4:45 PM, Jason Rutherglen
>  wrote:
>> Yonik,
>>
>>> this is a fast operation anyway
>>
>> Can you elaborate on why this is a fast operation?
>
> The scorers will never really be used.
> The query will be weighted and scorers will be created, but the filter
> will be checked first and return NO_MORE_DOCS.
>
> -Yonik
> http://www.lucidimagination.com
>
>> Basically there's a distributed query with a filter, where on a
>> number of the servers, the filter query isn't matching anything,
>> however I'm seeing load on those servers (where nothing
>> matches), so I'm assuming the filter is generated (and cached)
>> which is fine, then the user query is being performed on a
>> filter where no documents match. I could misinterpreting the
>> data, however, I want to find out about this use case regardless
>> as it likely will crop up again for us.
>>
>> -J
>>
>> On Mon, Oct 19, 2009 at 12:07 PM, Yonik Seeley
>>  wrote:
>>> On Mon, Oct 19, 2009 at 2:55 PM, Jason Rutherglen
>>>  wrote:
 If a filter query matches nothing, then no additional query should be
 performed and no results returned?  I don't think we have this today?
>>>
>>> No, but this is a fast operation anyway (In Solr 1.4 at least).
>>>
>>> Another thing to watch out for is to not try this with filters that
>>> you don't know the size of (or else you may force a popcount on a
>>> BitDocSet that would not otherwise have been needed).
>>>
>>> It could also potentially complicate warming queries - need to be
>>> careful that the combination of filters you are warming with matches
>>> something, or it would cause the fieldCache entries to not be
>>> populated.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>
>


Retrieve Matching Term

2009-10-19 Thread angry127

Hi,

Is it possible to get the matching terms from your query for each document
returned, without using highlighting?

For example if you have the query "aaa bbb ccc" and one of the documents has
the term "aaa" and another document has the term "bbb" and "ccc". 

To have Solr return:

Document 1: "aaa"
Document 2: "bbb ccc"

I was told this is possible using Term Vectors. I have not been able to find
a way to do this using Term Vectors. The only reason I am against using
highlighting is for performance reasons.

Thanks.
-- 
View this message in context: 
http://www.nabble.com/Retrieve-Matching-Term-tp25967886p25967886.html
Sent from the Solr - User mailing list archive at Nabble.com.
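For what it's worth, a rough SolrJ sketch of the term-vector route. It assumes the
TermVectorComponent is registered on the request handler in solrconfig.xml and the
field has termVectors="true" in the schema; the component returns all stored terms
per document, so the intersection with the query terms still has to be done on the
client:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class MatchingTerms {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("aaa bbb ccc");
        q.set("tv", true); // ask the TermVectorComponent for term vectors

        QueryResponse rsp = server.query(q);
        Set<String> queryTerms = new HashSet<String>(Arrays.asList("aaa", "bbb", "ccc"));

        // "termVectors" holds one entry per returned document listing its terms;
        // walk it and keep only the terms that also occur in queryTerms.
        NamedList termVectors = (NamedList) rsp.getResponse().get("termVectors");
        System.out.println(termVectors);
    }
}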



Re: Solr commits before documents are added

2009-10-19 Thread Yonik Seeley
On Mon, Oct 19, 2009 at 7:39 PM, Lance Norskog  wrote:
> commit(waitFlush="true", waitSearcher="true")  waits for the entire
> operation and when it finishes, all 1 million documents should be
> searchable.

That waits for the commit to complete, but not any adds that may be
happening in parallel (that's pretty much impossible).

If the client uses multiple threads to do adds, it's currently the
responsibility of the client to wait for all of the pending adds to
complete before calling commit.

-Yonik
http://www.lucidimagination.com
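A minimal SolrJ sketch of that pattern, assuming the adds are submitted through a
java.util.concurrent ExecutorService (the RECORD_ID field name is taken from earlier
in the thread; the key point is that awaitTermination runs before commit, so no add
can race with it):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        final SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        ExecutorService pool = Executors.newFixedThreadPool(4);

        for (int i = 1; i <= 1000000; i++) {
            final int id = i;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("RECORD_ID", id);
                        server.add(doc);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }

        // Wait for every pending add to finish before committing.
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        server.commit();
    }
}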


> Please try this same test with Solr 1.4 and post your results. To make
> it easier, here is the first release candidate:
>
> http://people.apache.org/~gsingers/solr/1.4.0-RC/
>
>
> On Mon, Oct 19, 2009 at 1:06 PM, SharmilaR
>  wrote:
>>
>> Solr version is 1.3
>> I am indexing total of 1.4 million documents. Yes, I commit(waitFlush="true"
>> waitSearcher="true") every 100k documents and then one at the end.
>> I have a counter next to addDoc(SolrDocument) statement to keep track of
>> number of documents added. When I query Solr after commit,  the total number
>> of documents returned does not match the number of documents added. This
>> happens only when I index millions of documents and not when I index like
>> 500 documents. In this case, I know its the last 20 documents which are not
>> committed because each document has a field 'RECORD_ID' which is assigned
>> sequential number(in java code). When I query Solr using Solr admin
>> interface, the documents with last 20 RECORD_ID are missing.(example the
>> last id is 999,980 instead of 1,000,000)
>>
>> - Sharmila
>>
>>
>> Feak, Todd wrote:
>>>
>>> A few questions to help the troubleshooting.
>>>
>>> Solr version #?
>>>
>>> Is there just 1 commit through Solrj for the millions of documents?
>>>
>>> Or do you do it on a regular interval (every 100k documents for example)
>>> and then one at the end to be sure?
>>>
>>> How are you observing that the last few didn't make it in? Are you looking
>>> at a slave or master?
>>>
>>> -Todd
>>>
>>>
>> -Original Message-
>> From: Ranganathan, Sharmila [mailto:sranganat...@library.rochester.edu]
>> Sent: Monday, October 19, 2009 9:19 AM
>> To: solr-user@lucene.apache.org
>> Subject: Solr commits before documents are added
>>
>> Hi,
>>
>> My application indexes huge number of documents(like in millions). Below
>> is the snapshot of my code where I add all documents to Solr, and then
>> at last issue commit command. I use Solrj. I find that last few
>> documents are not  committed to Solr. Is this because adding documents
>> to Solr took longer time and it reached commit command even before it
>> finished adding documents? Is there are way to ensure that solr waits
>> for all documents to be added and then commits? Please advise me how to
>> solve this issue.
>>
>>
>>
>> For loop
>>
>>                solrServer.add(doc);   // Add document to Solr
>>
>> End for loop
>>
>> solrServer.commit();          // Commit to Solr
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Sharmila
>>
>>
>>
>>
>>
>> --
>> View this message in context: 
>> http://www.nabble.com/Solr-commits-before-documents-are-added-tp25961191p25964770.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: Core/shard preference

2009-10-19 Thread Yonik Seeley
Although shards should be disjoint, Solr "tolerates" duplication
(won't return duplicates in the main results list, but doesn't make
any effort to correct facet counts, etc).

Currently, whichever shard responds first wins.
The relevant code is around line 420 in QueryComponent.java:

  String prevShard = uniqueDoc.put(id, srsp.getShard());
  if (prevShard != null) {
    // duplicate detected
    numFound--;

    // For now, just always use the first encountered since we can't currently
    // remove the previous one added to the priority queue.  If we switched
    // to the Java5 PriorityQueue, this would be easier.
    continue;
    // make which duplicate is used deterministic based on shard
    // if (prevShard.compareTo(srsp.shard) >= 0) {
    //   TODO: remove previous from priority queue
    //   continue;
    // }
  }

So it's certainly possible to make it deterministic; we just haven't
done it yet.

-Yonik
http://www.lucidimagination.com


On Mon, Oct 19, 2009 at 7:30 PM, Lance Norskog  wrote:
> Distributed Search is designed only for disjoint cores.
>
> The document list from each core is returned sorted by the relevance
> score. The distributed searcher merges these sorted lists. Solr does
> not implement "distributed IDF", which essentially means distributed
> coordinated scoring. All scoring happens inside each core, relative to
> that core's contents. The resulting score numbers are not coordinated
> with each other, and you will get random results.
>
> There is no way to say "use this core's results" because the searches
> are not compared all at once. Only the page of results fetched is
> compared, so there's no way to suppress a result in the second page if
> it was already found in the first.
>
> On Mon, Oct 19, 2009 at 3:30 PM, markwaddle  wrote:
>>
>> I have a small core performing deltas quickly (core00), and a large core
>> performing deltas slowly (core01), both on the same set of documents. The
>> delta core is cleaned nightly. As you can imagine, at times there are two
>> versions of a document, one in each core. When I execute a query that
>> matches this document, sometimes it will come from the delta core, and
>> sometimes it will come from the large core. It almost seems random. Here is my
>> query:
>>
>> http://porsche:8181/worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP
>>
>> When the delta documents from core00 are returned as desired, the access logs
>> show:
>>
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select
>> HTTP/1.1 200 293 1
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select
>> HTTP/1.1 200 506 1
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select
>> HTTP/1.1 200 1151 1
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select
>> HTTP/1.1 200 2597 1
>> 10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET
>> /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP
>> HTTP/1.1 200 11881 9
>>
>> When the documents are returned from core01, the access logs show:
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select
>> HTTP/1.1 200 289 1
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select
>> HTTP/1.1 200 506 1
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select
>> HTTP/1.1 200 3390 1
>> 10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET
>> /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP
>> HTTP/1.1 200 11873 9
>>
>> Any ideas on why there is a difference in the requests made? Is there a way
>> I can tell Solr to prefer the documents in core00?
>>
>> Mark
>> --
>> View this message in context: 
>> http://www.nabble.com/Core-shard-preference-tp25966791p25966791.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: Retrieve Matching Term

2009-10-19 Thread Avlesh Singh
If your query looks like this -
q=(myField:aaa myField:bbb myField:ccc)
you would get the desired results for any tokenized field (e.g. text) called
myField.
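
To check which of those clauses actually matched a particular document, one
option (fine for verification, though too heavy to run on every production
query; host, port and field names below are just placeholders) is the debug
output, e.g. something like:

http://localhost:8983/solr/select?q=(myField:aaa+myField:bbb+myField:ccc)&debugQuery=true

The "explain" section of the response then lists, per document, the terms
that contributed to its score.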

Cheers
Avlesh

On Tue, Oct 20, 2009 at 6:28 AM, angry127  wrote:

>
> Hi,
>
> Is it possible to get the matching terms from your query for each document
> returned, without using highlighting?
>
> For example, say you have the query "aaa bbb ccc", one of the documents has
> the term "aaa", and another document has the terms "bbb" and "ccc".
>
> I would like Solr to return:
>
> Document 1: "aaa"
> Document 2: "bbb ccc"
>
> I was told this is possible using Term Vectors. I have not been able to
> find
> a way to do this using Term Vectors. The only reason I am against using
> highlighting is for performance reasons.
>
> Thanks.
> --
> View this message in context:
> http://www.nabble.com/Retrieve-Matching-Term-tp25967886p25967886.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Is Relational Mapping (foreign key) possible in solr ??

2009-10-19 Thread ashokcz

Hi Jerome,
Thanks for your response.
I never knew about multivalued fields.
I will give it a try and see if it suits my needs.


But I don't understand this part:
* You could have a mixed index with user and project entries in the
same index, so if you search for a name, you'd find users and projects
matching that name.

Could you please tell me in detail how I can do that?


Jérôme Etévé wrote:
> 
> Hi,
> 
>  here's what you could do:
> 
> * Use multivalued fields instead of 'comma-separated values', so you
> won't need a separator (see the sketch at the end of this list).
> * Store project identifiers in the user index.
> 
> Denormalised project information in a user entry will inevitably require
> re-indexing a lot of user entries when project info changes.
> 
> * You could have a mixed index with user and project entries in the
> same index, so if you search for a name, you'd find users and projects
> matching that name.
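>
> For the multivalued-field suggestion, a rough SolrJ sketch (untested; the
> field names are only illustrative, and each multivalued field would need to
> be declared multiValued="true" in schema.xml):
>
>   // "server" is your SolrServer instance
>   SolrInputDocument doc = new SolrInputDocument();
>   doc.addField("userId", "1234");
>   doc.addField("userName", "ABC");
>   doc.addField("projectId", "1");          // repeat addField once per value
>   doc.addField("projectId", "2");          // of a multivalued field
>   doc.addField("projectName", "Project1");
>   doc.addField("projectName", "Project2");
>   server.add(doc);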
> 
> Jerome.
> 
> 2009/10/19 ashokcz :
>>
>> Hi, I browsed through the Solr docs and user forums, and what I infer is
>> that we can't use Solr to store relational mappings (foreign keys).
>>
>> But I just want to know if there is any chance of doing the same.
>>
>> I have two tables: a user table (with 100,000 entries) and a project table
>> (with 200 entries).
>> User table columns: userid, name, country, location, etc.
>> Project table columns: project name, description, business unit, project type.
>> Here user location, country, project name, project business unit, and
>> project type are faceted.
>> A user can be mapped to multiple projects.
>> In Solr I store the details like this:
>> [
>> {
>> userId:1234;
>> userName:ABC;
>> Country:US;
>> Location:NY;
>> Project Name:Project1,Project2;
>> Project Description:Project1,Project2;
>> Project  business unit:unit1,unit2;
>> Project type:Type1,Type2
>> }
>> ]
>>
>> With this structure I can get faceted details about both user data and
>> project data.
>>
>> But here I face two problems.
>>
>> 1. A project can be mapped to many users, say 10,000 users. So if I change a
>> project name, I end up re-indexing 10,000 records, which is very
>> time-consuming work.
>>
>> 2. For fields like project description I could not find any proper delimiter.
>> For the other fields a comma (,) is okay, but being a description I could not
>> use any specific delimiter. This field is not faceted, but in the search
>> results I still need to take it out and show the project details in tabular
>> format, and I use the delimiter to split it. For other project fields like
>> project name and type I could do this, but not for the project description
>> field.
>>
>> So what I am asking is: is there any way of storing the data as relational
>> records, i.e. the user details would have a field called projectId whose data
>> would be 1,2, referring to the project records' primary keys in Solr, while
>> still preserving the faceted approach?
>>
>> To my knowledge, my guess is it can't be done.
>> Am I correct?
>> If so, how can we achieve a solution to my problem?
>> Please, if someone could share some ideas, it would be useful.
>> --
>> View this message in context:
>> http://www.nabble.com/Is-Relational-Mapping-%28foreign-key%29-possible-in-solrtp25955068p25955068.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Jerome Eteve.
> http://www.eteve.net
> jer...@eteve.net
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Is-Relational-Mapping-%28foreign-key%29-possible-in-solrtp25955068p25969540.html
Sent from the Solr - User mailing list archive at Nabble.com.