Fwd: Add a plugin class to solr

2010-07-28 Thread Sanal K Stephen
Hi all,
 I want to add a plugin class to Solr that can filter results
based on certain criteria. I have an array whose index is the Solr document unique
key and whose value is one or zero; if the value is zero, I
want to filter that document out of the result set. This filtering should happen
before faceting, because it may reduce the number of results
to 20% to 30% of the original, so I want the facet counts computed after the filtering;
otherwise the facet counts returned will not be useful to me.
   So, any idea how I can make use of the Solr plugin feature, or which
classes in the Solr code I need to look into to do this? Please help.

Thanks
Sanal K Stephen
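
One way to do this in Solr 1.4 is a custom QParserPlugin whose parser returns a
constant-score query wrapping a Lucene Filter that consults the external array.
The sketch below is a minimal illustration under those assumptions, not a tested
implementation: the class name is hypothetical, "id" stands for the unique key
field, and isAllowed() stands in for the lookup into the one/zero array.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.OpenBitSet;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class KeyFilterQParserPlugin extends QParserPlugin {
  public void init(NamedList args) {}

  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() {
        return new ConstantScoreQuery(new Filter() {
          public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
            // Map each Lucene doc id to its unique key, then keep only
            // the docs whose key maps to 1 in the external array.
            String[] keys = FieldCache.DEFAULT.getStrings(reader, "id");
            OpenBitSet bits = new OpenBitSet(reader.maxDoc());
            for (int doc = 0; doc < keys.length; doc++) {
              if (keys[doc] != null && isAllowed(keys[doc])) {
                bits.set(doc);
              }
            }
            return bits;
          }
        });
      }
    };
  }

  // Hypothetical: look the key up in the one/zero array described above.
  boolean isAllowed(String uniqueKey) {
    return true;
  }
}

Registered with a <queryParser name="keyfilter" class="KeyFilterQParserPlugin"/>
entry in solrconfig.xml (name hypothetical) and sent with each request as
fq={!keyfilter}, the filter is intersected with the main query before facet
counts are computed, which gives exactly the ordering asked for.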


Re: SolrJ Response + JSON

2010-07-28 Thread Ranveer
Rajani is right: you can get the response as JSON by passing wt=json. But if
you want to use SolrJ, then
you will need to convert its binary-format response into JSON yourself,
for example with a third-party JSON library.


regards
Ranveer
http://www.onlymyhealth.com

On Thursday 29 July 2010 09:55 AM, rajini maski wrote:

Yeah, right... this query will do it:

http://localhost:8090/solr/select/?q=*:*&version=2.2&start=0&rows=10&indent=on&wt=json

This will do your work... it is much like using the XSLT transformation
supported by Solr. :)

Regards,
Rajani Maski


On Wed, Jul 28, 2010 at 6:24 PM, Mark Allan  wrote:

   

I think you should just be able to add &wt=json to the end of your query
(or change whatever the existing wt parameter is in your URL).

Mark


On 28 Jul 2010, at 12:54 pm, MitchK wrote:


 

Hello community,

I need to transform SolrJ responses into JSON, after another application
has finished some computation on those results.

I cannot do those computations on the Solr side.

So, I really have to translate SolrJ's output into JSON.

Does anyone have experience doing this without writing your own JSON writer?

Thank you.
- Mitch
--
View this message in context:
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002024p1002024.html
Sent from the Solr - User mailing list archive at Nabble.com.


   

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


 
   




Re: logic required for newbie

2010-07-28 Thread Jonty Rhods
Again, thanks for the reply.

Actually I am getting results, but I am getting all columns of the rows, and I
want to remove the unnecessary columns.
In the case of q=piza hut, I want to get only "piza hut".
Likewise, if the search query changes to "ford motor", I want only "ford motor".
One more example: if the query is "piza hut ford motor", then the expected result
should be:

1
 some name
 user_id
 new york
 USA
 piza hut
 ford motor

In the expected result above, "5th avenue", "ms departmental store", and
"base bakery" have been removed because they do not carry any matched text.

More generally, I want to filter out every unmatched column, i.e. every column
that does not carry the matched query text.
Right now I am getting the proper results but the full column set; my
requirement is that only the matching landmarks should be returned.
So I want to keep only the columns whose text matches the query.

Hoping someone will help me clear up my concept.

regards

On Thu, Jul 29, 2010 at 9:41 AM, rajini maski  wrote:

> First of all, I hope that in the schema you have set the fields to
> indexed=true and stored=true...
> Next, if you have done so, just search as q=landmark:piza and you
> will get only the matching result set.
>
> Note: there is one constraint regarding analyzers and tokenizers. Only if
> you apply a whitespace tokenizer (that is, data type text_ws) will
> you get the "piza hut" result even when you query for piza... If no
> tokenizer is applied, you will not get it...
> I hope this was the reply you needed. If it's something else, you can easily
> ask. ;)
>
>
> On Wed, Jul 28, 2010 at 8:42 PM, Jonty Rhods 
> wrote:
>
> > Hi
> >
> > thanks for the reply..
> > Actually the requirement is different (sorry if I was unable to clarify it
> > in the first mail).
> >
> > Basically the following are the field names in the schema as well:
> > > 1. id
> > > 2. name
> > > 3. user_id
> > > 4. location
> > > 5. country
> > > 6. landmark1
> > > 7. landmark2
> > > 8. landmark3
> > > 9. landmark4
> > > 10. landmark5
> >
> > each carrying text...
> > for example:
> >
> > 1
> > some name
> > user_id
> > new york
> > USA
> > 5th avenue
> > ms departmental store
> > base bakery
> > piza hut
> > ford motor
> >
> > Now if a user searches for "piza", the expected result is like:
> >
> > 1
> > some name
> > user_id
> > new york
> > USA
> > piza hut
> >
> > It means I want to ignore all the other landmarks that do not match. With a
> > filter we can filter the fields, but here I don't know the field name in
> > advance because it depends on the text match.
> >
> > Is there any other solution? I am ready to change the schema or the logic.
> > I am using SolrJ.
> >
> > Please help me, I am stuck here.
> >
> > with regards
> >
> >
> > On Wed, Jul 28, 2010 at 7:22 PM, rajini maski 
> > wrote:
> >
> > > You can index each of these fields separately...
> > > field1-> Id
> > > field2-> name
> > > field3->user_id
> > > field4->country.
> > >
> > > 
> > > field7-> landmark
> > >
> > > While querying you can specify "q=Landmark9". This will return your
> > > results..
> > > And if you want only particular fields in the output, use the "fl"
> > > parameter in the query...
> > >
> > > like
> > >
> > > http://localhost:8090/solr/select?
> > > indent=on&q=landmark9&fl=ID,user_id,country,landmark&
> > >
> > > This will give your desired solution..
> > >
> > >
> > >
> > >
> > > On Wed, Jul 28, 2010 at 12:23 PM, Jonty Rhods 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I am very new and learning solr.
> > > >
> > > > I have 10 columns like the following in a table:
> > > >
> > > > 1. id
> > > > 2. name
> > > > 3. user_id
> > > > 4. location
> > > > 5. country
> > > > 6. landmark1
> > > > 7. landmark2
> > > > 8. landmark3
> > > > 9. landmark4
> > > > 10. landmark5
> > > >
> > > > When a user searches for a landmark, I want to return only the one
> > > > landmark that matches; the rest of the landmarks should be ignored.
> > > > The expected result is like the following if the user searches for
> > > > "landmark2":
> > > >
> > > > 1. id
> > > > 2. name
> > > > 3. user_id
> > > > 4. location
> > > > 5. country
> > > > 7. landmark2
> > > >
> > > > or if search by "landmark9"
> > > >
> > > > 1. id
> > > > 2. name
> > > > 3. user_id
> > > > 4. location
> > > > 5. country
> > > > 9. landmark9
> > > >
> > > >
> > > > please help me to design the schema for this kind of requirement...
> > > >
> > > > thanks
> > > > with regards
> > > >
> > >
> >
>
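
One way to get this from Solr without changing the schema: request highlighting
on the landmark fields and inspect, per document, which fields produced
snippets; fields absent from the highlighting map did not match and can be
dropped client-side. A minimal SolrJ sketch, assuming the field names above, a
local server URL, and stored=true on the landmark fields:

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MatchingLandmarks {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("landmark1:piza OR landmark2:piza OR "
        + "landmark3:piza OR landmark4:piza OR landmark5:piza");
    q.setHighlight(true);
    q.setParam("hl.fl", "landmark1,landmark2,landmark3,landmark4,landmark5");
    QueryResponse rsp = server.query(q);
    // unique key -> (field name -> snippets); only fields that matched appear
    Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
    for (Map.Entry<String, Map<String, List<String>>> doc : hl.entrySet()) {
      System.out.println(doc.getKey() + " matched in: " + doc.getValue().keySet());
    }
  }
}

The documents' stored fields still come back in full; the highlighting map just
tells the client which landmark columns to keep.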


Re: SolrJ Response + JSON

2010-07-28 Thread rajini maski
Yeah, right... this query will do it:

http://localhost:8090/solr/select/?q=*:*&version=2.2&start=0&rows=10&indent=on&wt=json

This will do your work... it is much like using the XSLT transformation
supported by Solr. :)

Regards,
Rajani Maski


On Wed, Jul 28, 2010 at 6:24 PM, Mark Allan  wrote:

> I think you should just be able to add &wt=json to the end of your query
> (or change whatever the existing wt parameter is in your URL).
>
> Mark
>
>
> On 28 Jul 2010, at 12:54 pm, MitchK wrote:
>
>
>> Hello community,
>>
>> I need to transform SolrJ responses into JSON, after another application
>> has finished some computation on those results.
>>
>> I cannot do those computations on the Solr side.
>>
>> So, I really have to translate SolrJ's output into JSON.
>>
>> Does anyone have experience doing this without writing your own JSON writer?
>>
>> Thank you.
>> - Mitch
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002024p1002024.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>


Re: logic required for newbie

2010-07-28 Thread rajini maski
First of all, I hope that in the schema you have set the fields to
indexed=true and stored=true...
Next, if you have done so, just search as q=landmark:piza and you
will get only the matching result set.

Note: there is one constraint regarding analyzers and tokenizers. Only if
you apply a whitespace tokenizer (that is, data type text_ws) will
you get the "piza hut" result even when you query for piza... If no
tokenizer is applied, you will not get it...
I hope this was the reply you needed. If it's something else, you can easily ask. ;)


On Wed, Jul 28, 2010 at 8:42 PM, Jonty Rhods  wrote:

> Hi
>
> thanks for the reply..
> Actually the requirement is different (sorry if I was unable to clarify it in
> the first mail).
>
> Basically the following are the field names in the schema as well:
> > 1. id
> > 2. name
> > 3. user_id
> > 4. location
> > 5. country
> > 6. landmark1
> > 7. landmark2
> > 8. landmark3
> > 9. landmark4
> > 10. landmark5
>
> each carrying text...
> for example:
>
> 1
> some name
> user_id
> new york
> USA
> 5th avenue
> ms departmental store
> base bakery
> piza hut
> ford motor
>
> Now if a user searches for "piza", the expected result is like:
>
> 1
> some name
> user_id
> new york
> USA
> piza hut
>
> It means I want to ignore all the other landmarks that do not match. With a
> filter we can filter the fields, but here I don't know the field name in
> advance because it depends on the text match.
>
> Is there any other solution? I am ready to change the schema or the logic. I
> am using SolrJ.
>
> Please help me, I am stuck here.
>
> with regards
>
>
> On Wed, Jul 28, 2010 at 7:22 PM, rajini maski 
> wrote:
>
> > You can index each of these fields separately...
> > field1-> Id
> > field2-> name
> > field3->user_id
> > field4->country.
> >
> > 
> > field7-> landmark
> >
> > While querying you can specify "q=Landmark9". This will return your
> > results..
> > And if you want only particular fields in the output, use the "fl"
> > parameter in the query...
> >
> > like
> >
> > http://localhost:8090/solr/select?
> > indent=on&q=landmark9&fl=ID,user_id,country,landmark&
> >
> > This will give your desired solution..
> >
> >
> >
> >
> > On Wed, Jul 28, 2010 at 12:23 PM, Jonty Rhods 
> > wrote:
> >
> > > Hi All,
> > >
> > > I am very new and learning solr.
> > >
> > > I have 10 columns like the following in a table:
> > >
> > > 1. id
> > > 2. name
> > > 3. user_id
> > > 4. location
> > > 5. country
> > > 6. landmark1
> > > 7. landmark2
> > > 8. landmark3
> > > 9. landmark4
> > > 10. landmark5
> > >
> > > When a user searches for a landmark, I want to return only the one
> > > landmark that matches; the rest of the landmarks should be ignored.
> > > The expected result is like the following if the user searches for
> > > "landmark2":
> > >
> > > 1. id
> > > 2. name
> > > 3. user_id
> > > 4. location
> > > 5. country
> > > 7. landmark2
> > >
> > > or if search by "landmark9"
> > >
> > > 1. id
> > > 2. name
> > > 3. user_id
> > > 4. location
> > > 5. country
> > > 9. landmark9
> > >
> > >
> > > please help me to design the schema for this kind of requirement...
> > >
> > > thanks
> > > with regards
> > >
> >
>


Is solr able to merge index on different nodes

2010-07-28 Thread Chengyang
When I want to create a large index, can I build the index split across different
nodes and then merge all the indexes onto one node?
Any further suggestions for this case?
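
One approach that should fit this (assuming Solr 1.4): build each partial index
in its own core or node, then merge them with the CoreAdmin MERGEINDEXES action
described at http://wiki.apache.org/solr/MergingSolrIndexes, along the lines of
(core name and paths hypothetical):

  http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/data/index1&indexDir=/data/index2

The target core must not be receiving updates while the merge runs; Lucene's
IndexMergeTool is an offline alternative.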


Help with schema design

2010-07-28 Thread Pramod Goyal
Hi,
I have a use case where I get a document and a list of events that have
happened on the document. For example:

First document:
 Some text content
Events:
  Event Type   Event By   Event Time
  Update       Pramod     06062010 2:30:00
  Update       Raj        06062010 2:30:00
  View         Rahul      07062010 1:30:00


I would like to support queries like "get all documents with Event Type = ? and
Event Time greater than ?", and also queries like "get all the documents
updated by Pramod".
How should I design my schema to support this use case?

Thanks,
Regards,
Pramod Goyal
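
A minimal sketch of one common design, with hypothetical field names:
denormalize to one Solr document per (document, event) pair, so both example
queries become simple field and range queries on event_time.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexEvents {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // One Solr document per event; doc_id ties events back to the source document.
    SolrInputDocument d = new SolrInputDocument();
    d.addField("id", "doc42-event1");                 // unique key per event
    d.addField("doc_id", "doc42");
    d.addField("content", "Some text content");
    d.addField("event_type", "Update");
    d.addField("event_by", "Pramod");
    d.addField("event_time", "2010-06-06T02:30:00Z"); // solr.DateField format
    server.add(d);
    server.commit();
    // "Event Type = Update and Event Time greater than X":
    //   q=event_type:Update AND event_time:[2010-06-06T00:00:00Z TO *]
    // "all documents updated by Pramod":
    //   q=event_type:Update AND event_by:Pramod
  }
}

The trade-off is that the document text is repeated per event; the alternative
of multiValued parallel fields on a single document cannot correlate the
per-event fields within one query.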


Re: Scoring Search for autocomplete

2010-07-28 Thread Chris Hostetter

You weren't really clear on how you are generating your autocomplete 
results -- i.e.: via TermsComponent on your "main" index? or via a 
search on a custom index where each document is a "word" to be suggested?

Assuming the latter, the approach you describe below sounds good to 
me, but it doesn't seem like it would really make sense for the former.


: Hi, I have an autocomplete that is currently working with an
: NGramTokenizer so if I search for "Yo" both "New York" and "Toyota"
: are valid results.  However I'm trying to figure out how to best
: implement the search so that from a score perspective if the string
: matches the beginning of an entire field it ranks first, followed by
: the beginning of a term and then in the middle of a term.  For example
: if I was searching with "vi" I would want Virginia ahead of West
: Virginia ahead of Five.
: 
: I think I can do this with three separate fields, one using a white
: space tokenizer and a ngram filter, another using the edge-ngram +
: whitespace and another using keyword+edge-ngram, then doing an or on
: the 3 fields, so that Virginia would match all 3 and get a higher
: score... but this doesn't feel right to me, so I wanted to check for
: better options.
: 
: Thanks.
: 



-Hoss
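
A sketch of how the three-field idea plays out at query time, with hypothetical
field names (kw_edge = keyword tokenizer + edge-ngram, ws_edge = whitespace +
edge-ngram, ngram = whitespace + ngram) and explicit boosts:

  q=kw_edge:vi^10 OR ws_edge:vi^5 OR ngram:vi

"Virginia" matches all three clauses and scores highest, "West Virginia"
matches the last two, and a mid-word match such as "Davis" matches only the
ngram clause, giving the ordering described in the question.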



Re: WordDelimiterFilter and phrase queries?

2010-07-28 Thread Chris Hostetter
: pos  token              offset
: 1    3                  0-1
: 2    diphenyl           2-10
: 3    propanoic          11-20
: 3    diphenylpropanoic  2-20

: Say someone enters the query string 3-diphenylpropanoic
: 
: The query parser I'm using transforms this into a phrase query and the
: indexed form is missed because based the positions of the terms '3'
: and 'diphenylpropanoic' indicate they are not adjacent?
: 
: Is this intended behavior? I expect that the catenated word
: 'diphenylpropanoic' should have a position of 2 based on the position
: of the first term in the concatenation, but perhaps I'm missing

I believe this is correct, but i'm not certain of the reason - i think 
it's just an implementation detail.  Consider the opposite scenario: if 
your indexed text was diphenyl-propanoic-3 and things worked the way 
you are suggesting they should, the term diphenylpropanoic 
would end up at position 1 (with diphenyl) and "diphenylpropanoic-3" would not 
match because then the terms wouldn't be adjacent.

damned if you do, damned if you don't

Typically for fields where you are using WDF with the "concat" options 
you would use a bit of slop on the generated phrase queries to 
allow for the looseness of the position information.  (In an ideal world, 
the token stream wouldn't have monotonic integer positions, it would be 
a DAG, and then these things would be easily represented, but that's 
pretty non-trivial to do with the internals.)


-Hoss
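
For example, an explicit sloppy phrase in the standard syntax should tolerate
the position gap (slop value hypothetical):

  q=field:"3 diphenylpropanoic"~2

and with the dismax handler the qs (query phrase slop) parameter sets the same
slop on the phrase queries the parser builds, including the ones generated from
hyphenated input like the example above.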



Re: Solr using 1500 threads - is that normal?

2010-07-28 Thread Erick Erickson
Your commits are very suspect. How often are you making changes to your
index? Do you have autocommit on? Do you commit when updating each document?
Committing too often, and consequently firing off warmup queries, is the first
place I'd look. But I agree with dc tech: 1,500 is way more than I would expect.

Best
Erick
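
For reference, commit frequency is usually tamed either by committing less
often from the client or with an autoCommit block in solrconfig.xml, e.g.
<autoCommit><maxDocs>10000</maxDocs><maxTime>60000</maxTime></autoCommit>
(values hypothetical); every commit that opens a new searcher triggers the
warmup work behind the maxWarmingSearchers errors quoted below.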



On Wed, Jul 28, 2010 at 6:53 AM, Christos Constantinou <
ch...@simpleweb.co.uk> wrote:

> Hi,
>
> Solr seems to be crashing after a JVM exception that new threads cannot be
> created. I am writing in hope of advice from someone that has experienced
> this before. The exception that is causing the problem is:
>
> Exception in thread "btpool0-5" java.lang.OutOfMemoryError: unable to
> create new native thread
>
> The memory that is allocated to Solr is 3072MB, which should be enough
> memory for a ~6GB data set. The documents are not big either, they have
> around 10 fields of which only one stores large text ranging between 1k-50k.
>
> The top command at the time of the crash shows Solr using around 1500
> threads, which I assume it is not normal. Could it be that the threads are
> crashing one by one and new ones are created to cope with the queries?
>
> In the log file, right after the exception, there are several thousand
> commits before the server stalls completely. Normally, the log file would
> report 20-30 document existence queries per second, then 1 commit per 5-30
> seconds, and some more infrequent faceted document searches on the data.
> However after the exception, there are only commits until the end of the log
> file.
>
> I am wondering if anyone has experienced this before or if it is some sort
> of known bug from Solr 1.4? Is there a way to increase the details of the
> exception in the logfile?
>
> I am attaching the output of a grep Exception command on the logfile.
>
> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:33:19 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:35:08 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:35:58 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:35:59 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:44:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException

Re: Solr using 1500 threads - is that normal?

2010-07-28 Thread dc tech
1,500 threads seems extreme by any standards so there is something
happening in your install. Even with appservers for web apps,
typically 100 would be a fair # of threads.


On 7/28/10, Christos Constantinou  wrote:
> Hi,
>
> Solr seems to be crashing after a JVM exception that new threads cannot be
> created. I am writing in hope of advice from someone that has experienced
> this before. The exception that is causing the problem is:
>
> Exception in thread "btpool0-5" java.lang.OutOfMemoryError: unable to create
> new native thread
>
> The memory that is allocated to Solr is 3072MB, which should be enough
> memory for a ~6GB data set. The documents are not big either, they have
> around 10 fields of which only one stores large text ranging between 1k-50k.
>
> The top command at the time of the crash shows Solr using around 1500
> threads, which I assume it is not normal. Could it be that the threads are
> crashing one by one and new ones are created to cope with the queries?
>
> In the log file, right after the exception, there are several thousand
> commits before the server stalls completely. Normally, the log file would
> report 20-30 document existence queries per second, then 1 commit per 5-30
> seconds, and some more infrequent faceted document searches on the data.
> However after the exception, there are only commits until the end of the log
> file.
>
> I am wondering if anyone has experienced this before or if it is some sort
> of known bug from Solr 1.4? Is there a way to increase the details of the
> exception in the logfile?
>
> I am attaching the output of a grep Exception command on the logfile.
>
> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:33:19 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:35:08 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:35:58 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:35:59 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:44:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:51:49 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrEx

Re: simple question from a newbie

2010-07-28 Thread Erick Erickson
What is the query you submit (don't forget &debugQuery=on)? In particular,
what field are you sorting on?

But yes, if you're searching on a tokenized field, you'll get matches on all
tokens in that field, which are probably single words. And no matter how you
sort, you're still getting documents whose whole title doesn't start with "c".

What happens if you search on your dc3.title instead? It uses the keyword
tokenizer
which tokenizes the entire title as a single token. Sort by that one too.

Best
Erick

On Wed, Jul 28, 2010 at 12:26 PM, Nguyen, Vincent (CDC/OSELS/NCPHI) (CTR) <
v...@cdc.gov> wrote:

> I think I got it to work.  If I do a wildcard search using the dc3.title
> field it seems to work fine (dc3.title:c*).  The dc.title:c* returns
> every title that has a word in it that starts with 'c', which isn't
> exactly what I wanted.  I'm guessing it's because of the
> type="caseInsensitiveSort".
>
> Well, here is my schema for reference.  Thanks for your help.
>
>
> [schema.xml snippet mangled in the archive; most tags were stripped. The
> surviving fragments show fieldType definitions (string and sortable types, a
> caseInsensitiveSort type, and text types using stopwords.txt, protwords.txt,
> synonyms, and WordDelimiterFilter options such as generateNumberParts and
> catenateWords), followed by a long list of mostly multiValued field
> declarations, with PID apparently the uniqueKey and fgs.label the default
> search field.]
>
> Vincent Vu Nguyen
> Division of Science Quality and Translation
> Office of the Associate Director for Science
> Centers for Disease Control and Prevention (CDC)
> 404-498-6154
> Century Bldg 2400
> Atlanta, GA 30329
>
>
> -Original Message-
> From: Ranveer [mailto:ranveer.s...@gmail.com]
> Sent: Wednesday, July 28, 2010 11:31 AM
> To: solr-user@lucene.apache.org
> Subject: Re: simple question from a newbie
>
> I think you are using a wildcard search, or should use one. But
> first of all, please provide the schema and configuration files for more
> details.
>
> regards
> Ranveer
>
>
> On Wednesday 28 July 2010 07:51 PM, Nguyen, Vincent (CDC/OSELS/NCPHI)
> (CTR) wrote:
> > Hi,
> >
> >
> >
> > I'm new to Solr and have a rather dumb question.  I want to do a query
> > that returns all the Titles that start with a certain letter.  For
> > example
> >
> >
> >
> > I have these titles:
> >
> > Results of in-mine research in support
> >
> > Cancer Reports
> >
> > State injury indicators report
> >
> > Cancer Reports
> >
> > Indexed dermal bibliography
> >
> > Childhood agricultural-related injury report
> >
> > Childhood agricultural injury prevention
> >
> >
> >
> >
> >
> > I want the query to return:
> >
> > Cancer Reports
> >
> > Cancer Reports
> >
> > Childhood agricultural-related injury report
> >
> > Childhood agricultural injury prevention
> >
> >
> >
> > I want something like dc.title=c* type query
> >
> >
> >
> > I know that I can facet by dc.title and then use the para

Re: Show elevated Result Differently

2010-07-28 Thread Erick Erickson
Please expand on what this means, it's quite vague. You might review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Wed, Jul 28, 2010 at 8:43 AM, Vishal.Arora  wrote:

>
> I want to show elevated results differently from the others. Is there any
> way to do this?
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Show-elevated-Result-Differently-tp1002081p1002081.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Know which terms are in a document

2010-07-28 Thread Max Lynch
I would like to search against my index, and then *know* which of a set
of given terms were found in each document.

For example, let's say I want to show articles with the word "pizza" or
"cake" in them, but would like to be able to say which of those two was
found.  I might use this to handle the article differently if it is about
pizza, or if it is about cake.  I understand I can do multiple queries but I
would like to avoid that.

One thought I had was to use a highlighter and only return a fragment with
the highlighted word, but I'm not sure how to do this with the various
highlighting options.

Is there a way?

Thanks.
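
A sketch of the highlighting idea from the last paragraph (field name and
parameter values hypothetical):

  q=content:pizza OR content:cake&hl=true&hl.fl=content&hl.snippets=1

In the response, each document's highlighting entry only contains fragments for
terms that actually matched, so checking whether "pizza" or "cake" comes back
wrapped in <em> tags tells you which term was found, in a single query.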


Re: Using Solr to perform range queries in Dspace

2010-07-28 Thread Chris Hostetter

: I'm trying to use dspace to search across a range of index created and stored
: using Dsindexer.java class. I have seen where Solr can be use to perform

I've never heard of Dsindexer.java, but since this is the first result 
google returns...

http://scm.dspace.org/trac/dspace/browser/trunk/dspace/src/org/dspace/search/DSIndexer.java?rev=970

...i'm going to assume that's what you are talking about.

: numerical range queries using either TrieIntField,
: TrieDoubleField,TrieLongField, etc.. classes defined in Solr's api or 
: SortableIntField.java, SortableLongField,SortableDoubleField.java. I would
: like to know how to implement these classes in Dspace so that I can be able
: to perform numerical range queries. Any help would be greatly apprciated.

i *think* what you are asking is how to use Solr to search the numeric 
fields in an existing Lucene index (created by the above mentioned java 
code) -- but i may be wrong (your choice of wording "implement these 
classes in Dspace" is very perplexing to me).

If i'm understanding correctly, then the key to the issue is all in how 
the numeric values are indexed as lucene "Fields" in your existing code -- 
but in the copy of DSIndexer.java i found, there are no numeric fields, 
just Text fields.  If you are indexing the numeric values as simple 
strings, then in Solr you would want to refer to them using the legacy 
"IntField", "FloatField", etc... these assume simple string 
representations, and will sort properly using the numeric FieldCache -- 
BUT! -- range queries won't work.  Range queries require that the indexed 
terms be in a "logical" ordering, which isn't true for simple string 
representations of numbers ("100" is lexicographically before "2").

If i actually have your question backwards -- if what you are asking is 
how to modify the DSIndexer.java class to index fields in the same way as 
TrieDoubleField,TrieLongField,SortableIntField, etc... then the 
answer is much simpler: all FieldType's in Solr implement toInternal and 
toExternal methods ... the toInternal is what you need to call to "encode" 
your simple numeric values into the format to be indexed -- toExternal (or 
toObject) is how you can get the original value back out.
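
A minimal sketch of that encoding path, assuming the Solr jars are on the
DSIndexer classpath and a hypothetical field name; SortableIntField's
toInternal boils down to the NumberUtils encoding used here:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.solr.util.NumberUtils;

public class SortableEncoding {
  // Add a sortable, range-queryable int to a Lucene document, encoded the
  // way Solr's SortableIntField.toInternal encodes it.
  static void addEncodedInt(Document doc, String name, int value) {
    String encoded = NumberUtils.int2sortableStr(value);
    doc.add(new Field(name, encoded, Field.Store.NO, Field.Index.NOT_ANALYZED));
  }
}

On the Solr side the field would then be declared with the matching sortable
type so toExternal/toObject can decode the original value.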

For the "Trie" fields, these actually just use some utilities in Lucnee, 
so you could look at the code and use the same utilities w/o ever needing 
any Solr source code.

If i've completley missunderstood your question, plese post a followup 
explaining in more detail what it is you are trying to accomplish.

-Hoss



Re: SolrCore has a large number of SolrIndexSearchers retained in "infoRegistry"

2010-07-28 Thread skommuri

Hi,

It didn't seem to improve the situation. The same exception stack
traces are found. 

I have explicitly configured the index readers to be reopened by specifying it
in the solrconfig.xml.

The exception occurs when the remote cores are being searched. I am
attaching the exceptions in a text file for reference. 
http://lucene.472066.n3.nabble.com/file/n1002926/solrexceptions.txt
solrexceptions.txt 

Couple of notes:

1. QueryComponent#process
requests a SolrIndexSearcher twice by calling
SolrQueryRequest#getSearcher(), but the searcher is never closed. I see several
instances where getSearcher is called but never properly
closed - performing a quick call hierarchy of SolrQueryRequest#getSearcher()
and SolrQueryRequest#close() will illustrate this point.

2. It may be that this exception was never encountered before because
typical deployments do not make heavy use of Distributed Search across multiple
Solr cores, and/or it's a small memory leak that was never noticed?

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCore-has-a-large-number-of-SolrIndexSearchers-retained-in-infoRegistry-tp483900p1002926.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem with field collapsing

2010-07-28 Thread Moazzam Khan
Hi All,

Whenever I use field collapsing, the "numFound" attribute contains
exactly as many rows as I put in the rows parameter instead of the
total number of documents that matched the query. Is there a way to
rectify this?

Thanks,

Moazzam


RE: How to 'filter' facet results

2010-07-28 Thread Nagelberg, Kallin
ManBearPig is still a threat.

-Kallin Nagelberg

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: Tuesday, July 27, 2010 7:44 PM
To: solr-user@lucene.apache.org
Subject: RE: How to 'filter' facet results

> Is there a way to tell Solr to only return a specific set of facet values?  I
> feel like the facet query must be able to do this, but I'm not really
> understanding the facet query.  In my specific case, I'd like to only see 
> facet
> values for the same values I pass in as query filters, i.e. if I run this 
> query:
>fq=keyword:man OR keyword:bear OR keyword:pig
>facet=on
>facet.field:keyword

> then I only want it to return the facet counts for man, bear, and pig.  The
> resulting docs might have a number of different values for keyword, in 
> addition

For the general case of filtering facet values, I've wanted to do that too in 
more complex situations, and there is no good way I've found. 

For your very specific use case though, yeah, you can do it with facet.query.  
Leave out the facet.field, but instead:

facet.query=keyword:man
facet.query=keyword:bear
facet.query=keyword:pig

You'll get three facet.query results in the response, one each for man, bear, 
pig. 

Solr behind the scenes will kind of do three separate 'sub-queries', one for 
each facet.query, but since the query itself should be cached, you shouldn't 
notice much difference. Especially if you have a warming query that facets on 
the keyword field (I'm never entirely sure when caches created by warming 
queries will be used by a facet.query, or if it depends on the facet method in 
use, but it can't hurt). 

Jonathan



Re: Total number of terms in an index?

2010-07-28 Thread Jonathan Rochkind
At first I was thinking the TermsComponent might give you this, but 
oddly it seems not to.


http://wiki.apache.org/solr/TermsComponent




RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread David Thibault
Tommaso,

I used your patch and tried it with the 1.4.1 solr.war from a fresh 1.4.1 
distribution, and it still gave me that NoSuchMethodError.  However, when I 
tried it with the newly-patched-and-compiled apache-solr-1.4.2-dev.war file it 
works.  I think I tried that before and it didn't work. 

In any case, thanks for the patch and the advice.  Looks like now it's working 
for me.

Best,
Dave




-Original Message-
From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] 
Sent: Wednesday, July 28, 2010 3:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr 
CELL/Tika/PDFBox

I attached a patch for Solr 1.4.1 release on
https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
me.
This strange behaviour for me was due to the fact that I copied the patched
jars and war inside the dist directory but forgot to update the war inside
the example/webapps directory (that is inside Jetty).
Hope this helps.
Tommaso

2010/7/27 David Thibault 

> Alessandro & all,
>
> I was having the same issue with Tika crashing on certain PDFs.  I also
> noticed the bug where no content was extracted after upgrading Tika.
>
> When I went to the SOLR issue you link to below, I applied all the patches,
> downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl, and
> got the following error:
> SEVERE: java.lang.NoSuchMethodError:
> org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
> at
> org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> at
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
> at
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
> at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
> at java.lang.Thread.run(Thread.java:619)
>
> This is really weird because I DID apply the SolrResourceLoader patch that
> adds the getClassLoader method.  I even verified by going opening up the
> JARs and looking at the class file in Eclipse...I can see the
> SolrResourceLoader.getClassLoader() method.
>
> Does anyone know why it can't find the method?  After patching the source I
> did ant clean dist in the base directory of the Solr source tree and
> everything looked like it compiles (BUILD SUCCESSFUL).  Then I copied all
> the jars from dist/ and all the library dependencies from
> contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything in
> the logs looked good.
>
> I'm stumped.  It would be very nice to have a Solr implementation using the
> newest versions of PDFBox & Tika and actually have content being
> extracted...=)
>
> Best,
> Dave
>
>
> -Original Message-
> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> Sent: Tuesday, July 27, 2010 6:09 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
> CELL/Tika/PDFBox
>
> Hi Jon,
> Over the last few days we faced the same problem.
> Using Solr 1.4.1 classic (Tika 0.4), from some PDF files we can't extract
> content, and from others Solr throws an exception during the indexing
> process.
> You must:
> Update the Tika libraries (in /contrib/extraction/lib) with the tika-core 0.8
> snapshot and tika-parsers 0.8.
> Update PDFBox and all related libraries.
> After that you have to patch Solr 1.4.1 following this patch:
>
> https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel
> This is the first way to solve the problem.
>
> Using Solr 1.4.1 (with the Tika 0.8 snapshot and PDFBox updated) no exception
> is thrown during the indexing process, but no content is extracted.

Re: Total number of terms in an index?

2010-07-28 Thread Jason Rutherglen
Tom,

The total number of terms... Ah well, not a big deal; however yes, the
flex branch does expose this, so we can show it in Solr at some
point, hopefully outside of Solr's Luke impl.

On Tue, Jul 27, 2010 at 9:27 AM, Burton-West, Tom  wrote:
> Hi Jason,
>
> Are you looking for the total number of unique terms or total number of term 
> occurrences?
>
> Checkindex reports both, but does a bunch of other work so is probably not 
> the fastest.
>
> If you are looking for total number of term occurrences, you might look at 
> contrib/org/apache/lucene/misc/HighFreqTerms.java.
>
> If you are just looking for the total number of unique terms, I wonder if 
> there is some low level API that would allow you to just access the in-memory 
> representation of the tii file and then multiply the number of terms in it by 
> your indexDivisor (default 128). I haven't dug in to the code so I don't 
> actually know how the tii file gets loaded into a data structure in memory.  
> If there is api access, it seems like this might be the quickest way to get 
> the number of unique terms.  (Of course you would have to do this for each 
> segment).
>
> Tom
> -Original Message-
> From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
> Sent: Monday, July 26, 2010 8:39 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Total number of terms in an index?
>
>
> : Sorry, like the subject, I mean the total number of terms.
>
> it's not stored anywhere, so the only way to fetch it is to actually
> iterate all of the terms and count them (that's why LukeRequestHandler is
> so slow to compute this particular value)
>
> If i remember right, someone mentioned at one point that flex would let
> you store data about stuff like this in your index as part of the segment
> writing, but frankly i'm still not sure how that will help -- because
> unless your index is fully optimized, you still have to iterate the terms
> in each segment to 'de-dup' them.
>
>
> -Hoss
>
>
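
For completeness, a minimal sketch of the brute-force count Hoss describes,
iterating every term once (index path hypothetical); opening the reader at the
top level lets Lucene's multi-segment TermEnum do the de-duping:

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.FSDirectory;

public class CountTerms {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")));
    TermEnum terms = reader.terms();   // enumerates all terms across all fields
    long count = 0;
    while (terms.next()) count++;      // next() must be called before term()
    terms.close();
    reader.close();
    System.out.println("unique terms: " + count);
  }
}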


RE: display solr result in JSP

2010-07-28 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your reply. I don't have much experience with JSP. I found a
tag library and tried to use it, but unfortunately I didn't get it to work.

Would you please give me more information? I really appreciate your help!

Thanks,
Xiaohui 

-Original Message-
From: Ranveer [mailto:ranveer.s...@gmail.com] 
Sent: Wednesday, July 28, 2010 11:27 AM
To: solr-user@lucene.apache.org
Subject: Re: display solr result in JSP

Hi,

It is very simple to display values in a JSP: if you are using SolrJ, simply
store the values in a bean from your Java class and display them.
You can do the same thing in a servlet too: get the Solr server response and
return it in a bean, or display it directly (in the servlet).
I hope you will be able to do it.

regards
Ranveer

On Wednesday 28 July 2010 08:11 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
> I am new to Solr. I just got the example XML files indexed and searched by
> following the Solr tutorial. I wonder how I can display the search results in
> a JSP. I really appreciate any suggestions you can give.
>
> Thanks so much,
> Xiaohui
>
>
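
A minimal sketch of the SolrJ-in-a-servlet route Ranveer describes (server URL,
JSP name, and attribute name all hypothetical):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocumentList;

public class SearchServlet extends HttpServlet {
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws ServletException, IOException {
    try {
      CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
      SolrDocumentList docs = solr.query(new SolrQuery(req.getParameter("q"))).getResults();
      req.setAttribute("docs", docs);  // iterate in the JSP, e.g. with JSTL <c:forEach>
      req.getRequestDispatcher("/results.jsp").forward(req, resp);
    } catch (Exception e) {
      throw new ServletException(e);
    }
  }
}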



How do "NOT" queries work?

2010-07-28 Thread Kaan Meralan
I wonder how do "NOT" queries work. Is it a pass on the result set and
filtering out the "NOT" property or something like that?

Also is there anybody who does some performance checks on "NOT" queries? I
want to know whether there is a significant performance degradation or not
when you have "NOT" in a query.

Thanks...

//kaan


Re: slave index is bigger than master index

2010-07-28 Thread Muneeb Ali

>> In solrconfig.xml, these two lines control that. Maybe they need to be
increased.
>> 5000
>> 1 

Where do I add those in solrconfig? These lines don't seem to be present
in the example solrconfig file...
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/slave-index-is-bigger-than-master-index-tp996329p1002432.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: slave index is bigger than master index

2010-07-28 Thread Muneeb Ali

Well, I do have disk limitations too, and that's why I think the slave nodes
died when replicating data from the master node (it was just adding on top of
the existing index files).

:: What do you mean here? Optimizing is too CPU expensive? 

What I meant by avoiding playing around with the slave nodes is avoiding
anything (including optimizing on the slave nodes) that may affect live search
performance, unless I have no other option.

:: Do you mean increase to double size? 

Yes, as it did before on replication. But I didn't get a chance to run the
indexer yesterday. 

 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/slave-index-is-bigger-than-master-index-tp996329p1002426.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 1.4.1 field collapse

2010-07-28 Thread Moazzam Khan
Hi guys,

I read somewhere that Solr 1.4.1 has field collapse support by default
(without patching it) but I haven't been able to confirm it. Is this
true?

- Moazzam


RE: simple question from a newbie

2010-07-28 Thread Nguyen, Vincent (CDC/OSELS/NCPHI) (CTR)
I think I got it to work.  If I do a wildcard search using the dc3.title
field it seems to work fine (dc3.title:c*).  The dc.title:c* returns
every title that has a word in it that starts with 'c', which isn't
exactly what I wanted.  I'm guessing it's because of the
type="caseInsensitiveSort".  

Well, here is my schema for reference.  Thanks for your help.


[schema.xml snippet mangled in the archive; most tags were stripped. The
surviving fragments show fieldType definitions (string and sortable types, a
caseInsensitiveSort type, and text types using stopwords.txt, protwords.txt,
synonyms, and WordDelimiterFilter options), followed by a long list of mostly
multiValued field declarations, with PID apparently the uniqueKey and
fgs.label the default search field.]

Vincent Vu Nguyen
Division of Science Quality and Translation
Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329 


-Original Message-
From: Ranveer [mailto:ranveer.s...@gmail.com] 
Sent: Wednesday, July 28, 2010 11:31 AM
To: solr-user@lucene.apache.org
Subject: Re: simple question from a newbie

I think you are using a wildcard search, or should use one. But
first of all, please provide the schema and configuration files for more
details.

regards
Ranveer


On Wednesday 28 July 2010 07:51 PM, Nguyen, Vincent (CDC/OSELS/NCPHI) 
(CTR) wrote:
> Hi,
>
>
>
> I'm new to Solr and have a rather dumb question.  I want to do a query
> that returns all the Titles that start with a certain letter.  For
> example
>
>
>
> I have these titles:
>
> Results of in-mine research in support
>
> Cancer Reports
>
> State injury indicators report
>
> Cancer Reports
>
> Indexed dermal bibliography
>
> Childhood agricultural-related injury report
>
> Childhood agricultural injury prevention
>
>
>
>
>
> I want the query to return:
>
> Cancer Reports
>
> Cancer Reports
>
> Childhood agricultural-related injury report
>
> Childhood agricultural injury prevention
>
>
>
> I want something like dc.title=c* type query
>
>
>
> I know that I can facet by dc.title and then use the parameter
> facet.prefix=c but it returns something like this:
>
> Cancer Reports [2]
>
> Childhood agricultural-related injury report [1]
>
> Childhood agricultural injury prevention [1]
>
>
>
>
>
> Vincent Vu Nguyen
> Division of Science Quality and Translation
>
> Office of the Associate Director for Science
> Centers for Disease Control and Prevention (CDC)
> 404-498-6154
> Century Bldg 2400
> Atlanta, GA 30329
>
>
>
>
>





Re: SolrJ Response + JSON

2010-07-28 Thread MitchK

Hi Chantal,

thank you for the feedback.
I did not see the wood for the trees!
The SolrDocument's javadoc says the following: 
http://lucene.apache.org/solr/api/org/apache/solr/common/SolrDocument.html


getFieldValue(String name)
  Get the value or collection of values for a given field.

The magical word here is that little "or" :-).

I will try that tomorrow and give you a feedback!


Are you sure that you cannot change the SOLR results at query time
according to your needs?


Unfortunately, it is not possible in this case.

Kind regards,
Mitch


On 28.07.2010 16:49, Chantal Ackermann wrote:

Hi Mitch

On Wed, 2010-07-28 at 16:38 +0200, MitchK wrote:
   

Thank you, Chantal.

I have looked at this one: http://www.json.org/java/index.html

This seems to be an easy-to-understand-implementation.

However, I am wondering how to determine whether a SolrDocument's field
is multiValued or not.
The JSONResponseWriter of Solr looks at the schema-configuration.
However, the client shouldn't do that.
How did you solve that problem?
 

I didn't. I'm not recreating JSON from the SolrJ results.

I would try to use the same classes that SolrJ uses, actually. (Writing
that without having a further look at the code.) I would avoid
recreating existing code as much as possible.
About multivalued fields: you need instanceof checks, I guess. The field
only contains a list if there really are multiple values. (That's what
works for my ScriptTransformer.)

Are you sure that you cannot change the SOLR results at query time
according to your needs? Maybe you should ask for that, first (ask for X
instead of Y...).

Cheers,
Chantal


   

Thanks for sharing ideas.

- Mitch


On 28.07.2010 15:35, Chantal Ackermann wrote:
 

You could use org.apache.solr.handler.JsonLoader.
That one uses org.apache.noggit.JSONParser internally.
I've used the JacksonParser with Spring.

http://json.org/ lists parsers for different programming languages.

Cheers,
Chantal

On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:

   

Hello ,

Second try to send a mail to the mailing list...

I need to translate SolrJ's response into a JSON response.
I cannot query Solr directly, because I need to do some math with the
response data before I show the results to the client.

Any experience translating SolrJ's response into JSON without writing
your own JSON writer?

Thank you.
- Mitch
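
A minimal sketch of the conversion discussed in this thread, using the json.org
library Mitch mentions; getFieldValue's value-or-collection return is passed
straight to JSONObject.put, which renders a Java Collection as a JSON array, so
the multiValued question largely takes care of itself (server URL and query are
placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.json.JSONArray;
import org.json.JSONObject;

public class SolrjToJson {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    JSONArray docs = new JSONArray();
    for (SolrDocument doc : server.query(new SolrQuery("*:*")).getResults()) {
      JSONObject json = new JSONObject();
      for (String field : doc.getFieldNames()) {
        // value or collection of values; collections come out as JSON arrays
        json.put(field, doc.getFieldValue(field));
      }
      docs.put(json);   // do the extra math on the values before this, as needed
    }
    System.out.println(docs.toString());
  }
}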

Re: Is there a cache for a query?

2010-07-28 Thread Moazzam Khan
As far as I know all searches get cached at least for some time. I am
not sure about field collapse results being cached.

- Moazzam
http://moazzam-khan.com



On Mon, Jul 26, 2010 at 9:48 PM, Li Li  wrote:
> I want a cache that caches the entire result of a query (all steps, including
> collapsing, highlighting, and faceting). I read
> http://wiki.apache.org/solr/SolrCaching, but can't find a global
> cache. Maybe I can use an external cache to store key-value pairs. Is there
> one in Solr?
>


Re: Spellchecking and frequency

2010-07-28 Thread Jonathan Rochkind



I therefore wrote an implementation of SolrSpellChecker that wraps jazzy,
the java aspell library. I also extended the SpellCheckComponent to take
the
matrix of suggested words and query the corpus to find the first
combination
of suggestions which returned a match. This works well for my use case,
where term frequency is irrelevant to spelling or scoring.


This is interesting to me. I also have not been that happy with standard 
solr spellcheck. 

In addition to possibly filing a JIRA for a future fix to Solr itself, 
another option would be to make your 'alternate' SpellCheck 
component available as a separate .jar, so anyone could use it just by 
installing and specifying it in their solrconfig.xml.  I would encourage 
you to consider that, not as a replacement for suggesting a patch to 
Solr itself, but so people can use your improved spellchecker 
immediately, without waiting for possible Solr patches.


Jonathan



RE: Indexing Problem: Where's my data?

2010-07-28 Thread Michael Griffiths
Thanks - but my schema.xml is not recognizing field names specified in the 
data-config.xml.

For example - and I just tested this now - if I have in my data-config.xml:

  [field mapping stripped by the archive]

And then in my schema.xml:

  [field declaration stripped by the archive]

Then no documents are processed (e.g. I get rows queried, but 0 in the data handler UI).

But if I change that to:

  [revised field declaration stripped by the archive]

... now documents are processed (e.g. 313).

Which, quite frankly, confuses me. I may be doing something else wrong (I 
changed my SQL as well, so I'm getting another failure, but I think it's 
separate to this one).

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Tuesday, July 27, 2010 8:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing Problem: Where's my data?

Solr respects case for field names.  Database fields are supplied in 
lower-case, so it should be 'attribute_name' and 'string_value'. Also 
'product_id', etc.

It is easier if you carefully emulate every detail in the examples, for example 
lower-case names.

On Tue, Jul 27, 2010 at 2:59 PM, kenf_nc  wrote:
>
> for STRING_VALUE, I assume there is a property in the 'select *' 
> results called string_value? if so I'm not sure why it wouldn't work. 
> If not, then that's why, it doesn't have anything to put there.
>
> For ATTRIBUTE_NAME, is it possibly a case issue? you called it 
> 'Attribute_Name' in your query, but ATTRIBUTE_NAME in your 
> schema...just something to check I guess.
>
> Also, not sure why you are using name= in your fields, for example, 
>  I thought 
> 'column' was the source field name and 'name' was supposed to be the 
> schema field name and if not there it would assume 'column' name. You 
> don't have a schema field called "Parent Family" so it looks like it's 
> defaulting to column name too which is lucky for you I suppose. But 
> you may want to either remove 'name=' or make it match the schema. 
> (and I may be completely wrong on this, it's been a while since I got DIH 
> going).
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Indexing-Problem-Where-s-my-data-tp
> 1000660p1000843.html Sent from the Solr - User mailing list archive at 
> Nabble.com.
>



--
Lance Norskog
goks...@gmail.com




Re: simple question from a newbie

2010-07-28 Thread Ranveer
I think you are using a wildcard search, or should use one. But
first of all, please provide the schema and configuration files for more
details.


regards
Ranveer
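
As a sketch of the wildcard idea (untested; it assumes a hypothetical
untokenized, lowercased copy field of the title named title_prefix, since
wildcard terms are not analyzed):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class TitlePrefixQuery {
    public static void main(String[] args) throws Exception {
        // hypothetical Solr URL; adjust to your installation
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // all documents whose title_prefix field starts with "c";
        // the prefix must already be lowercase because wildcard terms bypass analysis
        SolrQuery query = new SolrQuery("title_prefix:c*");
        System.out.println(solr.query(query).getResults());
    }
}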


On Wednesday 28 July 2010 07:51 PM, Nguyen, Vincent (CDC/OSELS/NCPHI) 
(CTR) wrote:

Hi,



I'm new to Solr and have a rather dumb question.  I want to do a query
that returns all the Titles that start with a certain letter.  For
example



I have these titles:

Results of in-mine research in support

Cancer Reports

State injury indicators report

Cancer Reports

Indexed dermal bibliography

Childhood agricultural-related injury report

Childhood agricultural injury prevention





I want the query to return:

Cancer Reports

Cancer Reports

Childhood agricultural-related injury report

Childhood agricultural injury prevention



I want something like dc.title=c* type query



I know that I can facet by dc.title and then use the parameter
facet.prefix=c but it returns something like this:

Cancer Reports [2]

Childhood agricultural-related injury report [1]

Childhood agricultural injury prevention [1]





Vincent Vu Nguyen
Division of Science Quality and Translation

Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329




   




Re: display solr result in JSP

2010-07-28 Thread Ranveer

Hi,

It is very simple to display values in a JSP. If you are using SolrJ, then simply
store the values in a bean from your Java class and display them.
You can do the same thing in a servlet too: get the Solr server response and
return it in a bean, or display it directly (in the servlet).

Hope you will be able to do it.

regards
Ranveer
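
A rough illustration of the servlet variant (hypothetical servlet and JSP
names; assumes SolrJ's CommonsHttpSolrServer; untested):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocumentList;

public class SearchServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        try {
            // hypothetical Solr URL; adjust to your installation
            CommonsHttpSolrServer solr =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");
            // run the user's query and grab the result documents
            SolrDocumentList docs =
                    solr.query(new SolrQuery(req.getParameter("q"))).getResults();
            // expose the results to the JSP as a request attribute
            req.setAttribute("docs", docs);
            req.getRequestDispatcher("/results.jsp").forward(req, resp);
        } catch (Exception e) {
            throw new ServletException(e);
        }
    }
}

The JSP can then iterate the "docs" attribute (each entry is a SolrDocument),
for example with JSTL's <c:forEach>.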

On Wednesday 28 July 2010 08:11 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

I am new to Solr. I just got the example xml file indexed and searchable by following the Solr
tutorial. I wonder how I can get the search results displayed in a JSP. I really
appreciate any suggestions you can give.

Thanks so much,
Xiaohui

   




Re: logic required for newbie

2010-07-28 Thread Jonty Rhods
Hi

Thanks for the reply.
Actually the requirement is different (sorry if I was unable to clarify it in the
first mail).

Basically the following are the field names in the schema as well:
> 1. id
> 2. name
> 3. user_id
> 4. location
> 5. country
> 6. landmark1
> 7. landmark2
> 8. landmark3
> 9. landmark4
> 10. landmark5

which carry text.
For example:

1
some name
user_id
new york
USA
5th avenue
ms departmental store
base bakery
piza hut
ford motor

Now if a user searches for "piza", the expected result is like:

1
some name
user_id
new york
USA
piza hut

It means I want to ignore every other landmark which does not match. With a
filter we can filter on fields, but here I don't know the
field name, because it depends on the text match.

Is there any other solution? I am ready to change the schema or the logic. I
am using SolrJ.

Please help me, I am stuck here.

with regards
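
One possible client-side approach, sketched with SolrJ (untested; it assumes
the landmark fields are stored and a hypothetical Solr URL): enable
highlighting on the landmark fields and keep only the fields that produced a
snippet, since the highlighting response lists exactly the fields that matched.

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LandmarkFilter {
    public static void main(String[] args) throws Exception {
        // hypothetical Solr URL; adjust to your installation
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("piza");
        query.setHighlight(true);
        for (int i = 1; i <= 5; i++) {
            // only landmark fields that actually match will produce snippets
            query.addHighlightField("landmark" + i);
        }
        QueryResponse response = solr.query(query);
        // highlighting result: unique key -> (field name -> snippets);
        // a landmark field appears here only if it matched the query
        Map<String, Map<String, List<String>>> highlighting = response.getHighlighting();
        for (Map.Entry<String, Map<String, List<String>>> doc : highlighting.entrySet()) {
            System.out.println("doc " + doc.getKey()
                    + " matching landmark fields: " + doc.getValue().keySet());
        }
    }
}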


On Wed, Jul 28, 2010 at 7:22 PM, rajini maski  wrote:

> you can index each of these field separately...
> field1-> Id
> field2-> name
> field3->user_id
> field4->country.
>
> 
> field7-> landmark
>
> While querying you can specify "q=landmark9". This will return you
> results..
> And if you want only particular fields in output.. use the "fl" parameter
> in
> query...
>
> like
>
> http://localhost:8090/solr/select?
> indent=on&q=landmark9&fl=ID,user_id,country,landmark&
>
> This will give your desired solution..
>
>
>
>
> On Wed, Jul 28, 2010 at 12:23 PM, Jonty Rhods 
> wrote:
>
> > Hi All,
> >
> > I am very new and learning solr.
> >
> > I have 10 column like following in table
> >
> > 1. id
> > 2. name
> > 3. user_id
> > 4. location
> > 5. country
> > 6. landmark1
> > 7. landmark2
> > 8. landmark3
> > 9. landmark4
> > 10. landmark5
> >
> > when a user searches for a landmark, I want to return only the one landmark
> > which
> > matches. The rest of the landmarks should be ignored.
> > The expected result is like the following if the user searches for "landmark2":
> >
> > 1. id
> > 2. name
> > 3. user_id
> > 4. location
> > 5. country
> > 7. landmark2
> >
> > or if search by "landmark9"
> >
> > 1. id
> > 2. name
> > 3. user_id
> > 4. location
> > 5. country
> > 9. landmark9
> >
> >
> > please help me to design the schema for this kind of requirement...
> >
> > thanks
> > with regards
> >
>


Re: SolrJ Response + JSON

2010-07-28 Thread Chantal Ackermann
Hi Mitch

On Wed, 2010-07-28 at 16:38 +0200, MitchK wrote:
> Thank you, Chantal.
> 
> I have looked at this one: http://www.json.org/java/index.html
> 
> This seems to be an easy-to-understand-implementation.
> 
> However, I am wondering how to determine whether a SolrDocument's field 
> is multiValued or not.
> The JSONResponseWriter of Solr looks at the schema-configuration. 
> However, the client shouldn't do that.
> How did you solve that problem?

I didn't. I'm not recreating JSON from the SolrJ results.

I would try to use the same classes that SolrJ uses, actually. (Writing
that without having a further look at the code.) I would avoid
recreating existing code as much as possible.
About multivalued fields: you need instanceof checks, I guess. The field
only contains a list if there really are multiple values. (That's what
works for my ScriptTransformer.)
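
For illustration, a minimal sketch of such a conversion with the json.org Java
library mentioned earlier, using exactly that instanceof check (hypothetical
class name; untested):

import java.util.Collection;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

public class SolrJsonConverter {
    // Converts a SolrDocumentList into a JSON array of objects.
    public static JSONArray toJson(SolrDocumentList docs) throws JSONException {
        JSONArray result = new JSONArray();
        for (SolrDocument doc : docs) {
            JSONObject json = new JSONObject();
            for (String field : doc.getFieldNames()) {
                Object value = doc.getFieldValue(field);
                if (value instanceof Collection) {
                    // a multivalued field arrives as a Collection:
                    // map it to a JSON array
                    json.put(field, new JSONArray((Collection) value));
                } else {
                    json.put(field, value);
                }
            }
            result.put(json);
        }
        return result;
    }
}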

Are you sure that you cannot change the SOLR results at query time
according to your needs? Maybe you should ask for that, first (ask for X
instead of Y...).

Cheers,
Chantal


> 
> Thanks for sharing ideas.
> 
> - Mitch
> 
> 
> Am 28.07.2010 15:35, schrieb Chantal Ackermann:
> > You could use org.apache.solr.handler.JsonLoader.
> > That one uses org.apache.noggit.JSONParser internally.
> > I've used the JacksonParser with Spring.
> >
> > http://json.org/ lists parsers for different programming languages.
> >
> > Cheers,
> > Chantal
> >
> > On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:
> >
> >> Hello ,
> >>
> >> Second try to send a mail to the mailing list...
> >>
> >> I need to translate SolrJ's response into JSON-response.
> >> I can not query Solr directly, because I need to do some math with the
> >> responsed data, before I show the results to the client.
> >>
> >> Any experiences how to translate SolrJ's response into JSON without writing
> >> your own JSON Writer?
> >>
> >> Thank you.
> >> - Mitch
> >>  
> >
> >
> >





display solr result in JSP

2010-07-28 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I am new to Solr. I just got the example xml file indexed and searchable by following the Solr
tutorial. I wonder how I can get the search results displayed in a JSP. I really
appreciate any suggestions you can give.

Thanks so much,
Xiaohui


Re: SolrJ Response + JSON

2010-07-28 Thread MitchK

Thank you, Chantal.

I have looked at this one: http://www.json.org/java/index.html

This seems to be an easy-to-understand-implementation.

However, I am wondering how to determine whether a SolrDocument's field 
is multiValued or not.
The JSONResponseWriter of Solr looks at the schema-configuration. 
However, the client shouldn't do that.

How did you solve that problem?

Thanks for sharing ideas.

- Mitch


Am 28.07.2010 15:35, schrieb Chantal Ackermann:

You could use org.apache.solr.handler.JsonLoader.
That one uses org.apache.noggit.JSONParser internally.
I've used the JacksonParser with Spring.

http://json.org/ lists parsers for different programming languages.

Cheers,
Chantal

On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:
   

Hello ,

Second try to send a mail to the mailing list...

I need to translate SolrJ's response into JSON-response.
I can not query Solr directly, because I need to do some math with the
responsed data, before I show the results to the client.

Any experiences how to translate SolrJ's response into JSON without writing
your own JSON Writer?

Thank you.
- Mitch
 



   




simple question from a newbie

2010-07-28 Thread Nguyen, Vincent (CDC/OSELS/NCPHI) (CTR)
Hi,

 

I'm new to Solr and have a rather dumb question.  I want to do a query
that returns all the Titles that start with a certain letter.  For
example

 

I have these titles:

Results of in-mine research in support

Cancer Reports

State injury indicators report

Cancer Reports

Indexed dermal bibliography

Childhood agricultural-related injury report

Childhood agricultural injury prevention

 

 

I want the query to return:

Cancer Reports

Cancer Reports

Childhood agricultural-related injury report

Childhood agricultural injury prevention

 

I want something like dc.title=c* type query

 

I know that I can facet by dc.title and then use the parameter
facet.prefix=c but it returns something like this:

Cancer Reports [2]

Childhood agricultural-related injury report [1]

Childhood agricultural injury prevention [1]

 

 

Vincent Vu Nguyen
Division of Science Quality and Translation

Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329 

 



Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread Tommaso Teofili
That was my feeling too :-) and so I went for the trunk to have things
working quickly. But I also have to consider which one is the best version,
since I am going to deploy it in the near future in an enterprise
environment, and choosing the best version is an important step.
I am quite new to Solr, but I agree with Alessandro that using a
slightly patched release should theoretically be more stable than the trunk,
which gets many updates weekly (and even daily).
Cheers,
Tommaso

2010/7/28 David Thibault 

> Thanks, I'll try that then. I kind of figured that'd be the answer, but
> after fighting with Solr & ExtractingRequestHandler for 2 days I also just
> wanted to be done with it once it started working with 4.0...=)  However,
> stability would be better in the long run.
>
> Best,
> Dave
>
> -Original Message-
> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> Sent: Wednesday, July 28, 2010 9:33 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
> CELL/Tika/PDFBox
>
> In my opinion, the 1.4.1 version with the Patch is more Stable.
> Until 4.0 will be released 
>
> 2010/7/28 David Thibault 
>
> > Yesterday I did get this working with version 4.0 from trunk.  I haven't
> > fully tested it yet, but the content doesn't come through blank anymore,
> so
> > that's good.  Would it be more stable to stick with 1.4.1 and your patch
> to
> > get to Tika 0.8, or to stick with the 4.0 trunk version?
> >
> > Best,
> > Dave
> >
> > -Original Message-
> > From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
> > Sent: Wednesday, July 28, 2010 3:31 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with
> Solr
> > CELL/Tika/PDFBox
> >
> > I attached a patch for Solr 1.4.1 release on
> > https://issues.apache.org/jira/browse/SOLR-1902 that made things work
> for
> > me.
> > This strange behaviour for me was due to the fact that I copied the
> patched
> > jars and war inside the dist directory but forgot to update the war
> inside
> > the example/webapps directory (that is inside Jetty).
> > Hope this helps.
> > Tommaso
> >
> > 2010/7/27 David Thibault 
> >
> > > Alessandro & all,
> > >
> > > I was having the same issue with Tika crashing on certain PDFs.  I also
> > > noticed the bug where no content was extracted after upgrading Tika.
> > >
> > > When I went to the SOLR issue you link to below, I applied all the
> > patches,
> > > downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl,
> > and
> > > got the following error:
> > > SEVERE: java.lang.NoSuchMethodError:
> > >
> >
> org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
> > > at
> > >
> >
> org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
> > > at
> > >
> >
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
> > > at
> > >
> >
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
> > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> > > at
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> > > at
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> > > at
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> > > at
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > > at
> > >
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> > > at
> > >
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> > > at
> > >
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> > > at
> > >
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> > > at
> > >
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> > > at
> > >
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> > > at
> > >
> >
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
> > > at
> > >
> >
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
> > > at
> > org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
> > > at java.lang.Thread.run(Thread.java:619)
> > >
> > > This is really weird because I DID apply the SolrResourceLoader patch
> > that
> > > adds the getClassLoader method.  I even verified by going opening up
> the
> > > JARs and looking at the class file in Eclipse...I can see the
> > > SolrResourceLoader.getClassLoader() method.
> > >
> > > Does anyone know why it can't find the method?  After patching the
> source
> > I
> > > di

Re: logic required for newbie

2010-07-28 Thread rajini maski
You can index each of these fields separately:
field1 -> id
field2 -> name
field3 -> user_id
field4 -> country
...
field7 -> landmark

While querying you can specify "q=landmark9". This will return you results.
And if you want only particular fields in the output, use the "fl" parameter
in the query, like:

http://localhost:8090/solr/select?indent=on&q=landmark9&fl=ID,user_id,country,landmark

This will give your desired solution.
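
The same query expressed with SolrJ might look roughly like this (hypothetical
Solr URL; field names as above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class FieldListQuery {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8090/solr");
        SolrQuery query = new SolrQuery("landmark9");
        // equivalent of &fl=ID,user_id,country,landmark:
        // restrict which stored fields are returned
        query.setFields("ID", "user_id", "country", "landmark");
        System.out.println(solr.query(query).getResults());
    }
}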




On Wed, Jul 28, 2010 at 12:23 PM, Jonty Rhods  wrote:

> Hi All,
>
> I am very new and learning solr.
>
> I have 10 column like following in table
>
> 1. id
> 2. name
> 3. user_id
> 4. location
> 5. country
> 6. landmark1
> 7. landmark2
> 8. landmark3
> 9. landmark4
> 10. landmark5
>
> when a user searches for a landmark, I want to return only the one landmark
> which
> matches. The rest of the landmarks should be ignored.
> The expected result is like the following if the user searches for "landmark2":
>
> 1. id
> 2. name
> 3. user_id
> 4. location
> 5. country
> 7. landmark2
>
> or if search by "landmark9"
>
> 1. id
> 2. name
> 3. user_id
> 4. location
> 5. country
> 9. landmark9
>
>
> please help me to design the schema for this kind of requirement...
>
> thanks
> with regards
>


RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread David Thibault
Thanks, I'll try that then. I kind of figured that'd be the answer, but after 
fighting with Solr & ExtractingRequestHandler for 2 days I also just wanted to 
be done with it once it started working with 4.0...=)  However, stability would 
be better in the long run.

Best,
Dave

-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] 
Sent: Wednesday, July 28, 2010 9:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr 
CELL/Tika/PDFBox

In my opinion, the 1.4.1 version with the Patch is more Stable.
Until 4.0 will be released 

2010/7/28 David Thibault 

> Yesterday I did get this working with version 4.0 from trunk.  I haven't
> fully tested it yet, but the content doesn't come through blank anymore, so
> that's good.  Would it be more stable to stick with 1.4.1 and your patch to
> get to Tika 0.8, or to stick with the 4.0 trunk version?
>
> Best,
> Dave
>
> -Original Message-
> From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
> Sent: Wednesday, July 28, 2010 3:31 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
> CELL/Tika/PDFBox
>
> I attached a patch for Solr 1.4.1 release on
> https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
> me.
> This strange behaviour for me was due to the fact that I copied the patched
> jars and war inside the dist directory but forgot to update the war inside
> the example/webapps directory (that is inside Jetty).
> Hope this helps.
> Tommaso
>
> 2010/7/27 David Thibault 
>
> > Alessandro & all,
> >
> > I was having the same issue with Tika crashing on certain PDFs.  I also
> > noticed the bug where no content was extracted after upgrading Tika.
> >
> > When I went to the SOLR issue you link to below, I applied all the
> patches,
> > downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl,
> and
> > got the following error:
> > SEVERE: java.lang.NoSuchMethodError:
> >
> org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
> > at
> >
> org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
> > at
> >
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
> > at
> >
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> > at
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> > at
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > at
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> > at
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> > at
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> > at
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> > at
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> > at
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> > at
> >
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
> > at
> >
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
> > at
> org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
> > at java.lang.Thread.run(Thread.java:619)
> >
> > This is really weird because I DID apply the SolrResourceLoader patch
> that
> > adds the getClassLoader method.  I even verified by going opening up the
> > JARs and looking at the class file in Eclipse...I can see the
> > SolrResourceLoader.getClassLoader() method.
> >
> > Does anyone know why it can't find the method?  After patching the source
> I
> > did ant clean dist in the base directory of the Solr source tree and
> > everything looked like it compiles (BUILD SUCCESSFUL).  Then I copied all
> > the jars from dist/ and all the library dependencies from
> > contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything
> in
> > the logs looked good.
> >
> > I'm stumped.  It would be very nice to have a Solr implementation using
> the
> > newest versions of PDFBox & Tika and actually have content being
> > extracted...=)
> >
> > Best,
> > Dave
> >
> >
> > -Original Message-
> > From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> > Sent: Tuesday, July 27, 2010 6:09 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with
> Solr
> > CELL/

RE: Solr 3.1 and ExtractingRequestHandler resulting in blank content

2010-07-28 Thread David Thibault
If you don't store the content then you can't do highlighting, right?  Also, 
don't you just have to switch the text field to say stored="true" in your 
schema to store the text?  I don't understand why you're differentiating the 
behavior of ExtractingRequestHandler from the behavior of Solr in general.  
Doesn't ExtractingRequestHandler just pull the text out of whatever file you 
send it and then the rest of the processing happens like any other Solr post?

The bug I was experiencing was the same one that someone else brought up on the 
list yesterday in the emails entitled "Extracting PDF 
text/comment/callout/typewriter boxes with Solr   CELL/Tika/PDFBox".  It ties 
back to this bug:
https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel

I saw that email shortly after I sent this one to the list (it figures, doesn't 
it...=).

I tried doing what they suggested on that bug report (patching Solr 1.4.x and
using Tika 0.8-SNAPSHOT), but the patches failed when I applied them to my Solr
1.4.1.  They have since added a patch for Solr 1.4.1.  I haven't tried it yet.
However, I did get it working using Solr 4.0 out of trunk (which also uses Tika
0.8 and updated PDFBox jars).  I have yet to decide which will be more stable,
Solr 4.0 or patched Solr 1.4.1, both of which have updated PDFBox and Tika jars.

Best,
Dave

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Tuesday, July 27, 2010 8:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.1 and ExtractingRequestHandler resulting in blank content

There are two different datasets that Solr (Lucene really) saves from
a document: raw storage and the indexed terms. I don't think the
ExtractingRequestHandler ever automatically stored the raw data; in
fact Lucene works in Strings internally, not raw byte arrays (this is
changing).

It should be indexed- that means if you search 'text' with a word from
the document, it will find those documents and bring back the file
name. Your app has to then use the file name.  Solr/Lucene is not
intended as a general-purpose content store, only an index.

The ERH wiki page doesn't quite say this. It describes what the ERH
does rather than what it does not do :)

On Mon, Jul 26, 2010 at 12:00 PM, David Thibault  wrote:
> Hello all,
>
> I’m working on a project with Solr.  I had 1.4.1 working OK using 
> ExtractingRequestHandler except that it was crashing on some PDFs.  I noticed 
> that Tika bundled with 1.4.1 was 0.4, which was kind of old.  I decided to 
> try updating to 0.7 as per the directions here: 
> http://wiki.apache.org/solr/ExtractingRequestHandler  but it was giving me 
> errors (I forget what they were specifically).
>
> Then I tried downloading Solr 3.1 from the source repository, which I noticed 
> came with Tika 0.7.  I figured this would be an easier route to get working.  
> Now I’m testing with 3.1 and 0.7 and I’m noticing my documents are going into 
> Solr OK, but they all have blank content (no document text stored in Solr).  
> I did see that the default “text” field is not stored. Changing that to 
> stored=true didn’t help.  Changing to 
> fmap.content=attr_content&uprefix=attr_content didn’t help either.  I have 
> attached all relevant info here.  Please let me know if someone sees 
> something I don’t (it’s entirely possible as I’m relatively new to Solr).
>
> Schema.xml:
> [schema.xml excerpt garbled in the archive: field type definitions (string,
> date, and trie-numeric types, plus a "text" type whose index and query
> analyzers chain a tokenizer with stopword, word-delimiter, stemming, and
> synonym filters); the message is truncated here]

Re: SolrJ Response + JSON

2010-07-28 Thread Chantal Ackermann
You could use org.apache.solr.handler.JsonLoader.
That one uses org.apache.noggit.JSONParser internally.
I've used the JacksonParser with Spring.

http://json.org/ lists parsers for different programming languages.

Cheers,
Chantal

On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:
> Hello , 
> 
> Second try to send a mail to the mailing list... 
> 
> I need to translate SolrJ's response into JSON-response.
> I can not query Solr directly, because I need to do some math with the
> responsed data, before I show the results to the client.
> 
> Any experiences how to translate SolrJ's response into JSON without writing
> your own JSON Writer?
> 
> Thank you. 
> - Mitch




Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread Alessandro Benedetti
In my opinion, the 1.4.1 version with the patch is more stable,
at least until 4.0 is released.

2010/7/28 David Thibault 

> Yesterday I did get this working with version 4.0 from trunk.  I haven't
> fully tested it yet, but the content doesn't come through blank anymore, so
> that's good.  Would it be more stable to stick with 1.4.1 and your patch to
> get to Tika 0.8, or to stick with the 4.0 trunk version?
>
> Best,
> Dave
>
> -Original Message-
> From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
> Sent: Wednesday, July 28, 2010 3:31 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
> CELL/Tika/PDFBox
>
> I attached a patch for Solr 1.4.1 release on
> https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
> me.
> This strange behaviour for me was due to the fact that I copied the patched
> jars and war inside the dist directory but forgot to update the war inside
> the example/webapps directory (that is inside Jetty).
> Hope this helps.
> Tommaso
>
> 2010/7/27 David Thibault 
>
> > Alessandro & all,
> >
> > I was having the same issue with Tika crashing on certain PDFs.  I also
> > noticed the bug where no content was extracted after upgrading Tika.
> >
> > When I went to the SOLR issue you link to below, I applied all the
> patches,
> > downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl,
> and
> > got the following error:
> > SEVERE: java.lang.NoSuchMethodError:
> >
> org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
> > at
> >
> org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
> > at
> >
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
> > at
> >
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> > at
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> > at
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > at
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> > at
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> > at
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> > at
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> > at
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> > at
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> > at
> >
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
> > at
> >
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
> > at
> org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
> > at java.lang.Thread.run(Thread.java:619)
> >
> > This is really weird because I DID apply the SolrResourceLoader patch
> that
> > adds the getClassLoader method.  I even verified by going opening up the
> > JARs and looking at the class file in Eclipse...I can see the
> > SolrResourceLoader.getClassLoader() method.
> >
> > Does anyone know why it can't find the method?  After patching the source
> I
> > did ant clean dist in the base directory of the Solr source tree and
> > everything looked like it compiles (BUILD SUCCESSFUL).  Then I copied all
> > the jars from dist/ and all the library dependencies from
> > contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything
> in
> > the logs looked good.
> >
> > I'm stumped.  It would be very nice to have a Solr implementation using
> the
> > newest versions of PDFBox & Tika and actually have content being
> > extracted...=)
> >
> > Best,
> > Dave
> >
> >
> > -Original Message-
> > From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> > Sent: Tuesday, July 27, 2010 6:09 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with
> Solr
> > CELL/Tika/PDFBox
> >
> > Hi Jon,
> > During the last few days we faced the same problem.
> > Using classic Solr 1.4.1 (Tika 0.4), from some pdf files we can't extract
> > content, and from others Solr throws an exception during the indexing
> > process.
> > You must:
> > Update tika libraries (into /contrib/extraction/lib)with tika-core.0.8
> > snapshot and tika-parsers 0.8.
> > Update PdfBox and all related libraries.
> > After that You have to patch Solr 1.4.1 following this patch :
> >
> >
> https://issues.apache.org/jira/browse/SOLR-19

RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread David Thibault
Yesterday I did get this working with version 4.0 from trunk.  I haven't fully 
tested it yet, but the content doesn't come through blank anymore, so that's 
good.  Would it be more stable to stick with 1.4.1 and your patch to get to 
Tika 0.8, or to stick with the 4.0 trunk version?

Best,
Dave

-Original Message-
From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] 
Sent: Wednesday, July 28, 2010 3:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr 
CELL/Tika/PDFBox

I attached a patch for Solr 1.4.1 release on
https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
me.
This strange behaviour for me was due to the fact that I copied the patched
jars and war inside the dist directory but forgot to update the war inside
the example/webapps directory (that is inside Jetty).
Hope this helps.
Tommaso

2010/7/27 David Thibault 

> Alessandro & all,
>
> I was having the same issue with Tika crashing on certain PDFs.  I also
> noticed the bug where no content was extracted after upgrading Tika.
>
> When I went to the SOLR issue you link to below, I applied all the patches,
> downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl, and
> got the following error:
> SEVERE: java.lang.NoSuchMethodError:
> org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
> at
> org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> at
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
> at
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
> at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
> at java.lang.Thread.run(Thread.java:619)
>
> This is really weird because I DID apply the SolrResourceLoader patch that
> adds the getClassLoader method.  I even verified by going opening up the
> JARs and looking at the class file in Eclipse...I can see the
> SolrResourceLoader.getClassLoader() method.
>
> Does anyone know why it can't find the method?  After patching the source I
> did ant clean dist in the base directory of the Solr source tree and
> everything looked like it compiles (BUILD SUCCESSFUL).  Then I copied all
> the jars from dist/ and all the library dependencies from
> contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything in
> the logs looked good.
>
> I'm stumped.  It would be very nice to have a Solr implementation using the
> newest versions of PDFBox & Tika and actually have content being
> extracted...=)
>
> Best,
> Dave
>
>
> -Original Message-
> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> Sent: Tuesday, July 27, 2010 6:09 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
> CELL/Tika/PDFBox
>
> Hi Jon,
> During the last few days we faced the same problem.
> Using classic Solr 1.4.1 (Tika 0.4), from some pdf files we can't extract
> content, and from others Solr throws an exception during the indexing
> process.
> You must:
> Update the Tika libraries (into /contrib/extraction/lib) with the tika-core 0.8
> snapshot and tika-parsers 0.8.
> Update PdfBox and all related libraries.
> After that You have to patch Solr 1.4.1 following this patch :
>
> https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel
> This is the first way to solve the problem.
>
> Using Solr 1.4.1 (with tika 0.8 snapshot and pdfbox updated) no exception
> is
> thrown during the Indexing process, but no content is extracted.
> Using last Solr trunk (with tika 0.8 snapshot and pdfbox updated)  all
> sounds good but we don't know ho

Re: SolrJ Response + JSON

2010-07-28 Thread MitchK

Thank you Markus, Mark.

Seems to be a problem with Nabble, not with the mailing list. Sorry.

I can create a JSON response when I query Solr directly.
But I mean that I query Solr through a SolrJ client
(CommonsHttpSolrServer).
That means my queries look a little bit like this:
http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr

So the response is given as a QueryResponse object, not as a JSON string.

Or am I missing something here?

Am 28.07.2010 15:15, schrieb Markus Jelsma:

Hi,

I got a response to your e-mail in my box 30 minutes ago. Anyway, enable the
JSONResponseWriter, if you haven't already, and query with wt=json. Can't get
much easier.

Cheers,

On Wednesday 28 July 2010 15:08:26 MitchK wrote:
   

Hello ,

Second try to send a mail to the mailing list...

I need to translate SolrJ's response into JSON-response.
I can not query Solr directly, because I need to do some math with the
responsed data, before I show the results to the client.

Any experiences how to translate SolrJ's response into JSON without writing
your own JSON Writer?

Thank you.
- Mitch

 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


   




Re: SolrJ Response + JSON

2010-07-28 Thread Markus Jelsma
Hi,

I got a response to your e-mail in my box 30 minutes ago. Anyway, enable the 
JSONResponseWriter, if you haven't already, and query with wt=json. Can't get
much easier.

Cheers,

On Wednesday 28 July 2010 15:08:26 MitchK wrote:
> Hello ,
> 
> Second try to send a mail to the mailing list...
> 
> I need to translate SolrJ's response into JSON-response.
> I can not query Solr directly, because I need to do some math with the
> responsed data, before I show the results to the client.
> 
> Any experiences how to translate SolrJ's response into JSON without writing
> your own JSON Writer?
> 
> Thank you.
> - Mitch
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: SolrJ Response + JSON

2010-07-28 Thread Mark Allan


On 28 Jul 2010, at 2:08 pm, MitchK wrote:

Second try to send a mail to the mailing list...


Your first attempt got through as well.  Here's my original response.


I think you should just be able to add &wt=json to the end of your  
query (or change whatever the existing wt parameter is in your URL).


Mark

On 28 Jul 2010, at 12:54 pm, MitchK wrote:



Hello community,

I need to transform SolrJ - responses into JSON, after some  
computing on

those results by another application has finished.

I can not do those computations on the Solr - side.

So, I really have to translate SolrJ's output into JSON.

Any experiences how to do so without writing your own JSON-writer?

Thank you.
- Mitch
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002024p1002024.html
Sent from the Solr - User mailing list archive at Nabble.com.








SolrJ Response + JSON

2010-07-28 Thread MitchK

Hello , 

Second try to send a mail to the mailing list... 

I need to translate SolrJ's response into JSON-response.
I can not query Solr directly, because I need to do some math with the
responsed data, before I show the results to the client.

Any experiences how to translate SolrJ's response into JSON without writing
your own JSON Writer?

Thank you. 
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002115p1002115.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ Response + JSON

2010-07-28 Thread Mark Allan
I think you should just be able to add &wt=json to the end of your  
query (or change whatever the existing wt parameter is in your URL).


Mark

On 28 Jul 2010, at 12:54 pm, MitchK wrote:



Hello community,

I need to transform SolrJ - responses into JSON, after some  
computing on

those results by another application has finished.

I can not do those computations on the Solr - side.

So, I really have to translate SolrJ's output into JSON.

Any experiences how to do so without writing your own JSON-writer?

Thank you.
- Mitch
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002024p1002024.html
Sent from the Solr - User mailing list archive at Nabble.com.







Show elevated Result Differently

2010-07-28 Thread Vishal.Arora

I want to show elevated results differently from the others. Is there any way to do
this?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Show-elevated-Result-Differently-tp1002081p1002081.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: clustering component

2010-07-28 Thread Stanislaw Osinski
> The patch should also work with trunk, but I haven't verified it yet.
>

I've just added a patch against solr trunk to
https://issues.apache.org/jira/browse/SOLR-1804.

S.


Highlighted match snippets highlight non-matched words (such as 0.1 and 0.2)

2010-07-28 Thread Jon Cram
Hi,

 

I'm observing some strange highlighted words in field value snippets
returned from Solr when matched term highlighting
(http://wiki.apache.org/solr/HighlightingParameters) is enabled.

 

In some cases, highlighted field value snippets contain highlighted
words that are not matches:

-  this appears to be in addition to highlighting words that are
matches

-  these non-match highlighted words are not pre-highlighted in
the indexed content

-  I've determined these are non-matches by appending
debugQuery=1 to the URL and examining the match detail information

 

I've so far observed this in relation to the strings "0", "0.1", "0.2"
and "0.4" in indexed content.

 

Real life example when searching for [gas]:

 

Relevant matched document result from Solr (response markup stripped in the
archive):

EXAMPLE prepares an extensive range of traceable calibration gas
standards with guaranteed relative uncertainties levels of 0.1% for
certain species (PDF 676 KB).
 

Related highlighted snippet (highlighting markup stripped in the archive; the
<em> tags are reconstructed from the description below):

EXAMPLE prepares an extensive range of traceable calibration
<em>gas</em> standards with guaranteed relative uncertainties levels of
<em>0.1</em>% for certain species (PDF 676 KB).
 

Note how the highlight snippet correctly highlights "gas" and
incorrectly highlights "0.1". I've observed similar results for other
searches where indexed content contains "0", "0.1", "0.2" and "0.4" and
where these numbers are highlighted incorrectly.

 

At this stage I'm trying to determine whether this is due to a poor
implementation on my behalf or whether this is a bug in Solr.

 

I'd really like to know if:

 

1.   Anyone else has observed this behaviour

2.   If this might be a known issue with Solr (I've tried to find
out but haven't had any luck)

3.   Anyone can test using something like
http:///select?hl=true&hl.fl=*&q=(phrase+that+contains+0.1+in+response)&hl.fragsize=0

 

Thanks,

Jon Cram

 



Get unique values

2010-07-28 Thread Rafal Bluszcz Zawadzki
Hi,

In my schema I have (inter alia) the fields CollectionID and CollectionName.
These two values always match together, which means that for every value of
CollectionID there is a matching value of CollectionName.

I am interested in a query which allows me to get the unique values of
CollectionID with their matching CollectionNames (the rest of the fields are
not of interest to me in this query).

I was thinking about facets, but they offer a bit more than I need.

Does anyone have an idea for a query which would give me these results?

Cheers,

-- 
Rafał Zawadzki
http://dev.bluszcz.net
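
One workaround along the facet line of thought, sketched with SolrJ: index a
combined field (a hypothetical collection_pair holding
"CollectionID|CollectionName") and facet on it with facet.limit=-1, so each
distinct pair comes back exactly once as a facet value:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class UniqueCollections {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(0);                       // only the facet counts are needed
        query.setFacet(true);
        query.addFacetField("collection_pair"); // hypothetical "id|name" field
        query.setFacetLimit(-1);                // return all distinct values
        QueryResponse response = solr.query(query);
        FacetField pairs = response.getFacetField("collection_pair");
        for (FacetField.Count value : pairs.getValues()) {
            String[] parts = value.getName().split("\\|", 2);
            System.out.println("id=" + parts[0] + " name=" + parts[1]);
        }
    }
}

The separator just needs to be a character that cannot occur in the collection
name.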


SolrJ Response + JSON

2010-07-28 Thread MitchK

Hello community,

I need to transform SolrJ - responses into JSON, after some computing on
those results by another application has finished.

I can not do those computations on the Solr - side.

So, I really have to translate SolrJ's output into JSON.

Any experiences how to do so without writing your own JSON-writer?

Thank you.
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002024p1002024.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strange search

2010-07-28 Thread stockii

Try deleting "solr.SnowballPorterFilterFactory" from your analyzer chain. I
had similar problems when using the German SnowballPorterFilterFactory.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-search-tp998961p1001990.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr using 1500 threads - is that normal?

2010-07-28 Thread Christos Constantinou
Hi,

Solr seems to be crashing after a JVM exception that new threads cannot be
created. I am writing in the hope of advice from someone who has experienced
this before. The exception that is causing the problem is:

Exception in thread "btpool0-5" java.lang.OutOfMemoryError: unable to create 
new native thread

The memory that is allocated to Solr is 3072MB, which should be enough memory
for a ~6GB data set. The documents are not big either; they have around 10
fields, of which only one stores large text ranging between 1k and 50k.

The top command at the time of the crash shows Solr using around 1500 threads,
which I assume is not normal. Could it be that the threads are crashing one
by one and new ones are created to cope with the queries?

In the log file, right after the exception, there are several thousand
commits before the server stalls completely. Normally, the log file would 
report 20-30 document existence queries per second, then 1 commit per 5-30 
seconds, and some more infrequent faceted document searches on the data. 
However after the exception, there are only commits until the end of the log 
file.

I am wondering if anyone has experienced this before, or if it is some sort of
known bug in Solr 1.4? Is there a way to increase the detail of the
exception in the logfile?

I am attaching the output of a grep Exception command on the logfile.

Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:33:19 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:08 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:58 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:59 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:44:31 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:51:49 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:55:17 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:55:17 AM org.apache.solr.commo

Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)

2010-07-28 Thread Chantal Ackermann
Hi Lance!

On Wed, 2010-07-28 at 02:31 +0200, Lance Norskog wrote:
> Should this go into the trunk, or does it only solve problems unique
> to your use case?

The solution is generic but is an extension of XPathEntityProcessor
because I didn't want to touch the solr.war. This way I can deploy the
extension into SOLR_HOME/lib.
The problem that it solves is not one with XPathEntityProcessor but more
general. What it does:

It adds an attribute to the entity that I called "skipIfEmpty", which
takes the variable (it could even take more variables separated by
whitespace).
On entityProcessor.init(), which is called for sub-entities once per row of
the root entity (i.e., before every new request to the data source), the value
of the attribute is resolved, and if it is null or empty (after
trimming), the entity is not processed further.
This attribute is only allowed on sub-entities.

It would probably be nicer to put that somewhere higher up in the class
hierarchy so that all entity processors could make use of it.
But I don't know how common the use case is - all examples I found where
more or less "joins" on primary keys.

Cheers,
Chantal

Here comes the code:

import static
org.apache.solr.handler.dataimport.DataImportHandlerException.SEVERE;

import java.util.Map;
import java.util.logging.Logger;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataImportHandlerException;
import org.apache.solr.handler.dataimport.XPathEntityProcessor;
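
/*
 * Hypothetical usage in data-config.xml (attribute and variable names
 * are examples only):
 *
 * <entity name="detail" processor="OptionalXPathEntityProcessor"
 *         skipIfEmpty="${parent.detailUrl}" url="${parent.detailUrl}" ...>
 *
 * The sub-entity is skipped entirely whenever the variable resolves
 * to null or an empty string.
 */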

public class OptionalXPathEntityProcessor extends XPathEntityProcessor {
private Logger log = Logger.getLogger(OptionalXPathEntityProcessor.class.getName());
private static final String SKIP_IF_EMPTY = "skipIfEmpty";
private boolean skip = false;

@Override
protected void firstInit(Context context) {
if (context.isRootEntity()) {
throw new DataImportHandlerException(SEVERE,
"OptionalXPathEntityProcessor not allowed for root entities.");
}
super.firstInit(context);
}

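// init() runs once per row of the root entity: resolve the skipIfEmpty
// variable here and decide whether this sub-entity should be skipped
// for the current row.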
@Override
public void init(Context context) {
String value = context.getResolvedEntityAttribute(SKIP_IF_EMPTY);
if (value == null || value.trim().isEmpty()) {
skip = true;
} else {
super.init(context);
skip = false;
}
}

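// When skipping, return null immediately so that no request is made
// to the data source for this row.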
@Override
public Map<String, Object> nextRow() {
if (skip) return null;
return super.nextRow();
}
}




Re: Indexing Problem: Where's my data?

2010-07-28 Thread Chantal Ackermann
Make sure to set stored="true" on every field you expect to be returned
in your results for later display.

Chantal




Re: Spellchecking and frequency

2010-07-28 Thread dan sutton
Hi Mark,

Thanks for that info, it looks very interesting; it would be great to see your
code. Out of interest, did you use the dictionary and the phonetic file? Did
you see better results with both?

Regarding the second part, which checks the corpus for matching
suggestions: would another way to do this be to have an event listener that
listens for commits and builds the dictionary from matching corpus words?
That way you avoid the performance hit at query time.

Cheers,
Dan
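
That commit-listener idea could look roughly like the sketch below (a
hypothetical class, assuming Solr 1.4's SolrEventListener interface; it would
be registered with a <listener event="postCommit" .../> element in
solrconfig.xml):

import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.search.SolrIndexSearcher;

public class SpellDictionaryCommitListener implements SolrEventListener {

    public void init(NamedList args) {
        // read any configuration passed in from solrconfig.xml
    }

    public void postCommit() {
        // Rebuild the spelling dictionary from the freshly committed index,
        // so that suggestion checking stays cheap at query time.
        rebuildDictionary();
    }

    public void newSearcher(SolrIndexSearcher newSearcher, SolrIndexSearcher currentSearcher) {
        // not needed for this sketch
    }

    private void rebuildDictionary() {
        // hypothetical hook into the jazzy-based spellchecker's dictionary build
    }
}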

On Tue, Jul 27, 2010 at 7:04 PM, Mark Holland wrote:

> Hi,
>
> I found the suggestions returned from the standard solr spellcheck not to
> be
> that relevant. By contrast, aspell, given the same dictionary and mispelled
> words, gives much more accurate suggestions.
>
> I therefore wrote an implementation of SolrSpellChecker that wraps jazzy,
> the java aspell library. I also extended the SpellCheckComponent to take
> the
> matrix of suggested words and query the corpus to find the first
> combination
> of suggestions which returned a match. This works well for my use case,
> where term frequency is irrelevant to spelling or scoring.
>
> I'd like to publish the code in case someone finds it useful (although it's
> a bit crude at the moment and will need a decent tidy up). Would it be
> appropriate to open up a Jira issue for this?
>
> Cheers,
> ~mark
>
> On 27 July 2010 09:33, dan sutton  wrote:
>
> > Hi,
> >
> > I've recently been looking into Spellchecking in solr, and was struck by
> > how
> > limited the usefulness of the tool was.
> >
> > Like most corpora , ours contains lots of different spelling mistakes for
> > the same word, so the 'spellcheck.onlyMorePopular' is not really that
> > useful
> > unless you click on it numerous times.
> >
> > I was thinking that since most of the time people spell words correctly
> why
> > was there no other frequency parameter that could enter into the score?
> > i.e.
> > something like:
> >
> > spell_score ~ edit_dist * freq
> >
> > I'm sure others have come across this issue and was wondering what
> > steps/algorithms they have used to overcome these limitations?
> >
> > Cheers,
> > Dan
> >
>


solr log file rotation

2010-07-28 Thread Christos Constantinou
Hi all,

I am running a Solr 1.4 instance on FreeBSD that generates large log files in
very short periods. I used /etc/newsyslog to configure log file rotation;
however, once the log file is rotated, Solr doesn't write logs to the new
file. I'm wondering if there is a way to let Solr know that the log file has
been rotated, so that it recreates a correct file handle?

Thanks

Christos

Re: Integration Problem

2010-07-28 Thread Jörg Wißmeier
Is there nobody out there who can help me with this problem?

I need to edit the result of the javabin writer (adding the results from
the webservice).
I hope it is possible to do that.

Thanks in advance.

Am Mo 26.07.2010 10:25 schrieb Jörg Wißmeier :

>Hi everybody,
>
>For a while now I have been working with Solr, and I have integrated it with
>Liferay 6.0.3, so every search request from Liferay is processed by Solr
>and its index.
>But I have to integrate another system, and this system offers me a
>webservice. The results of this webservice should be in the results of
>Solr, but not in its index.
>I tried to do that with a custom query handler and a custom response
>writer, and I am able to write into the response message of Solr, but only
>into the response node of the xml message and not into the results node.
>So is there any solution for how I could write into the results node of the
>xml message from Solr?
>
>thanks in advance
>
>best regards
>joerg
>
>
>



Kind regards,


Jörg Wißmeier


___
Ancud IT-Beratung GmbH
Glockenhofstr. 47
90478 Nürnberg
Germany

T +49 911 25 25 68-0
F +49 911 25 25 68-68
joerg.wissme...@ancud.de
www.ancud.de

Disclosures per EHUG:
Ancud IT-Beratung GmbH, Nürnberg; Managing Director Konstantin Böhm;
Nürnberg District Court (Amtsgericht), HRB 19954



Re: SpatialSearch: sorting by distance

2010-07-28 Thread Pavel Minchenkov
Does anybody know if this feature works correctly?
Or am I doing something wrong?

2010/7/27 Pavel Minchenkov 

> Hi,
>
> I'm trying to sort by distance like this:
>
> sort=dist(2,lat,lon,55.755786,37.617633) asc
>
> In general the results are sorted, but some documents are not in the right order.
> I'm using DistanceUtils.getDistanceMi(...) from lucene spatial to calculate
> real distance after reading documents from Solr.
>
> Solr version from trunk.
>
> [fieldType and lat/lon field definitions stripped in the archive]
>
> Thanks.
>
> --
> Pavel Minchenkov
>



-- 
Pavel Minchenkov
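
P.S. One possible explanation, if I read dist() right: with power 2 it is a
plain Euclidean distance on the raw degree values, while
DistanceUtils.getDistanceMi() is a great-circle (haversine) distance. At
latitude ~55 a degree of longitude is only about half as long on the ground
as a degree of latitude, so the two measures can order nearby points
differently. A small self-contained sketch (the document coordinates are
made up for illustration):

public class DistanceOrderCheck {

    // Euclidean distance on raw degrees, as dist(2, lat, lon, ...) computes.
    static double euclidDeg(double lat1, double lon1, double lat2, double lon2) {
        double dLat = lat1 - lat2;
        double dLon = lon1 - lon2;
        return Math.sqrt(dLat * dLat + dLon * dLon);
    }

    // Great-circle (haversine) distance in km, the same idea as
    // DistanceUtils.getDistanceMi (which returns miles).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                   * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * 6371.0 * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        double qLat = 55.755786, qLon = 37.617633;   // the query point from the sort
        double[][] docs = { { 55.755786, 38.0 },     // ~0.38 degrees east
                            { 56.1, 37.617633 } };   // ~0.34 degrees north
        for (double[] d : docs) {
            System.out.printf("euclid=%.4f deg   haversine=%.2f km%n",
                    euclidDeg(qLat, qLon, d[0], d[1]),
                    haversineKm(qLat, qLon, d[0], d[1]));
        }
    }
}

Here the second point sorts first under dist(2, ...) but is actually farther
away on the ground, which would look exactly like documents "not in the
right order".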


Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread Tommaso Teofili
I attached a patch for the Solr 1.4.1 release on
https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
me.
This strange behaviour for me was due to the fact that I copied the patched
jars and war inside the dist directory but forgot to update the war inside
the example/webapps directory (that is inside Jetty).
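
For the record, the fix amounted to something like this (paths from the
stock 1.4.1 source layout; the exact war name may differ in your build):

cp dist/apache-solr-1.4.1.war example/webapps/solr.war
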
Hope this helps.
Tommaso

2010/7/27 David Thibault 

> Alessandro & all,
>
> I was having the same issue with Tika crashing on certain PDFs.  I also
> noticed the bug where no content was extracted after upgrading Tika.
>
> When I went to the SOLR issue you link to below, I applied all the patches,
> downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl, and
> got the following error:
> SEVERE: java.lang.NoSuchMethodError:
> org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
> at
> org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> at
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
> at
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
> at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
> at java.lang.Thread.run(Thread.java:619)
>
> This is really weird because I DID apply the SolrResourceLoader patch that
> adds the getClassLoader method.  I even verified by opening up the
> JARs and looking at the class file in Eclipse...I can see the
> SolrResourceLoader.getClassLoader() method.
>
> Does anyone know why it can't find the method?  After patching the source I
> did ant clean dist in the base directory of the Solr source tree and
> everything looked like it compiled (BUILD SUCCESSFUL).  Then I copied all
> the jars from dist/ and all the library dependencies from
> contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything in
> the logs looked good.
>
> I'm stumped.  It would be very nice to have a Solr implementation using the
> newest versions of PDFBox & Tika and actually have content being
> extracted...=)
>
> Best,
> Dave
>
>
> -Original Message-
> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> Sent: Tuesday, July 27, 2010 6:09 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
> CELL/Tika/PDFBox
>
> Hi Jon,
> Over the last few days we faced the same problem.
> Using stock Solr 1.4.1 (Tika 0.4), from some PDF files we can't extract
> content, and for others Solr throws an exception during the indexing
> process.
> You must:
> Update the Tika libraries (in /contrib/extraction/lib) with the tika-core
> 0.8 snapshot and tika-parsers 0.8.
> Update PDFBox and all related libraries.
> After that you have to patch Solr 1.4.1 with this patch:
>
> https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel
> This is the first way to solve the problem.
>
> Using Solr 1.4.1 (with the Tika 0.8 snapshot and PDFBox updated), no
> exception is thrown during the indexing process, but no content is
> extracted.
> Using the latest Solr trunk (with the Tika 0.8 snapshot and PDFBox
> updated), all sounds good, but we don't know how stable it is!
> I hope you now have a clear view of this issue.
> Best Regards
>
>
>
> 2010/7/26 Sharp, Jonathan 
>
> >
> > Every so often I need to index new batches of scanned PDFs and
> occasionally
> > Adobe's OCR can't recognize the text in a couple of these documents. In
> > these situations I would like to type in a small amount of text onto the
> > document and have it be extracted by Solr CELL.
> >
> > Adobe Pro 9 has a number of different ways to add text directly to a PDF
> > file:
> >
> > *Typewriter
> > *Sticky Note
> > *Callo
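
One quick way to check whether any content at all comes out after swapping
the jars is Solr Cell's extractOnly mode, which returns the extracted text
without indexing anything; a sketch, assuming the example port and a local
test file:

curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F "myfile=@test.pdf"

If the typewriter/sticky-note text shows up here, the extraction side is
fine and any remaining problem is in the indexing configuration.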

Re: Any tips/guidelines to tuning the Solr/Lucene performance in a master/slave/sharding environment

2010-07-28 Thread Tommaso Teofili
Hi,
I think the starting point should be:
http://wiki.apache.org/solr/SolrPerformanceFactors
For example you could start playing with the mergeFactor parameter.
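A sketch of where that lives in solrconfig.xml (the values here are just the
1.4 defaults, shown for illustration; a lower mergeFactor means fewer
segments and faster searches at the cost of slower indexing):

<indexDefaults>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexDefaults>
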
My 2 cents,
Tommaso

2010/7/27 Chengyang 

> How do I reduce the index file size, decrease the sync time between
> nodes, and decrease the index create/update time?
> Thanks.
>
>


Re: question about relevance

2010-07-28 Thread Bharat Jain
Well, you are correct, Erick, that this is a database-ish thing to try to
achieve in Solr, and unfortunately the sin :) had been committed by somebody
else :) and now we are running into relevancy issues.

Let me try to state the problem more casually.

1. There are user records of type A, B, C, etc. (the userId field in the
index is common to all records).
2. A user can have any number of A, B, C, etc. (e.g. think of A as being a
language; a user can know many languages like French, English, German, etc.).
3. Each record is currently stored as a separate document in the index.
4. A given query can match multiple records for the same user.
5. If more records are matched for a user (e.g. if he knows both French and
German), then he is more relevant and should come to the top in the UI. This
is why I wanted to add up the Lucene scores, assuming a greater score means
more relevance.

Hope you got what I was saying.

Another idea for this situation is faceting on the userId field and then
adding up the scores, but currently I think Lucene only supports facet
counts; basically, Solr will give you only the count of docs it matched. Can
I get the sum of the scores of the documents that matched?


Thanks
Bharat Jain


On Tue, Jul 27, 2010 at 5:58 AM, Erick Erickson wrote:

> I'm having trouble getting my head around what you're trying to accomplish,
> so if this is off base you know why.
>
> But what it smells like is that you're trying to do database-ish things in
> a SOLR index, which is almost always the wrong approach. Is there a
> way to index redundant data with each document so all you have to do
> to get the "relevant" users is a simple query?
>
> Adding scores is also suspect... I don't see how that produces predictable
> results.
>
> But I'm also failing completely to understand what a "relevant" user is.
>
> Not much help, I know; if this is way off base perhaps you could provide
> some additional use-cases?
>
> Best
> Erick
>
> On Mon, Jul 26, 2010 at 2:37 AM, Bharat Jain 
> wrote:
>
> > Hello All,
> >
> > I have a index which store multiple objects belonging to a user
> >
> > for e.g.
> > <doc>
> >   <field name="userId">...</field>     -> identifies user
> >   <field name="objType">...</field>    -> object type e.g. userBasic or userAdv
> >
> >   <field name="userBasic">...</field>  -> MAPS to userBasicInfoObject
> >
> >   <field name="userAdv">...</field>    -> MAPS to userAdvInfoObject
> > </doc>
> >
> >
> > Now when I am doing some query I get multiple records mapping to java
> > objects (identified by objType) that belong to the same user.
> >
> >
> > Now I want to show the relevant users at the top of the list. I am
> thinking
> > of adding the Lucene scores of different result documents to get the best
> > scores. Is this correct approach to get the relevance of the user?
> >
> > Thanks
> > Bharat Jain
> >
>