date:20100728

2010-07-28 Thread Nguyen, Vincent (CDC/OSELS/NCPHI) (CTR)

Tommasso,

I used your patch and tried it with the 1.4.1 solr.war from a fresh 1.4.1 
distribution, and it still gave me that NoSuchMethodError.  However, when I 
tried it with the newly-patched-and-compiled apache-solr-1.4.2-dev.war file it 
works.  I think I tried that before and it didn't work. 

In any case, thanks for the patch and the advice.  Looks like now it's working 
for me.

Best,
Dave




-Original Message-
From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] 
Sent: Wednesday, July 28, 2010 3:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr 
CELL/Tika/PDFBox

I attached a patch for Solr 1.4.1 release on
https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
me.
This strange behaviour for me was due to the fact that I copied the patched
jars and war inside the dist directory but forgot to update the war inside
the example/webapps directory (that is inside Jetty).
Hope this helps.
Tommaso

2010/7/27 David Thibault 

> Alessandro & all,
>
> I was having the same issue with Tika crashing on certain PDFs.  I also
> noticed the bug where no content was extracted after upgrading Tika.
>
> When I went to the SOLR issue you link to below, I applied all the patches,
> downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl, and
> got the following error:
> SEVERE: java.lang.NoSuchMethodError:
> org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
> at
> org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> at
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
> at
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
> at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
> at java.lang.Thread.run(Thread.java:619)
>
> This is really weird because I DID apply the SolrResourceLoader patch that
> adds the getClassLoader method.  I even verified by going opening up the
> JARs and looking at the class file in Eclipse...I can see the
> SolrResourceLoader.getClassLoader() method.
>
> Does anyone know why it can't find the method?  After patching the source I
> did ant clean dist in the base directory of the Solr source tree and
> everything looked like it compiles (BUILD SUCCESSFUL).  Then I copied all
> the jars from dist/ and all the library dependencies from
> contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything in
> the logs looked good.
>
> I'm stumped.  It would be very nice to have a Solr implementation using the
> newest versions of PDFBox & Tika and actually have content being
> extracted...=)
>
> Best,
> Dave
>
>
> -Original Message-
> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> Sent: Tuesday, July 27, 2010 6:09 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
> CELL/Tika/PDFBox
>
> Hi Jon,
> During the last days we front the same problem.
> Using Solr 1.4.1 classic (tika 0.4 ),from some pdf files we can't extract
> content and from others, Solr throws an exception during the Indexing
> Process .
> You must:
> Update tika libraries (into /contrib/extraction/lib)with tika-core.0.8
> snapshot and tika-parsers 0.8.
> Update PdfBox and all related libraries.
> After that You have to patch Solr 1.4.1 following this patch :
>
> https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel
> This is the firts way to solve the problem.
>
> Using Solr 1.4.1 (with tika 0.8 snapshot and pdfbox updated) no exception
> is
> thrown during the Indexing process, but no content is extracted.

Re: Total number of terms in an index?

2010-07-28 Thread Jason Rutherglen

Tom,

The total number of terms... Ah well, not a big deal, however yes the
flex branch does expose this so we can show this in Solr at some
point, hopefully outside of Solr's Luke impl.

On Tue, Jul 27, 2010 at 9:27 AM, Burton-West, Tom  wrote:
> Hi Jason,
>
> Are you looking for the total number of unique terms or total number of term 
> occurrences?
>
> Checkindex reports both, but does a bunch of other work so is probably not 
> the fastest.
>
> If you are looking for total number of term occurrences, you might look at 
> contrib/org/apache/lucene/misc/HighFreqTerms.java.
>
> If you are just looking for the total number of unique terms, I wonder if 
> there is some low level API that would allow you to just access the in-memory 
> representation of the tii file and then multiply the number of terms in it by 
> your indexDivisor (default 128). I haven't dug in to the code so I don't 
> actually know how the tii file gets loaded into a data structure in memory.  
> If there is api access, it seems like this might be the quickest way to get 
> the number of unique terms.  (Of course you would have to do this for each 
> segment).
>
> Tom
> -Original Message-
> From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
> Sent: Monday, July 26, 2010 8:39 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Total number of terms in an index?
>
>
> : Sorry, like the subject, I mean the total number of terms.
>
> it's not stored anywhere, so the only way to fetch it is to actually
> iteate all of the terms and count them (that's why LukeRequestHandler is
> slow slow to compute this particular value)
>
> If i remember right, someone mentioned at one point that flex would let
> you store data about stuff like this in your index as part of the segment
> writing, but frankly i'm still not sure how that iwll help -- because you
> unless your index is fully optimized, you still have to iterate the terms
> in each segment to 'de-dup' them.
>
>
> -Hoss
>
>

RE: display solr result in JSP

2010-07-28 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]

Thanks so much for your reply. I don't have much experience at JSP. I found tag 
library, and am trying to use "  ". Unfortunately I 
didn't get it work. 

Would you please give me more information? I really appreciate your help!

Thanks,
Xiaohui 

-Original Message-
From: Ranveer [mailto:ranveer.s...@gmail.com] 
Sent: Wednesday, July 28, 2010 11:27 AM
To: solr-user@lucene.apache.org
Subject: Re: display solr result in JSP

Hi,

very simple to display value in jsp. if you are using solrj then simply 
store value in bean from java class and can display.
same thing you can do in servlet too.. get the solr server response and 
return in bean or can display directly(in servlet).
hope you will able to do.

regards
Ranveer

On Wednesday 28 July 2010 08:11 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
> I am new for solr. Just got example xml file index and search by following 
> solr tutorial. I wonder how I can get the search result display in a JSP. I 
> really appreciate any suggestions you can give.
>
> Thanks so much,
> Xiaohui
>
>

How do "NOT" queries work?

2010-07-28 Thread Kaan Meralan

I wonder how do "NOT" queries work. Is it a pass on the result set and
filtering out the "NOT" property or something like that?

Also is there anybody who does some performance checks on "NOT" queries? I
want to know whether there is a significant performance degradation or not
when you have "NOT" in a query.

Thanks...

//kaan

Re: slave index is bigger than master index

2010-07-28 Thread Muneeb Ali


>> In solrconfig.xml, these two lines control that. Maybe they need to be
increased.
>> 5000
>> 1 

Where do I add those in solrconfig? These lines doesn't seem to be present
in the example solrconfig file...
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/slave-index-is-bigger-than-master-index-tp996329p1002432.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: slave index is bigger than master index

2010-07-28 Thread Muneeb Ali


Well I do have disk limitations too, and thats why I think slave nodes died,
when replicating data from master node. (as it was just adding on top of
existing index files).

:: What do you mean here? Optimizing is too CPU expensive? 

What I meant by avoid playing around with slave nodes is that doing anything
(including optimization on slave nodes) that may effect the live search
performance, unless I have no option.

:: Do you mean increase to double size? 

yes, as it did before on replication. But I didn't get a chance to run the
indexer yesterday. 

 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/slave-index-is-bigger-than-master-index-tp996329p1002426.html
Sent from the Solr - User mailing list archive at Nabble.com.

Solr 1.4.1 field collapse

2010-07-28 Thread Moazzam Khan

Hi guys,

I read somewhere that Solr 1.4.1 has field collapse support by default
(without patching it) but I haven't been able to confirm it. Is this
true?

- Moazzam

RE: simple question from a newbie

I think I got it to work.  If I do a wildcard search using the dc3.title
field it seems to work fine (dc3.title:c*).  The dc.title:c* returns
every title that has a word in it that starts with 'c', which isn't
exactly what I wanted.  I'm guessing it's because of the
type="caseInsensitiveSort".  

Well, here is my schema for reference.  Thanks for your help.


- 
- 
   
-  
   
   
   
   
   
   
   
   
   
   
- 
- 
   
  
  
- 
- 
   
   
   
   
   
   
  
- 
   
   
   
   
   
   
   
  
  
- 
- 
   
   
   
   
   
   
   
  
  
- 
- 
   
   
   
  
  
   
  
- 
-  
   
   
   
   
   
   
   
   
-  
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
  

   
   
   
  

   
   
   
   
   
  
  PID 
  fgs.label 
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
  

Vincent Vu Nguyen
Division of Science Quality and Translation
Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329 


-Original Message-
From: Ranveer [mailto:ranveer.s...@gmail.com] 
Sent: Wednesday, July 28, 2010 11:31 AM
To: solr-user@lucene.apache.org
Subject: Re: simple question from a newbie

I think you using wild-card search or should use wild-card search. but 
first of all please provide the schema and configuration file for more 
details.

regards
Ranveer


On Wednesday 28 July 2010 07:51 PM, Nguyen, Vincent (CDC/OSELS/NCPHI) 
(CTR) wrote:
> Hi,
>
>
>
> I'm new to Solr and have a rather dumb question.  I want to do a query
> that returns all the Titles that start with a certain letter.  For
> example
>
>
>
> I have these titles:
>
> Results of in-mine research in support
>
> Cancer Reports
>
> State injury indicators report
>
> Cancer Reports
>
> Indexed dermal bibliography
>
> Childhood agricultural-related injury report
>
> Childhood agricultural injury prevention
>
>
>
>
>
> I want the query to return:
>
> Cancer Reports
>
> Cancer Reports
>
> Childhood agricultural-related injury report
>
> Childhood agricultural injury prevention
>
>
>
> I want something like dc.title=c* type query
>
>
>
> I know that I can facet by dc.title and then use the parameter
> facet.prefix=c but it returns something like this:
>
> Cancer Reports [2]
>
> Childhood agricultural-related injury report [1]
>
> Childhood agricultural injury prevention [1]
>
>
>
>
>
> Vincent Vu Nguyen
> Division of Science Quality and Translation
>
> Office of the Associate Director for Science
> Centers for Disease Control and Prevention (CDC)
> 404-498-6154
> Century Bldg 2400
> Atlanta, GA 30329
>
>
>
>
>

Re: SolrJ Response + JSON

Hi Chantal,

thank you for the feedback.
I did not see the wood for the trees!
The SolrDocument's javadoc says the following:
http://lucene.apache.org/solr/api/org/apache/solr/common/SolrDocument.html

|*getFieldValue
<../../../../org/apache/solr/common/SolrDocument.html#getFieldValue%28java.lang.String%29>*(String
name)|

Get the value or collection of values for a given field.

The magical word here is that little "or" :-).

I will try that tomorrow and give you a feedback!

Are you sure that you cannot change the SOLR results at query time
according to your needs?

Unfortunately, it is not possible in this case.

Kind regards,
Mitch

Am 28.07.2010 16:49, schrieb Chantal Ackermann:

Hi Mitch

On Wed, 2010-07-28 at 16:38 +0200, MitchK wrote:

Thank you, Chantal.

I have looked at this one: http://www.json.org/java/index.html

This seems to be an easy-to-understand-implementation.

However, I am wondering how to determine whether a SolrDocument's field
is multiValued or not.
The JSONResponseWriter of Solr looks at the schema-configuration.
However, the client shouldn't do that.
How did you solved that problem?

I didn't. I'm not recreating JSON from the SolrJ results.

I would try to use the same classes that SolrJ uses, actually. (Writing
that without having a further look at the code.) I would avoid
recreating existing code as much as possible.
About multivalued fields: you need instanceof checks, I guess. The field
only contains a list if there really are multiple values. (That's what
works for my ScriptTransformer.)

Are you sure that you cannot change the SOLR results at query time
according to your needs? Maybe you should ask for that, first (ask for X
instead of Y...).

Cheers,
Chantal

Thanks for sharing ideas.

- Mitch

Am 28.07.2010 15:35, schrieb Chantal Ackermann:

You could use org.apache.solr.handler.JsonLoader.
That one uses org.apache.noggit.JSONParser internally.
I've used the JacksonParser with Spring.

http://json.org/ lists parsers for different programming languages.

Cheers,
Chantal

On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:

Hello ,

Second try to send a mail to the mailing list...

I need to translate SolrJ's response into JSON-response.
I can not query Solr directly, because I need to do some math with the
responsed data, before I show the results to the client.

Any experiences how to translate SolrJ's response into JSON without writing
your own JSON Writer?

Thank you.
- Mitch

Re: Is there a cache for a query?

2010-07-28 Thread Moazzam Khan

As far as I know all searches get cache at least for some time. I am
not sure about field collapse results being cached.

- Moazzam
http://moazzam-khan.com



On Mon, Jul 26, 2010 at 9:48 PM, Li Li  wrote:
> I want a cache to cache all result of a query(all steps including
> collapse, highlight and facet).  I read
> http://wiki.apache.org/solr/SolrCaching, but can't find a global
> cache. Maybe I can use external cache to store key-value. Is there any
> one in solr?
>

Re: Spellchecking and frequency

2010-07-28 Thread Jonathan Rochkind




I therefore wrote an implementation of SolrSpellChecker that wraps jazzy,
the java aspell library. I also extended the SpellCheckComponent to take
the
matrix of suggested words and query the corpus to find the first
combination
of suggestions which returned a match. This works well for my use case,
where term frequency is irrelevant to spelling or scoring.


This is interesting to me. I also have not been that happy with standard 
solr spellcheck. 

In addition to possibly filing a JIRA for future fix to Solr itself, 
another option would be you could make your 'alternate' SpellCheck 
component available as a seperate .jar, so anyone could use it just by 
installing and specifying it in their solrconfig.xml.  I would encourage 
you to consider that, not as a replacement for suggesting a patch to 
Solr itself, but so people can use your improved spellchecker 
immediately, without waiting for possible Solr patches.


Jonathan

RE: Indexing Problem: Where's my data?

2010-07-28 Thread Michael Griffiths

Thanks - but my schema.xml is not recognizing field names specified in the 
data-config.xml.

For example - and I just tested this now - if I have in my data-config.xml:

And then in my schema.xml:

Then no documents are processed (e.g. I get rows queried, but 0 in the data handler UI).

But if I change that to:

... now documents are processed (e.g. 313).

Which, quite frankly, confuses me. I may be doing something else wrong (I 
changed my SQL as well, so I'm getting another failure, but I think it's 
separate to this one).

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Tuesday, July 27, 2010 8:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing Problem: Where's my data?

Solr respects case for field names.  Database fields are supplied in 
lower-case, so it should be 'attribute_name' and 'string_value'. Also 
'product_id', etc.

It is easier if you carefully emulate every detail in the examples, for example 
lower-case names.

On Tue, Jul 27, 2010 at 2:59 PM, kenf_nc  wrote:
>
> for STRING_VALUE, I assume there is a property in the 'select *' 
> results called string_value? if so I'm not sure why it wouldn't work. 
> If not, then that's why, it doesn't have anything to put there.
>
> For ATTRIBUTE_NAME, is it possibly a case issue? you called it 
> 'Attribute_Name' in your query, but ATTRIBUTE_NAME in your 
> schema...just something to check I guess.
>
> Also, not sure why you are using name= in your fields, for example, 
>  I thought 
> 'column' was the source field name and 'name' was supposed to be the 
> schema field name and if not there it would assume 'column' name. You 
> don't have a schema field called "Parent Family" so it looks like it's 
> defaulting to column name too which is lucky for you I suppose. But 
> you may want to either remove 'name=' or make it match the schema. 
> (and I may be completely wrong on this, it's been a while since I got DIH 
> going).
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Indexing-Problem-Where-s-my-data-tp
> 1000660p1000843.html Sent from the Solr - User mailing list archive at 
> Nabble.com.
>

--
Lance Norskog
goks...@gmail.com

Re: simple question from a newbie

2010-07-28 Thread Ranveer

I think you using wild-card search or should use wild-card search. but 
first of all please provide the schema and configuration file for more 
details.


regards
Ranveer


On Wednesday 28 July 2010 07:51 PM, Nguyen, Vincent (CDC/OSELS/NCPHI) 
(CTR) wrote:

Hi,



I'm new to Solr and have a rather dumb question.  I want to do a query
that returns all the Titles that start with a certain letter.  For
example



I have these titles:

Results of in-mine research in support

Cancer Reports

State injury indicators report

Cancer Reports

Indexed dermal bibliography

Childhood agricultural-related injury report

Childhood agricultural injury prevention





I want the query to return:

Cancer Reports

Cancer Reports

Childhood agricultural-related injury report

Childhood agricultural injury prevention



I want something like dc.title=c* type query



I know that I can facet by dc.title and then use the parameter
facet.prefix=c but it returns something like this:

Cancer Reports [2]

Childhood agricultural-related injury report [1]

Childhood agricultural injury prevention [1]





Vincent Vu Nguyen
Division of Science Quality and Translation

Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329

Re: display solr result in JSP

2010-07-28 Thread Ranveer


Hi,

very simple to display value in jsp. if you are using solrj then simply 
store value in bean from java class and can display.
same thing you can do in servlet too.. get the solr server response and 
return in bean or can display directly(in servlet).

hope you will able to do.

regards
Ranveer

On Wednesday 28 July 2010 08:11 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

I am new for solr. Just got example xml file index and search by following solr 
tutorial. I wonder how I can get the search result display in a JSP. I really 
appreciate any suggestions you can give.

Thanks so much,
Xiaohui

Re: logic required for newbie

2010-07-28 Thread Jonty Rhods

Hi

thanks for reply..
 Actually requirement is diffrent (sorry if I am unable to clerify in first
mail).

basically follwoing are the fields name in schema as well:
> 1. id
> 2. name
> 3. user_id
> 4. location
> 5. country
> 6. landmark1
> 7. landmark2
> 8. landmark3
> 9. landmark4
> 10. landmark5

which carrying text...
for example:

1
some name
user_id
new york
USA
5th avenue
ms departmental store
base bakery
piza hut
ford motor

now if user search by "piza" then expected result like:

1
some name
user_id
new york
USA
piza hut

it means I want to ignore all other landmark which not match. By filter we
can filter the fields but here I dont know the
the field name because it depends on text match.

is there any other solution.. I am ready to change in schema or in logic. I
am using solrj.

please help me I stuck here..

with regards


On Wed, Jul 28, 2010 at 7:22 PM, rajini maski  wrote:

> you can index each of these field separately...
> field1-> Id
> field2-> name
> field3->user_id
> field4->country.
>
> 
> field7-> landmark
>
> While quering  you can specify  "q=Landmark9" This will return you
> results..
> And if you want only particular fields in output.. use the "fl" parameter
> in
> query...
>
> like
>
> http://localhost:8090/solr/select?
> indent=on&q=landmark9&fl=ID,user_id,country,landmark&
>
> This will give your desired solution..
>
>
>
>
> On Wed, Jul 28, 2010 at 12:23 PM, Jonty Rhods 
> wrote:
>
> > Hi All,
> >
> > I am very new and learning solr.
> >
> > I have 10 column like following in table
> >
> > 1. id
> > 2. name
> > 3. user_id
> > 4. location
> > 5. country
> > 6. landmark1
> > 7. landmark2
> > 8. landmark3
> > 9. landmark4
> > 10. landmark5
> >
> > when user search for landmark then  I want to return only one landmark
> > which
> > match. Rest of the landmark should ingnored..
> > expected result like following if user search by "landmark2"..
> >
> > 1. id
> > 2. name
> > 3. user_id
> > 4. location
> > 5. country
> > 7. landmark2
> >
> > or if search by "landmark9"
> >
> > 1. id
> > 2. name
> > 3. user_id
> > 4. location
> > 5. country
> > 9. landmark9
> >
> >
> > please help me to design the schema for this kind of requirement...
> >
> > thanks
> > with regards
> >
>

Re: SolrJ Response + JSON

Hi Mitch

On Wed, 2010-07-28 at 16:38 +0200, MitchK wrote:
> Thank you, Chantal.
> 
> I have looked at this one: http://www.json.org/java/index.html
> 
> This seems to be an easy-to-understand-implementation.
> 
> However, I am wondering how to determine whether a SolrDocument's field 
> is multiValued or not.
> The JSONResponseWriter of Solr looks at the schema-configuration. 
> However, the client shouldn't do that.
> How did you solved that problem?

I didn't. I'm not recreating JSON from the SolrJ results.

I would try to use the same classes that SolrJ uses, actually. (Writing
that without having a further look at the code.) I would avoid
recreating existing code as much as possible.
About multivalued fields: you need instanceof checks, I guess. The field
only contains a list if there really are multiple values. (That's what
works for my ScriptTransformer.)

Are you sure that you cannot change the SOLR results at query time
according to your needs? Maybe you should ask for that, first (ask for X
instead of Y...).

Cheers,
Chantal


> 
> Thanks for sharing ideas.
> 
> - Mitch
> 
> 
> Am 28.07.2010 15:35, schrieb Chantal Ackermann:
> > You could use org.apache.solr.handler.JsonLoader.
> > That one uses org.apache.noggit.JSONParser internally.
> > I've used the JacksonParser with Spring.
> >
> > http://json.org/ lists parsers for different programming languages.
> >
> > Cheers,
> > Chantal
> >
> > On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:
> >
> >> Hello ,
> >>
> >> Second try to send a mail to the mailing list...
> >>
> >> I need to translate SolrJ's response into JSON-response.
> >> I can not query Solr directly, because I need to do some math with the
> >> responsed data, before I show the results to the client.
> >>
> >> Any experiences how to translate SolrJ's response into JSON without writing
> >> your own JSON Writer?
> >>
> >> Thank you.
> >> - Mitch
> >>  
> >
> >
> >

display solr result in JSP

2010-07-28 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]

I am new for solr. Just got example xml file index and search by following solr 
tutorial. I wonder how I can get the search result display in a JSP. I really 
appreciate any suggestions you can give.

Thanks so much,
Xiaohui

Re: SolrJ Response + JSON

2010-07-28 Thread Nguyen, Vincent (CDC/OSELS/NCPHI) (CTR)


Thank you, Chantal.

I have looked at this one: http://www.json.org/java/index.html

This seems to be an easy-to-understand-implementation.

However, I am wondering how to determine whether a SolrDocument's field 
is multiValued or not.
The JSONResponseWriter of Solr looks at the schema-configuration. 
However, the client shouldn't do that.

How did you solved that problem?

Thanks for sharing ideas.

- Mitch


Am 28.07.2010 15:35, schrieb Chantal Ackermann:

You could use org.apache.solr.handler.JsonLoader.
That one uses org.apache.noggit.JSONParser internally.
I've used the JacksonParser with Spring.

http://json.org/ lists parsers for different programming languages.

Cheers,
Chantal

On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:
   

Hello ,

Second try to send a mail to the mailing list...

I need to translate SolrJ's response into JSON-response.
I can not query Solr directly, because I need to do some math with the
responsed data, before I show the results to the client.

Any experiences how to translate SolrJ's response into JSON without writing
your own JSON Writer?

Thank you.
- Mitch

simple question from a newbie

Hi,

 

I'm new to Solr and have a rather dumb question.  I want to do a query
that returns all the Titles that start with a certain letter.  For
example

 

I have these titles:

Results of in-mine research in support

Cancer Reports

State injury indicators report

Cancer Reports

Indexed dermal bibliography

Childhood agricultural-related injury report

Childhood agricultural injury prevention

 

 

I want the query to return:

Cancer Reports

Cancer Reports

Childhood agricultural-related injury report

Childhood agricultural injury prevention

 

I want something like dc.title=c* type query

 

I know that I can facet by dc.title and then use the parameter
facet.prefix=c but it returns something like this:

Cancer Reports [2]

Childhood agricultural-related injury report [1]

Childhood agricultural injury prevention [1]

 

 

Vincent Vu Nguyen
Division of Science Quality and Translation

Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329

Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread Tommaso Teofili

This was my same feeling :-) and so I went for the trunk to have things
working quickly, but I also have to consider which one is the best version
since I am going to deploy it in the near future in an enterprise
environment and choosing the best version is an importat step.
I am quite new to Solr but I agree with Alessandro that probably using a
slightly patched release should theoretically be more stable than the trunk
which get many updates weekly (and daily).
Cheers,
Tommaso

2010/7/28 David Thibault 

> Thanks, I'll try that then. I kind of figured that'd be the answer, but
> after fighting with Solr & ExtractingRequestHandler for 2 days I also just
> wanted to be done with it once it started working with 4.0...=)  However,
> stability would be better in the long run.
>
> Best,
> Dave
>
> -Original Message-
> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> Sent: Wednesday, July 28, 2010 9:33 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
> CELL/Tika/PDFBox
>
> In my opinion, the 1.4.1 version with the Patch is more Stable.
> Until 4.0 will be released 
>
> 2010/7/28 David Thibault 
>
> > Yesterday I did get this working with version 4.0 from trunk.  I haven't
> > fully tested it yet, but the content doesn't come through blank anymore,
> so
> > that's good.  Would it be more stable to stick with 1.4.1 and your patch
> to
> > get to Tika 0.8, or to stick with the 4.0 trunk version?
> >
> > Best,
> > Dave
> >
> > -Original Message-
> > From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
> > Sent: Wednesday, July 28, 2010 3:31 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with
> Solr
> > CELL/Tika/PDFBox
> >
> > I attached a patch for Solr 1.4.1 release on
> > https://issues.apache.org/jira/browse/SOLR-1902 that made things work
> for
> > me.
> > This strange behaviour for me was due to the fact that I copied the
> patched
> > jars and war inside the dist directory but forgot to update the war
> inside
> > the example/webapps directory (that is inside Jetty).
> > Hope this helps.
> > Tommaso
> >
> > 2010/7/27 David Thibault 
> >
> > > Alessandro & all,
> > >
> > > I was having the same issue with Tika crashing on certain PDFs.  I also
> > > noticed the bug where no content was extracted after upgrading Tika.
> > >
> > > When I went to the SOLR issue you link to below, I applied all the
> > patches,
> > > downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl,
> > and
> > > got the following error:
> > > SEVERE: java.lang.NoSuchMethodError:
> > >
> >
> org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
> > > at
> > >
> >
> org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
> > > at
> > >
> >
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
> > > at
> > >
> >
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
> > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> > > at
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> > > at
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> > > at
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> > > at
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > > at
> > >
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> > > at
> > >
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> > > at
> > >
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> > > at
> > >
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> > > at
> > >
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> > > at
> > >
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> > > at
> > >
> >
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
> > > at
> > >
> >
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
> > > at
> > org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
> > > at java.lang.Thread.run(Thread.java:619)
> > >
> > > This is really weird because I DID apply the SolrResourceLoader patch
> > that
> > > adds the getClassLoader method.  I even verified by going opening up
> the
> > > JARs and looking at the class file in Eclipse...I can see the
> > > SolrResourceLoader.getClassLoader() method.
> > >
> > > Does anyone know why it can't find the method?  After patching the
> source
> > I
> > > di

Re: logic required for newbie

2010-07-28 Thread rajini maski

you can index each of these field separately...
field1-> Id
field2-> name
field3->user_id
field4->country.

field7-> landmark

While quering  you can specify  "q=Landmark9" This will return you results..
And if you want only particular fields in output.. use the "fl" parameter in
query...

like

http://localhost:8090/solr/select?
indent=on&q=landmark9&fl=ID,user_id,country,landmark&

This will give your desired solution..

On Wed, Jul 28, 2010 at 12:23 PM, Jonty Rhods  wrote:

> Hi All,
>
> I am very new and learning solr.
>
> I have 10 column like following in table
>
> 1. id
> 2. name
> 3. user_id
> 4. location
> 5. country
> 6. landmark1
> 7. landmark2
> 8. landmark3
> 9. landmark4
> 10. landmark5
>
> when user search for landmark then  I want to return only one landmark
> which
> match. Rest of the landmark should ingnored..
> expected result like following if user search by "landmark2"..
>
> 1. id
> 2. name
> 3. user_id
> 4. location
> 5. country
> 7. landmark2
>
> or if search by "landmark9"
>
> 1. id
> 2. name
> 3. user_id
> 4. location
> 5. country
> 9. landmark9
>
>
> please help me to design the schema for this kind of requirement...
>
> thanks
> with regards
>

RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

Thanks, I'll try that then. I kind of figured that'd be the answer, but after 
fighting with Solr & ExtractingRequestHandler for 2 days I also just wanted to 
be done with it once it started working with 4.0...=)  However, stability would 
be better in the long run.

Best,
Dave

-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] 
Sent: Wednesday, July 28, 2010 9:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr 
CELL/Tika/PDFBox

In my opinion, the 1.4.1 version with the Patch is more Stable.
Until 4.0 will be released 

2010/7/28 David Thibault 

> Yesterday I did get this working with version 4.0 from trunk.  I haven't
> fully tested it yet, but the content doesn't come through blank anymore, so
> that's good.  Would it be more stable to stick with 1.4.1 and your patch to
> get to Tika 0.8, or to stick with the 4.0 trunk version?
>
> Best,
> Dave
>
> -Original Message-
> From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
> Sent: Wednesday, July 28, 2010 3:31 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
> CELL/Tika/PDFBox
>
> I attached a patch for Solr 1.4.1 release on
> https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
> me.
> This strange behaviour for me was due to the fact that I copied the patched
> jars and war inside the dist directory but forgot to update the war inside
> the example/webapps directory (that is inside Jetty).
> Hope this helps.
> Tommaso
>
> 2010/7/27 David Thibault 
>
> > Alessandro & all,
> >
> > I was having the same issue with Tika crashing on certain PDFs.  I also
> > noticed the bug where no content was extracted after upgrading Tika.
> >
> > When I went to the SOLR issue you link to below, I applied all the
> patches,
> > downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl,
> and
> > got the following error:
> > SEVERE: java.lang.NoSuchMethodError:
> >
> org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
> > at
> >
> org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
> > at
> >
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
> > at
> >
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> > at
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> > at
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > at
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> > at
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> > at
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> > at
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> > at
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> > at
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> > at
> >
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
> > at
> >
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
> > at
> org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
> > at java.lang.Thread.run(Thread.java:619)
> >
> > This is really weird because I DID apply the SolrResourceLoader patch
> that
> > adds the getClassLoader method.  I even verified by going opening up the
> > JARs and looking at the class file in Eclipse...I can see the
> > SolrResourceLoader.getClassLoader() method.
> >
> > Does anyone know why it can't find the method?  After patching the source
> I
> > did ant clean dist in the base directory of the Solr source tree and
> > everything looked like it compiles (BUILD SUCCESSFUL).  Then I copied all
> > the jars from dist/ and all the library dependencies from
> > contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything
> in
> > the logs looked good.
> >
> > I'm stumped.  It would be very nice to have a Solr implementation using
> the
> > newest versions of PDFBox & Tika and actually have content being
> > extracted...=)
> >
> > Best,
> > Dave
> >
> >
> > -Original Message-
> > From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> > Sent: Tuesday, July 27, 2010 6:09 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with
> Solr
> > CELL/

RE: Solr 3.1 and ExtractingRequestHandler resulting in blank content

If you don't store the content then you can't do highlighting, right?  Also, 
don't you just have to switch the text field to say stored="true" in your 
schema to store the text?  I don't understand why you're differentiating the 
behavior of ExtractingRequestHandler from the behavior of Solr in general.  
Doesn't ExtractingRequestHandler just pull the text out of whatever file you 
send it and then the rest of the processing happens like any other Solr post?

The bug I was experiencing was the same one that someone else brought up on the 
list yesterday in the emails entitled "Extracting PDF 
text/comment/callout/typewriter boxes with Solr   CELL/Tika/PDFBox".  It ties 
back to this bug:
https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel

I saw that email shortly after I sent this one to the list (it figures, doesn't 
it...=).

I tried doing what they suggested on that bug report (patching Solr 1.4.x and 
using Tika 0.8-SNAPSHOT), but the patches failed when I applied it to my Solr 
1.4.1.  They have since added a patch for Solr 1.4.1.  I haven't tried it yet.  
However, I did get it working using Solr 4.0 out of trunk (which also uses Tika 
0.8 and updated PDFBox jars).  I have yet to decide which will be more stable, 
Solr 4.0 or patched Solr 1.4.1, both of which with updated PDFbox and Tika jars.

Best,
Dave

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Tuesday, July 27, 2010 8:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.1 and ExtractingRequestHandler resulting in blank content

There are two different datasets that Solr (Lucene really) saves from
a document: raw storage and the indexed terms. I don't think the
ExtractingRequestHandler ever automatically stored the raw data; in
fact Lucene works in Strings internally, not raw byte arrays (this is
changing).

It should be indexed- that means if you search 'text' with a word from
the document, it will find those documents and bring back the file
name. Your app has to then use the file name.  Solr/Lucene is not
intended as a general-purpose content store, only an index.

The ERH wiki page doesn't quite say this. It describes what the ERH
does rather than what it does not do :)

On Mon, Jul 26, 2010 at 12:00 PM, David Thibault  wrote:
> Hello all,
>
> I’m working on a project with Solr.  I had 1.4.1 working OK using 
> ExtractingRequestHandler except that it was crashing on some PDFs.  I noticed 
> that Tika bundled with 1.4.1 was 0.4, which was kind of old.  I decided to 
> try updating to 0.7 as per the directions here: 
> http://wiki.apache.org/solr/ExtractingRequestHandler  but it was giving me 
> errors (I forget what they were specifically).
>
> Then I tried downloading Solr 3.1 from the source repository, which I noticed 
> came with Tika 0.7.  I figured this would be an easier route to get working.  
> Now I’m testing with 3.1 and 0.7 and I’m noticing my documents are going into 
> Solr OK, but they all have blank content (no document text stored in Solr).  
> I did see that the default “text” field is not stored. Changing that to 
> stored=true didn’t help.  Changing to 
> fmap.content=attr_content&uprefix=attr_content didn’t help either.  I have 
> attached all relevant info here.  Please let me know if someone sees 
> something I don’t (it’s entirely possible as I’m relatively new to Solr).
>
> Schema.xml:
> 
> 
>  
> omitNorms="true"/>
> omitNorms="true"/>
>
> omitNorms="true" positionIncrementGap="0"/>
> omitNorms="true" positionIncrementGap="0"/>
> omitNorms="true" positionIncrementGap="0"/>
> omitNorms="true" positionIncrementGap="0"/>
> omitNorms="true" positionIncrementGap="0"/>
> omitNorms="true" positionIncrementGap="0"/>
> omitNorms="true" positionIncrementGap="0"/>
> omitNorms="true" positionIncrementGap="0"/>
> precisionStep="0" positionIncrementGap="0"/>
> precisionStep="6" positionIncrementGap="0"/>
>
>
>
>
> omitNorms="true"/>
> sortMissingLast="true" omitNorms="true"/>
> sortMissingLast="true" omitNorms="true"/>
> sortMissingLast="true" omitNorms="true"/>
> sortMissingLast="true" omitNorms="true"/>
>
> positionIncrementGap="100">
>  
>
>  
>
> autoGeneratePhraseQueries="true">
>  
>
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
> generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" 
> splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>  
>  
>
> ignoreCase="true" expand="true"/>
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
> generateNumberParts="1" catenateWords="0" catenateNumbe

Re: SolrJ Response + JSON

You could use org.apache.solr.handler.JsonLoader.
That one uses org.apache.noggit.JSONParser internally.
I've used the JacksonParser with Spring.

http://json.org/ lists parsers for different programming languages.

Cheers,
Chantal

On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:
> Hello , 
> 
> Second try to send a mail to the mailing list... 
> 
> I need to translate SolrJ's response into JSON-response.
> I can not query Solr directly, because I need to do some math with the
> responsed data, before I show the results to the client.
> 
> Any experiences how to translate SolrJ's response into JSON without writing
> your own JSON Writer?
> 
> Thank you. 
> - Mitch

Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread Alessandro Benedetti

In my opinion, the 1.4.1 version with the Patch is more Stable.
Until 4.0 will be released 

2010/7/28 David Thibault 

> Yesterday I did get this working with version 4.0 from trunk.  I haven't
> fully tested it yet, but the content doesn't come through blank anymore, so
> that's good.  Would it be more stable to stick with 1.4.1 and your patch to
> get to Tika 0.8, or to stick with the 4.0 trunk version?
>
> Best,
> Dave
>
> -Original Message-
> From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
> Sent: Wednesday, July 28, 2010 3:31 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
> CELL/Tika/PDFBox
>
> I attached a patch for Solr 1.4.1 release on
> https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
> me.
> This strange behaviour for me was due to the fact that I copied the patched
> jars and war inside the dist directory but forgot to update the war inside
> the example/webapps directory (that is inside Jetty).
> Hope this helps.
> Tommaso
>
> 2010/7/27 David Thibault 
>
> > Alessandro & all,
> >
> > I was having the same issue with Tika crashing on certain PDFs.  I also
> > noticed the bug where no content was extracted after upgrading Tika.
> >
> > When I went to the SOLR issue you link to below, I applied all the
> patches,
> > downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl,
> and
> > got the following error:
> > SEVERE: java.lang.NoSuchMethodError:
> >
> org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
> > at
> >
> org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
> > at
> >
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
> > at
> >
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> > at
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> > at
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > at
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> > at
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> > at
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> > at
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> > at
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> > at
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> > at
> >
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
> > at
> >
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
> > at
> org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
> > at java.lang.Thread.run(Thread.java:619)
> >
> > This is really weird because I DID apply the SolrResourceLoader patch
> that
> > adds the getClassLoader method.  I even verified by going opening up the
> > JARs and looking at the class file in Eclipse...I can see the
> > SolrResourceLoader.getClassLoader() method.
> >
> > Does anyone know why it can't find the method?  After patching the source
> I
> > did ant clean dist in the base directory of the Solr source tree and
> > everything looked like it compiles (BUILD SUCCESSFUL).  Then I copied all
> > the jars from dist/ and all the library dependencies from
> > contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything
> in
> > the logs looked good.
> >
> > I'm stumped.  It would be very nice to have a Solr implementation using
> the
> > newest versions of PDFBox & Tika and actually have content being
> > extracted...=)
> >
> > Best,
> > Dave
> >
> >
> > -Original Message-
> > From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> > Sent: Tuesday, July 27, 2010 6:09 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with
> Solr
> > CELL/Tika/PDFBox
> >
> > Hi Jon,
> > During the last days we front the same problem.
> > Using Solr 1.4.1 classic (tika 0.4 ),from some pdf files we can't extract
> > content and from others, Solr throws an exception during the Indexing
> > Process .
> > You must:
> > Update tika libraries (into /contrib/extraction/lib)with tika-core.0.8
> > snapshot and tika-parsers 0.8.
> > Update PdfBox and all related libraries.
> > After that You have to patch Solr 1.4.1 following this patch :
> >
> >
> https://issues.apache.org/jira/browse/SOLR-19

RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

Yesterday I did get this working with version 4.0 from trunk.  I haven't fully 
tested it yet, but the content doesn't come through blank anymore, so that's 
good.  Would it be more stable to stick with 1.4.1 and your patch to get to 
Tika 0.8, or to stick with the 4.0 trunk version?

Best,
Dave

-Original Message-
From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] 
Sent: Wednesday, July 28, 2010 3:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr 
CELL/Tika/PDFBox

I attached a patch for Solr 1.4.1 release on
https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
me.
This strange behaviour for me was due to the fact that I copied the patched
jars and war inside the dist directory but forgot to update the war inside
the example/webapps directory (that is inside Jetty).
Hope this helps.
Tommaso

2010/7/27 David Thibault 

> Alessandro & all,
>
> I was having the same issue with Tika crashing on certain PDFs.  I also
> noticed the bug where no content was extracted after upgrading Tika.
>
> When I went to the SOLR issue you link to below, I applied all the patches,
> downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl, and
> got the following error:
> SEVERE: java.lang.NoSuchMethodError:
> org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
> at
> org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> at
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
> at
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
> at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
> at java.lang.Thread.run(Thread.java:619)
>
> This is really weird because I DID apply the SolrResourceLoader patch that
> adds the getClassLoader method.  I even verified by going opening up the
> JARs and looking at the class file in Eclipse...I can see the
> SolrResourceLoader.getClassLoader() method.
>
> Does anyone know why it can't find the method?  After patching the source I
> did ant clean dist in the base directory of the Solr source tree and
> everything looked like it compiles (BUILD SUCCESSFUL).  Then I copied all
> the jars from dist/ and all the library dependencies from
> contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything in
> the logs looked good.
>
> I'm stumped.  It would be very nice to have a Solr implementation using the
> newest versions of PDFBox & Tika and actually have content being
> extracted...=)
>
> Best,
> Dave
>
>
> -Original Message-
> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> Sent: Tuesday, July 27, 2010 6:09 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
> CELL/Tika/PDFBox
>
> Hi Jon,
> During the last days we front the same problem.
> Using Solr 1.4.1 classic (tika 0.4 ),from some pdf files we can't extract
> content and from others, Solr throws an exception during the Indexing
> Process .
> You must:
> Update tika libraries (into /contrib/extraction/lib)with tika-core.0.8
> snapshot and tika-parsers 0.8.
> Update PdfBox and all related libraries.
> After that You have to patch Solr 1.4.1 following this patch :
>
> https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel
> This is the firts way to solve the problem.
>
> Using Solr 1.4.1 (with tika 0.8 snapshot and pdfbox updated) no exception
> is
> thrown during the Indexing process, but no content is extracted.
> Using last Solr trunk (with tika 0.8 snapshot and pdfbox updated)  all
> sounds good but we don't know ho

Re: SolrJ Response + JSON


Thank you Markus, Mark.

Seems to be a problem with Nabble, not with the mailing list. Sorry.

I can create a JSON-response, when I query Solr directly.
But I mean, that I query Solr through a SolrJ-client 
(CommonsHttpSolrServer).
That means my queries look a litte bit like that: 
http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr

So the response is shown as an QueryResponse-object, not as a JSON-string.

Or do I miss something here?

Am 28.07.2010 15:15, schrieb Markus Jelsma:

Hi,

I got a response to your e-mail in my box 30 minutes ago. Anyway, enable the
JSONResponseWriter, if you haven't already, and query with wt=json. Can't get
mucht easier.

Cheers,

On Wednesday 28 July 2010 15:08:26 MitchK wrote:
   

Hello ,

Second try to send a mail to the mailing list...

I need to translate SolrJ's response into JSON-response.
I can not query Solr directly, because I need to do some math with the
responsed data, before I show the results to the client.

Any experiences how to translate SolrJ's response into JSON without writing
your own JSON Writer?

Thank you.
- Mitch

 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: SolrJ Response + JSON

2010-07-28 Thread Markus Jelsma

Hi,

I got a response to your e-mail in my box 30 minutes ago. Anyway, enable the 
JSONResponseWriter, if you haven't already, and query with wt=json. Can't get 
mucht easier.

Cheers,

On Wednesday 28 July 2010 15:08:26 MitchK wrote:
> Hello ,
> 
> Second try to send a mail to the mailing list...
> 
> I need to translate SolrJ's response into JSON-response.
> I can not query Solr directly, because I need to do some math with the
> responsed data, before I show the results to the client.
> 
> Any experiences how to translate SolrJ's response into JSON without writing
> your own JSON Writer?
> 
> Thank you.
> - Mitch
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: SolrJ Response + JSON

2010-07-28 Thread Mark Allan



On 28 Jul 2010, at 2:08 pm, MitchK wrote:

Second try to send a mail to the mailing list...


Your first attempt got through as well.  Here's my original response.


I think you should just be able to add &wt=json to the end of your  
query (or change whatever the existing wt parameter is in your URL).


Mark

On 28 Jul 2010, at 12:54 pm, MitchK wrote:



Hello community,

I need to transform SolrJ - responses into JSON, after some  
computing on

those results by another application has finished.

I can not do those computations on the Solr - side.

So, I really have to translate SolrJ's output into JSON.

Any experiences how to do so without writing your own JSON-writer?

Thank you.
- Mitch
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002024p1002024.html
Sent from the Solr - User mailing list archive at Nabble.com.





--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

SolrJ Response + JSON


Hello , 

Second try to send a mail to the mailing list... 

I need to translate SolrJ's response into JSON-response.
I can not query Solr directly, because I need to do some math with the
responsed data, before I show the results to the client.

Any experiences how to translate SolrJ's response into JSON without writing
your own JSON Writer?

Thank you. 
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002115p1002115.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrJ Response + JSON

2010-07-28 Thread Mark Allan

I think you should just be able to add &wt=json to the end of your  
query (or change whatever the existing wt parameter is in your URL).


Mark

On 28 Jul 2010, at 12:54 pm, MitchK wrote:



Hello community,

I need to transform SolrJ - responses into JSON, after some  
computing on

those results by another application has finished.

I can not do those computations on the Solr - side.

So, I really have to translate SolrJ's output into JSON.

Any experiences how to do so without writing your own JSON-writer?

Thank you.
- Mitch
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002024p1002024.html
Sent from the Solr - User mailing list archive at Nabble.com.




--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Show elevated Result Differently

2010-07-28 Thread Vishal.Arora


I want to show elevated Result Different from others is there any way to do
this 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Show-elevated-Result-Differently-tp1002081p1002081.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: clustering component

2010-07-28 Thread Stanislaw Osinski

> The patch should also work with trunk, but I haven't verified it yet.
>

I've just added a patch against solr trunk to
https://issues.apache.org/jira/browse/SOLR-1804.

S.

Highlighted match snippets highlight non-matched words (such as 0.1 and 0.2)

2010-07-28 Thread Jon Cram

Hi,

 

I'm observing some strange highlighted words in field value snippets
returned from Solr when matched term highlighting
(http://wiki.apache.org/solr/HighlightingParameters) is enabled.

 

In some cases, highlighted field value snippets contain highlighted
words that are not matches:

-  this appears to be in addition to highlighting words that are
matches

-  these non-match highlighted words are not pre-highlighted in
the indexed content

-  I've determined these are non-matches by appending
debugQuery=1 to the URL and examining the match detail information

 

I've so far observed this in relation to the strings "0", "0.1", "0.2"
and "0.4" in indexed content.

 

Real life example when searching for [gas]:

 

Relevant matched document result from Solr:





EXAMPLE prepares an extensive range of traceable calibration gas
standards with guaranteed relative uncertainties levels of 0.1% for
certain species (PDF 676 KB).





 

Related highlighted snippet:







EXAMPLE prepares an extensive range of traceable calibration
gas standards with guaranteed relative uncertainties levels of
0.1% for certain species (PDF 676 KB).







 

Note how the highlight snippet correctly highlights "gas" and
incorrectly highlights "0.1". I've observed similar results for other
searches where indexed content contains "0", "0.1", "0.2" and "0.4" and
where these numbers are highlighted incorrectly.

 

At this stage I'm trying to determine if this due to a poor
implementation on my behalf or whether this is a bug in Solr.

 

I'd really like to know if:

 

1.   Anyone else has observed this behaviour

2.   If this might be a known issue with Solr (I've tried to find
out but haven't had any luck)

3.   Anyone can test using something like
http:///select?hl=true&hl.fl=*&q=(phrase+that+contains+0.1+in+resp
onse)&hl.fragsize=0
 

 

Thanks,

Jon Cram

Get unique values

2010-07-28 Thread Rafal Bluszcz Zawadzki

Hi,

In my schema I have (inter ali) fields CollectionID, and CollectionName.
 These two values always match together, which means that for every value of
CollectionID there is matching value from CollectionName.

I am interested in query which allow me to get unique values of CollectionID
with matching CollectionNames (rest of fields is not interested for me in
this query).

I was thinking about facets, but they offer a bit more than I need.

Anyone has idea for query which allow me to get these results?

Cheers,

-- 
Rafał Zawadzki
http://dev.bluszcz.net

SolrJ Response + JSON


Hello community,

I need to transform SolrJ - responses into JSON, after some computing on
those results by another application has finished.

I can not do those computations on the Solr - side.

So, I really have to translate SolrJ's output into JSON.

Any experiences how to do so without writing your own JSON-writer?

Thank you.
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002024p1002024.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Strange search

2010-07-28 Thread stockii


try to delete "solr.SnowballPorterFilterFactory" from your analyzerchain. i
had similar problems by using german  SnowballPorterFilterFactory
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-search-tp998961p1001990.html
Sent from the Solr - User mailing list archive at Nabble.com.

Solr using 1500 threads - is that normal?

2010-07-28 Thread Christos Constantinou

Hi,

Solr seems to be crashing after a JVM exception that new threads cannot be 
created. I am writing in hope of advice from someone that has experienced this 
before. The exception that is causing the problem is:

Exception in thread "btpool0-5" java.lang.OutOfMemoryError: unable to create 
new native thread

The memory that is allocated to Solr is 3072MB, which should be enough memory 
for a ~6GB data set. The documents are not big either, they have around 10 
fields of which only one stores large text ranging between 1k-50k.

The top command at the time of the crash shows Solr using around 1500 threads, 
which I assume it is not normal. Could it be that the threads are crashing one 
by one and new ones are created to cope with the queries?

In the log file, right after the the exception, there are several thousand 
commits before the server stalls completely. Normally, the log file would 
report 20-30 document existence queries per second, then 1 commit per 5-30 
seconds, and some more infrequent faceted document searches on the data. 
However after the exception, there are only commits until the end of the log 
file.

I am wondering if anyone has experienced this before or if it is some sort of 
known bug from Solr 1.4? Is there a way to increase the details of the 
exception in the logfile?

I am attaching the output of a grep Exception command on the logfile.

Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:33:19 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:08 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:58 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:59 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:44:31 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:51:49 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:55:17 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:55:17 AM org.apache.solr.commo

Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)

Hi Lance!

On Wed, 2010-07-28 at 02:31 +0200, Lance Norskog wrote:
> Should this go into the trunk, or does it only solve problems unique
> to your use case?

The solution is generic but is an extension of XPathEntityProcessor
because I didn't want to touch the solr.war. This way I can deploy the
extension into SOLR_HOME/lib.
The problem that it solves is not one with XPathEntityProcessor but more
general. What it does:

It adds an attribute to the entity that I called "skipIfEmpty" which
takes the variable (it could even take more variables seperated by
whitespace).
On entityProcessor.init() which is called for sub-entities per row of
root entity (:= before every new request to the data source), the value
of the attribute is resolved and if it is null or empty (after
trimming), the entity is not further processed.
This attribute is only allowed on sub-entities.

It would probably be nicer to put that somewhere higher up in the class
hierarchy so that all entity processors could make use of it.
But I don't know how common the use case is - all examples I found where
more or less "joins" on primary keys.

Cheers,
Chantal

Here comes the code==

import static
org.apache.solr.handler.dataimport.DataImportHandlerException.SEVERE;

import java.util.Map;
import java.util.logging.Logger;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataImportHandlerException;
import org.apache.solr.handler.dataimport.XPathEntityProcessor;

public class OptionalXPathEntityProcessor extends XPathEntityProcessor {
private Logger log =
Logger.getLogger(OptionalXPathEntityProcessor.class.getName());
private static final String SKIP_IF_EMPTY = "skipIfEmpty";
private boolean skip = false;

@Override
protected void firstInit(Context context) {
if (context.isRootEntity()) {
throw new DataImportHandlerException(SEVERE,
"OptionalXPathEntityProcessor not allowed for root entities.");
}
super.firstInit(context);
}

@Override
public void init(Context context) {
String value = 
context.getResolvedEntityAttribute(SKIP_IF_EMPTY);
if (value == null || value.trim().isEmpty()) {
skip = true;
} else {
super.init(context);
skip = false;
}
}

@Override
public Map nextRow() {
if (skip) return null;
return super.nextRow();
}
}

Re: Indexing Problem: Where's my data?