RE: Updating Solr index from XML files

2009-07-08 Thread Francis Yakin
 Otis,

What is the difference, or the advantage, of using Solr.pm?

http://search.cpan.org/~garafola/Solr-0.03/lib/Solr.pm

Thanks

Francis


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Tuesday, July 07, 2009 10:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Updating Solr index from XML files


If Perl is your choice:
http://search.cpan.org/~bricas/WebService-Solr-0.07/lib/WebService/Solr.pm

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Francis Yakin fya...@liquid.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Wednesday, July 8, 2009 1:16:04 AM
 Subject: Updating Solr index from XML files


 I have the following curl commands to update and commit to Solr (I have 10
 XML files just for testing)

 curl http://solr00:7001/solr/update --data-binary @xml_Artist-100170.txt -H
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101062.txt -H
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101238.txt -H
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101400.txt -H
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101513.txt -H
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101517.txt -H
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101572.txt -H
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101691.txt -H
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101694.txt -H
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101698.txt -H
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @commit.txt -H
 'Content-type:text/plain; charset=utf-8'

 It works so far. But I will have more than 3 XML files.

 What's the most efficient way to do this? I could script it with a for loop
 in a regular shell script or Perl.
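A loop like the one Francis describes might be sketched as follows. The URL, file glob, and commit.txt payload are taken from his commands; the DRY_RUN switch and post_file helper are additions so the loop can be exercised without a live server:

```shell
# Batch-post Solr XML files, then commit. SOLR_URL defaults to the
# WebLogic deployment from this thread; override it for your setup.
SOLR_URL="${SOLR_URL:-http://solr00:7001/solr/update}"

post_file() {
  # With DRY_RUN=1 just print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "curl $SOLR_URL --data-binary @$1"
  else
    curl "$SOLR_URL" --data-binary "@$1" \
      -H 'Content-type:text/plain; charset=utf-8'
  fi
}

for f in xml_Artist-*.txt; do
  [ -e "$f" ] || continue   # the glob may not match anything
  post_file "$f"
done
if [ -e commit.txt ]; then post_file commit.txt; fi
```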

 I am also looking into solr.pm from this:

 http://wiki.apache.org/solr/IntegratingSolr

 BTW: We are using WebLogic to deploy solr.war, and by default Solr in
 WebLogic uses port 7001, not 8983.

 Thanks

 Francis



Re: Updating Solr index from XML files

2009-07-08 Thread Fergus McMenemie
If Perl is your choice:
http://search.cpan.org/~bricas/WebService-Solr-0.07/lib/WebService/Solr.pm

Hmmm. Very interesting; I had not seen this!


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Francis Yakin fya...@liquid.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Wednesday, July 8, 2009 1:16:04 AM
 Subject: Updating Solr index from XML files
 
 
 I have the following curl commands to update and commit to Solr (I have 10
 XML files just for testing)
 
 [...]
 
 It works so far. But I will have more than 3 XML files.
 
 What's the most efficient way to do this? I could script it with a for loop
 in a regular shell script or Perl.

Assuming Solr 1.4 or a nightly build, I would use the DataImportHandler (DIH)
for this:

  If all the files to be added/updated are in a directory, the
  FileListEntityProcessor can be used to find and index them.
  It walks the disk from a given starting point.

  If you have another file listing the files to be indexed, then
  I would use LineEntityProcessor to process that list.

Either of the above will locate the files to be indexed and
pass each filename to XPathEntityProcessor with useSolrAddSchema
set to true.
  
  See http://wiki.apache.org/solr/DataImportHandler
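A data-config.xml for the FileListEntityProcessor + XPathEntityProcessor combination Fergus describes might look roughly like this. It is a sketch, not a tested configuration; baseDir and the fileName regex are placeholders to adjust for your layout:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- The outer entity walks the disk from baseDir and emits one row
         per matching file; rootEntity="false" so the inner entity
         produces the actual documents. -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/xml" fileName=".*\.xml$"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- The inner entity parses each file; useSolrAddSchema="true"
           means the files are already in Solr <add><doc> format. -->
      <entity name="file" processor="XPathEntityProcessor"
              url="${files.fileAbsolutePath}"
              useSolrAddSchema="true" stream="true"/>
    </entity>
  </document>
</dataConfig>
```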

 
 I am also looking into solr.pm from this:
 
 http://wiki.apache.org/solr/IntegratingSolr
 
 BTW: We are using WebLogic to deploy solr.war, and by default Solr in
 WebLogic uses port 7001, not 8983.
 
 Thanks
 
 Francis

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Preparing the ground for a real multilang index

2009-07-08 Thread Michael Lackhoff
On 08.07.2009 00:50 Jan Høydahl wrote:

 itself and do not need to know the query language. You may then want
 to do a copyField from all your text_<lang> -> text for convenient
 one-field-to-rule-them-all search.

Would that really help? As I understand it, copyField takes the raw, not
yet analyzed field value. I cannot yet see the advantage of this
text field over the current situation with no text_<lang> fields at all.
The copied-to text field has to be language-agnostic with no stemming at
all, so it would miss many hits. Or is there a way to combine many
differently stemmed variants into one field, to be able to search against
all of them at once? That would be great indeed!

-Michael


Re: Can't limit return fields in custom request handler

2009-07-08 Thread Osman İZBAT
I'll look at SolrPluginUtils.setReturnFields.

I'm running the same query:
http://localhost:8983/solr/select/?qt=cfacet&q=%2BitemTitle:nokia%20%2BcategoryId:130&start=0&limit=3&fl=id
I get a non-empty result when the filter parameter is null, but when I pass
the inStores filter parameter to getDocListAndSet I get an empty result.

SolrParams solrParams = req.getParams();
Query q = QueryParsing.parseQuery(solrParams.get("q"),
        req.getSchema());
Query filter = new TermQuery(new Term("inStores", "true"));
DocListAndSet results = req.getSearcher().getDocListAndSet(q,
        (Query) filter, (Sort) null, solrParams.getInt("start"),
        solrParams.getInt("limit"));

Thanks.


On Tue, Jul 7, 2009 at 11:45 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : But I have a problem like this;  when i call
 :
 http://localhost:8983/solr/select/?qt=cfacet&q=%2BitemTitle:nokia%20%2BcategoryId:130&start=0&limit=3&fl=id,itemTitle
 : I'm getting all fields instead of only id and itemTitle.

 Your custom handler is responsible for checking the fl and setting what
 you want the response fields to be on the response object.

 SolrPluginUtils.setReturnFields can be used if you want this to be done in
 the normal way.

 : Also I'm getting no result when I give a non-null filter parameter in
 : getDocListAndSet(...).
...
 : DocListAndSet results = req.getSearcher().getDocListAndSet(q,
 : (Query) null, (Sort) null, solrParams.getInt("start"),
 : solrParams.getInt("limit"));

 ...that should work.  What does your query look like?  What are you
 passing for the start and limit params (is it possible you are getting
 results, but limit=0 so there aren't any results on the current page of
 pagination)?  What does the debug output look like?


 -Hoss




-- 
Osman İZBAT


RE: Browse indexed terms in a field

2009-07-08 Thread Pierre-Yves LANDRON

Thanks !

It seems that this can do the trick...

 Date: Tue, 7 Jul 2009 11:10:15 -0400
 Subject: Re: Browse indexed terms in a field
 From: bill.w...@gmail.com
 To: solr-user@lucene.apache.org
 
 You can use facet.prefix to match the beginning of a given word:
 
 http://wiki.apache.org/solr/SimpleFacetParameters#head-579914ef3a14d775a5ac64d2c17a53f3364e3cf6
 
 Bill
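As a concrete sketch of the facet.prefix suggestion: the field name "text", the prefix "cyc", and the host are placeholders, and -sf keeps curl quiet so the command degrades to printing the URL when no server is listening:

```shell
# Browse indexed terms starting with a given prefix via faceting;
# rows=0 suppresses documents so only facet counts come back.
url="http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=text&facet.prefix=cyc&facet.limit=20"
curl -sf "$url" || echo "would query: $url"
```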
 
 On Tue, Jul 7, 2009 at 11:02 AM, Pierre-Yves LANDRON
 pland...@hotmail.comwrote:
 
 
  Hello,
 
  Here is what I would like to achieve: in an indexed document there's a
  fulltext indexed field; I'd like to browse the terms in this field, i.e. get
  all the terms that match the beginning of a given word, for example.
  I can get all the field's facets for this document, but that's a lot of
  terms to process; is there a way to constrain the returned facets?

  Thank you for your insights.
  Kind regards,
  Pierre.
 
  _
  More than messages–check out the rest of the Windows Live™.
  http://www.microsoft.com/windows/windowslive/

_
Windows Live™: Keep your life in sync. Check it out!
http://windowslive.com/explore?ocid=TXT_TAGLM_WL_t1_allup_explore_012009

Change in DocListAndSetNC not messing everything

2009-07-08 Thread Marc Sturlese

Hey there,
I had to implement something similar to field collapsing, but couldn't use the
patch as it hurts performance a lot with an index of about 4 GB.
For testing, what I have done is make some hacks to SolrIndexSearcher's
getDocListAndSetNC function. I fill the ids array in my own order, or I just
don't add some doc ids (and so change the ids array's size). I have been
testing it and the performance is dramatically better than using the patch.
Can anyone tell me which is the best way to hack DocListAndSetNC? I mean, I
know this change can make me go mad in the future, when I decide to update to
the trunk version or to new releases.
My hack is probably too specific for my use case, but I could upload the
source in case someone can advise me what to do.
Thanks in advance,

-- 
View this message in context: 
http://www.nabble.com/Change-in-DocListAndSetNC-not-messing-everything-tp24387830p24387830.html
Sent from the Solr - User mailing list archive at Nabble.com.



how to do the distributed search with sort using solr?

2009-07-08 Thread shb
In my project, I am trying to do a distributed search sorted by some field
using solr. The test code is
as follows:

SolrQuery query = new SolrQuery();
query.set("q", "id:[1 TO *]");
query.setSortField("id", SolrQuery.ORDER.asc);
query.setParam("shards", "localhost:8983/solr,localhost:7574/solr");
QueryResponse response = server.query(query);

I get the following error. It seems that Solr doesn't support sorting
while doing a distributed search.  Do you have any suggestions to solve this
problem? Thanks!

org.apache.solr.client.solrj.SolrServerException: Error executing query
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:103)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:115)
at test.MainClassTest.searchTest(MainClassTest.java:88)
at test.MainClassTest.main(MainClassTest.java:48)
Caused by: org.apache.solr.client.solrj.SolrServerException:
java.net.ConnectException: Connection refused
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:391)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
... 3 more


Re: how to do the distributed search with sort using solr?

2009-07-08 Thread shb
Sorry, the error is as follows. I have read the Solr wiki carefully and
googled, but I haven't found any related question or solution. Can anyone
help me? Thanks!

org.apache.solr.client.solrj.SolrServerException: Error executing query
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:103)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:115)
at test.MainClassTest.searchTest(MainClassTest.java:88)
at test.MainClassTest.main(MainClassTest.java:48)
Caused by: org.apache.solr.common.SolrException:


Re: Updating Solr index from XML files

2009-07-08 Thread Norberto Meijome
On Tue, 7 Jul 2009 22:16:04 -0700
Francis Yakin fya...@liquid.com wrote:

 
 I have the following curl cmd to update and doing commit to Solr ( I have
 10 xml files just for testing)

[...]

hello,
DIH supports XML, right? 

not sure if it works with n files... but it's worth looking at.
alternatively, you can write a relatively simple Java app that will pick each
file up and post it for you using SolrJ
b

_
{Beto|Norberto|Numard} Meijome

Mix a little foolishness with your serious plans;
it's lovely to be silly at the right moment.
   Horace

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Adding new Fields ?

2009-07-08 Thread Saeli Mathieu
Hello.

I posted recently in this ML a script to transform any XML files into
Solr's XML format.
Anyway.
I've got a problem when I want to index my file: the indexing script from
the demonstration works perfectly, but now the only problem is, I can't
run any search on this document.

I added


<field name="lomgeneralidentifier" type="text" indexed="true" stored="true"
       multiValued="true" omitNorms="true" termVectors="true"/>
 and
 <copyField source="lomgeneralidentifierentry" dest="text"/>

In schema.xml file.


Did I forget something?

-- 
Saeli Mathieu.


All in one index, or multiple indexes?

2009-07-08 Thread Tim Sell
Hi,
I am wondering if it is common to have just one very large index, or
multiple smaller indexes specialized for different content types.

We currently have multiple smaller indexes, although one of them is
much larger than the others. We are considering merging them, to allow
the convenience of searching across multiple types at once and getting
them back in one list. The largest of the current indexes has a couple
of types that belong together; it has just one text field, and it is
usually quite short and similar to product names (words like "The
matter"). Another index I would merge with this one has multiple text
fields (also quite short).

We of course would still like to be able to get specific types. Is
filtering on just one type a big performance hit compared to
just querying it from its own index? Bear in mind all these indexes
run on the same machine. (We replicate them all to three machines and
do load balancing.)

There are a number of considerations. From an application standpoint,
when querying across all types we may split the results out into the
separate types anyway once we have the list back. If we always do
this, is it silly to have them in one index, rather than query
multiple indexes at once? Are multiple HTTP requests less significant
than the time to post-split the results?

In some ways it is easier to maintain a single index, although it has
felt easier to optimize the results for the type of content if they
are in separate indexes. My main concern of putting it all in one
index is that we'll make it harder to work with. We will definitely
want to do filtering on types sometimes, and if we go with a mashed up
index I'd prefer not to maintain separate specialized indexes as well.

Any thoughts?

~Tim.


Re: how to do the distributed search with sort using solr?

2009-07-08 Thread Mark Miller
On Wed, Jul 8, 2009 at 6:45 AM, shb suh...@gmail.com wrote:

 Sorry, the error is as follows. I have read the Solr wiki carefully and
 googled, but I haven't found any related question or solution. Can anyone
 help me? Thanks!

 org.apache.solr.client.solrj.SolrServerException: Error executing query
at

 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:103)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:115)
at test.MainClassTest.searchTest(MainClassTest.java:88)
at test.MainClassTest.main(MainClassTest.java:48)
 Caused by: org.apache.solr.common.SolrException:



java.net.ConnectException: Connection refused
        at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:391)

Are you sure both servers are running properly? You can hit them
individually?


-- 
-- 
- Mark

http://www.lucidimagination.com
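One way to follow Mark's suggestion is to hit each shard individually with a trivial query first. The host:port values are the ones from the question; -sf makes curl fail silently, so a down shard is simply reported as unreachable:

```shell
# Report whether each shard answers a trivial query on its own.
check_shard() {
  if curl -sf "http://$1/select?q=*:*&rows=0" > /dev/null; then
    echo "$1 OK"
  else
    echo "$1 unreachable"
  fi
}

for shard in localhost:8983/solr localhost:7574/solr; do
  check_shard "$shard"
done
```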


Re: Updating Solr index from XML files

2009-07-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Jul 8, 2009 at 4:19 PM, Norberto Meijome <numard...@gmail.com> wrote:
 On Tue, 7 Jul 2009 22:16:04 -0700
 Francis Yakin fya...@liquid.com wrote:


 I have the following curl cmd to update and doing commit to Solr ( I have
 10 xml files just for testing)

 [...]

 hello,
 DIH supports XML, right?
Yes. It supports multiple files too (use FileListEntityProcessor).

 not sure if it works with n files...but it's worth looking at it. 
 alternatively, u can write a relatively simple java app that will pick each 
 file up and post it for you using SolrJ
 b

 _
 {Beto|Norberto|Numard} Meijome

 Mix a little foolishness with your serious plans;
 it's lovely to be silly at the right moment.
   Horace

 I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
 Reading disclaimers makes you go blind. Writing them is worse. You have been 
 Warned.




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Updating Solr index from XML files

2009-07-08 Thread Erik Hatcher


On Jul 8, 2009, at 6:49 AM, Norberto Meijome wrote:

 alternatively, you can write a relatively simple Java app that will
 pick each file up and post it for you using SolrJ


Note that Solr ships with post.jar, so one could post a bunch of Solr
XML files like this:

java -jar post.jar *.xml

  Erik
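Since Francis's WebLogic deployment listens on 7001 rather than post.jar's default of 8983, the target can be overridden with the url system property. This is a sketch: it assumes post.jar (from Solr's example/exampledocs directory) sits in the current directory, and just prints the command when it does not:

```shell
# Post all Solr XML files in the current directory via post.jar,
# pointing it at a non-default update URL.
cmd="java -Durl=http://solr00:7001/solr/update -jar post.jar"
if [ -f post.jar ]; then
  $cmd *.xml
else
  echo "$cmd *.xml"   # post.jar not here: show what would run
fi
```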




Re: facets and stopwords

2009-07-08 Thread JCodina



hossman wrote:
 
 
 but are you sure that example would actually cause a problem?
 i suspect if you index that exact sentence as-is you wouldn't see the
 facet count for "si" or "que" increase at all.

 If you do a query for {!raw field=content}que you bypass the query
 parsers (which respect your stopwords file) and see all docs that
 contain the raw term "que" in the content field.

 if you look at some of the docs that match, and paste their content field
 into the analysis tool, i think you'll see that the problem comes from
 using the whitespace tokenizer, and is masked by using the WDF
 after the stop filter ... things like "Que?" are getting ignored by the
 stopfilter, but ultimately winding up in your index as "que"
 
 
 -Hoss
 
 

Yes, you are right: "que?", "que,", "que"... I need to change the analyzer.
They are not caught by the stopword filter because I use the whitespace
tokenizer; I will use the StandardTokenizer.

Thanks Hoss

-- 
View this message in context: 
http://www.nabble.com/facets-and-stopwords-tp23952823p24390157.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Adding new Fields ?

2009-07-08 Thread Erik Hatcher


On Jul 8, 2009, at 7:06 AM, Saeli Mathieu wrote:

 Hello.

 I posted recently in this ML a script to transform any XML files into
 Solr's XML format.
 Anyway.
 I've got a problem when I want to index my file: the indexing script from
 the demonstration works perfectly, but now the only problem is, I can't
 run any search on this document.

 I added

 <field name="lomgeneralidentifier" type="text" indexed="true" stored="true"
        multiValued="true" omitNorms="true" termVectors="true"/>
 and
 <copyField source="lomgeneralidentifierentry" dest="text"/>

 In the schema.xml file.

 Did I forget something?


Your field name is not the same as your copyField source (note the
"entry" suffix on the source attribute).


Erik



Re: how to do the distributed search with sort using solr?

2009-07-08 Thread shb
java.net.ConnectException: Connection refused
        at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:391)

The "Connection refused" error occurred because the servers had been
stopped.

 Are you sure both servers are running properly? You can hit them
 individually?

I started both servers, and if I comment out query.setParam("shards",
"localhost:8983/solr,localhost:7574/solr") or
query.setSortField("id", SolrQuery.ORDER.asc), it works
correctly.

However, if I keep them both in the program, I got the error as follows:

  org.apache.solr.client.solrj.SolrServerException: Error executing query
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:103)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:115)
at test.MainClassTest.searchTest(MainClassTest.java:88)
at test.MainClassTest.main(MainClassTest.java:48)


Re: Adding new Fields ?

2009-07-08 Thread Saeli Mathieu
Yep, I know that; I added more than 60 lines to this file :)
It's just an example.

Do you have any idea why, when I try to search for something, Solr
returns 0 results?

Looking forward to your reply.

-- 
Saeli Mathieu.


Re: Question regarding ExtractingRequestHandler

2009-07-08 Thread Grant Ingersoll
For metadata, you can add the ext.metadata.prefix parameter and then use a
dynamic field that maps that prefix, such as:

ext.metadata.prefix=metadata_

<dynamicField name="metadata_*" type="string" indexed="true"
              stored="true"/>



Note, some of this is currently under review to be changed.  See 
https://issues.apache.org/jira/browse/SOLR-284

-Grant

On Jul 7, 2009, at 10:49 AM, ahammad wrote:



Hello,

I've recently started using this handler to index MS Word and PDF files.

When I set ext.extract.only=true, I get back all the metadata that is
associated with that file.

If I want to index, I need to set ext.extract.only=false. If I want to
index all that metadata along with the contents, what inputs do I need to
pass to the HTTP request? Do I have to specifically define all the fields
in the schema, or can Solr dynamically generate those fields?

Thanks.
--
View this message in context: 
http://www.nabble.com/Question-regarding-ExtractingRequestHandler-tp24374393p24374393.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Placing a CSV file into SOLR Server

2009-07-08 Thread Anand Kumar Prabhakar

Is there any way to place a CSV file to be indexed on the Solr server, so
that the file can be indexed and searched? If so, please let me know the
location in which we have to place the file. We are looking for a workaround
to avoid the HTTP request to the Solr server, as it is taking too much time.
-- 
View this message in context: 
http://www.nabble.com/Placing-a-CSV-file-into-SOLR-Server-tp24390648p24390648.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Adding new Fields ?

2009-07-08 Thread Erik Hatcher


On Jul 8, 2009, at 8:10 AM, Saeli Mathieu wrote:

 Yep, I know that; I added more than 60 lines to this file :)
 It's just an example.

 Do you have any idea why, when I try to search for something, Solr
 returns 0 results?


The first place I start with a general question like this is to add
debugQuery=true and see what the query expression is parsed to, then
go from there to find out if that is the actually intended query
(proper fields being used, etc.), and then back into the analysis
process and the data that was indexed.  analysis.jsp comes in real
handy troubleshooting these things.


Erik



Re: Placing a CSV file into SOLR Server

2009-07-08 Thread Yonik Seeley
from: http://wiki.apache.org/solr/UpdateCSV

The following request will cause Solr to directly read the input file:

curl "http://localhost:8983/solr/update/csv?stream.file=exampledocs/books.csv&stream.contentType=text/plain;charset=utf-8"
# NOTE: The full path, or a path relative to the CWD of the running
# Solr server, must be used.


So you can put it anywhere local and give solr the full path to
directly read it.

-Yonik
http://www.lucidimagination.com



On Wed, Jul 8, 2009 at 8:34 AM, Anand Kumar
Prabhakaranand2...@gmail.com wrote:

 Is there any way to Place the CSV file to index in the SOLR Server so that
 the file can be indexed and searched. If so please let me know the location
 in which we have to place the file. We are looking for a workaround to avoid
 the HTTP request to the SOLR server as it is taking much time.
 --
 View this message in context: 
 http://www.nabble.com/Placing-a-CSV-file-into-SOLR-Server-tp24390648p24390648.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Adding new Fields ?

2009-07-08 Thread Saeli Mathieu
The search debug output is a bit weird...

I'll give you a typical example.

I want to find this word: "Cycle".

The field in my XML file is this one:

<add>
 <doc>
  ...
  <field name="lomclassificationtaxonPathtaxonentrystring">Cycle 2</field>
 </doc>
</add>

This field is referred to in my schema.xml this way:

<fields>
 <field name="lomclassificationtaxonPathtaxonentrystring" type="text"
        indexed="true" stored="true" multiValued="true" omitNorms="true"
        termVectors="true"/>
</fields>

and
<copyField source="lomclassificationtaxonPathtaxonentrystring"
           dest="text"/>

Here is my search in debug mode with this request:
http://localhost:8983/solr/select?indent=on&version=2.2&q=Cycle&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl.fl=



<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
   <str name="explainOther"/>
   <str name="fl">*,score</str>
   <str name="debugQuery">on</str>
   <str name="indent">on</str>
   <str name="start">0</str>
   <str name="q">Cycle</str>
   <str name="hl.fl"/>
   <str name="qt">standard</str>
   <str name="wt">standard</str>
   <str name="version">2.2</str>
   <str name="rows">10</str>
  </lst>
 </lst>
 <result name="response" numFound="0" start="0" maxScore="0.0"/>
 <lst name="debug">
  <str name="rawquerystring">Cycle</str>
  <str name="querystring">Cycle</str>
  <str name="parsedquery">text:cycl</str>
  <str name="parsedquery_toString">text:cycl</str>
  <lst name="explain"/>
  <str name="QParser">OldLuceneQParser</str>
  <lst name="timing">
   <double name="time">0.0</double>
   <!-- prepare and process sub-lists omitted: every component time is 0.0 -->
  </lst>
 </lst>
</response>


I don't know what I'm missing :/

I think I added all the necessary information to schema.xml.

-- 
Saeli Mathieu.


Re: Preparing the ground for a real multilang index

2009-07-08 Thread Paul Libbrecht

Can't the copy field use a different analyzer,
both for query and indexing?
Otherwise you need to craft your own analyzer which reads the language
from the field name... there are several classes ready for this.


paul

Le 08-juil.-09 à 02:36, Michael Lackhoff a écrit :

 On 08.07.2009 00:50 Jan Høydahl wrote:

  itself and do not need to know the query language. You may then want
  to do a copyField from all your text_<lang> -> text for convenient
  one-field-to-rule-them-all search.

 Would that really help? As I understand it, copyField takes the raw, not
 yet analyzed field value. I cannot yet see the advantage of this
 text field over the current situation with no text_<lang> fields at all.
 The copied-to text field has to be language-agnostic with no stemming at
 all, so it would miss many hits. Or is there a way to combine many
 differently stemmed variants into one field, to be able to search against
 all of them at once? That would be great indeed!

 -Michael






Re: Placing a CSV file into SOLR Server

2009-07-08 Thread Anand Kumar Prabhakar

Thank you for the input, Yonik. However, we are still sending an HTTP
request to the server; my requirement is to skip the HTTP request to the
Solr server. Is there any way to avoid these HTTP requests?



Yonik Seeley-2 wrote:
 
 from: http://wiki.apache.org/solr/UpdateCSV
 
 The following request will cause Solr to directly read the input file:
 
 curl
 "http://localhost:8983/solr/update/csv?stream.file=exampledocs/books.csv&stream.contentType=text/plain;charset=utf-8"
 #NOTE: The full path, or a path relative to the CWD of the running
 solr server must be used.
 
 
 So you can put it anywhere local and give solr the full path to
 directly read it.
 
 -Yonik
 http://www.lucidimagination.com
 
 
 
 On Wed, Jul 8, 2009 at 8:34 AM, Anand Kumar
 Prabhakaranand2...@gmail.com wrote:

 Is there any way to Place the CSV file to index in the SOLR Server so
 that
 the file can be indexed and searched. If so please let me know the
 location
 in which we have to place the file. We are looking for a workaround to
 avoid
 the HTTP request to the SOLR server as it is taking much time.
 --
 View this message in context:
 http://www.nabble.com/Placing-a-CSV-file-into-SOLR-Server-tp24390648p24390648.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/Placing-a-CSV-file-into-SOLR-Server-tp24390648p24391630.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr's MLT query call doesn't work

2009-07-08 Thread SergeyG

Hi,

Recently, while implementing the MoreLikeThis search, I've run into the
situation when Solr's mlt query calls don't work. 

More specifically, the following query:

http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score

brings back just the doc with id=10 and nothing else. While using the
GetMethod approach (putting /mlt explicitly into the URL), I got back some
results.

I've been trying to solve this problem for more than a week with no luck. If
anybody has any hint, please help.

Below, I put logs and outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c)
GetMethod (/select).

Thanks a lot.

Regards,
Sergey Goldberg


Here're the logs: 

a) Solr (http://localhost:8080/solr/select)
08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select
params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt=true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2}
hits=1 status=0 QTime=172

INFO MLTSearchRequestProcessor:49 - SolrServer url:
http://localhost:8080/solr
INFO MLTSearchRequestProcessor:67 - solrQuery
q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
INFO MLTSearchRequestProcessor:73 - Number of docs found = 1
INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612


b) GetMethod (http://localhost:8080/solr/mlt)
08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/mlt
params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details}
status=0 QTime=15

INFO MLT2SearchRequestProcessor:76 -
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
<result name="match" numFound="1" start="0" maxScore="2.098612">
 <doc><float name="score">2.098612</float><arr name="author"><str>S.G.</str></arr><str name="title">SG_Book</str></doc>
</result>
<result name="response" numFound="4" start="0" maxScore="0.28923997">
 <doc><float name="score">0.28923997</float><arr name="author"><str>O. Henry</str><str>S.G.</str></arr><str name="title">Four Million, The</str></doc>
 <doc><float name="score">0.08667877</float><arr name="author"><str>Katherine Mosby</str></arr><str name="title">The Season of Lillian Dawes</str></doc>
 <doc><float name="score">0.07947738</float><arr name="author"><str>Jerome K. Jerome</str></arr><str name="title">Three Men in a Boat</str></doc>
 <doc><float name="score">0.047219563</float><arr name="author"><str>Charles Oliver</str><str>S.G.</str></arr><str name="title">ABC's of Science</str></doc>
</result>
<lst name="interestingTerms">
 <float name="content_mlt:ye">1.0</float>
 <float name="content_mlt:tobin">1.0</float>
 <float name="content_mlt:a">1.0</float>
 <float name="content_mlt:i">1.0</float>
 <float name="content_mlt:his">1.0</float>
</lst>
</response>


c) GetMethod (http://localhost:8080/solr/select)
08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select
params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16

INFO MLT2SearchRequestProcessor:80 - <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">16</int>
<lst name="params"><str name="fl">title author score</str><str name="mlt.fl">content_mlt</str><str name="q">id:10</str><str name="mlt.maxqt">5</str><str name="mlt.interestingTerms">details</str></lst></lst>
<result name="response" numFound="1" start="0" maxScore="2.098612"><doc><float name="score">2.098612</float><arr name="author"><str>S.G.</str></arr><str name="title">SG_Book</str></doc></result>
<lst name="debug"><str name="rawquerystring">id:10</str><str name="querystring">id:10</str><str name="parsedquery">id:10</str><str name="parsedquery_toString">id:10</str><lst name="explain"><str name="10">
2.098612 = (MATCH) weight(id:10 in 3), product of:
  0.9994 = queryWeight(id:10), product of:
    2.0986123 = idf(docFreq=1, numDocs=5)
    0.47650534 = queryNorm
  2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of:
    1.0 = tf(termFreq(id:10)=1)
    2.0986123 = idf(docFreq=1, numDocs=5)
    1.0 = fieldNorm(field=id, doc=3)
</str></lst><str name="QParser">OldLuceneQParser</str><lst name="timing"><double name="time">16.0</double><lst name="prepare"><double name="time">0.0</double><lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst></lst><lst name="process"><double name="time">16.0</double><lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.DebugComponent"><double

Re: Placing a CSV file into SOLR Server

2009-07-08 Thread Yonik Seeley
On Wed, Jul 8, 2009 at 9:33 AM, Anand Kumar
Prabhakaranand2...@gmail.com wrote:
 Thank you for the input Yonik, anyway again we are sending an HTTP request to
 the server, my requirement is to skip the HTTP request to the SOLR server.
 Is there any way to avoid these HTTP requests?

You're sending a tiny HTTP request to the server that tells Solr to
directly read the big CSV file from disk... that should satisfy the
requirement which seemed to stem from the desire to avoid network
overhead, no?

-Yonik
http://www.lucidimagination.com
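Yonik's point, that the HTTP request itself is tiny while Solr reads the big CSV straight from its own disk, can be sketched by building the request URL for the CSV update handler. This is only an illustration: the host, port, and file path below are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical host and file path; stream.file tells Solr to read the
# CSV from its own filesystem, so only this short URL crosses the network.
base = "http://localhost:8983/solr/update/csv"
params = {
    "stream.file": "/data/exampledocs/books.csv",
    "stream.contentType": "text/plain;charset=utf-8",
    "commit": "true",
}
url = base + "?" + urlencode(params)
print(url)
```

The same request can be issued with curl against that URL; the CSV body itself is never uploaded over HTTP.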


Re: Solr's MLT query call doesn't work

2009-07-08 Thread Yao Ge

A couple of things: your mlt.fl field must also be part of fl; in this case,
content_mlt is not included in fl.
Also, I think the fl parameter value needs to be comma-separated. Try
fl=title,author,content_mlt,score

-Yao
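As a sketch of the suggestion above (a comma-separated fl that includes the mlt.fl field), here is one way to assemble the query string with the parameters properly &-separated; the host is hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical host. Note fl is comma-separated and includes content_mlt;
# urlencode joins the parameters with '&' and escapes ':' in q=id:10.
params = {
    "q": "id:10",
    "mlt": "true",
    "mlt.fl": "content_mlt",
    "mlt.maxqt": "5",
    "mlt.interestingTerms": "details",
    "fl": "title,author,content_mlt,score",
}
url = "http://localhost:8080/solr/select?" + urlencode(params)
print(url)
```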

SergeyG wrote:
 
 Hi,
 
 Recently, while implementing the MoreLikeThis search, I've run into the
 situation when Solr's mlt query calls don't work. 
 
 More specifically, the following query:
 
 http://localhost:8080/solr/select?q=id:10mlt=truemlt.fl=content_mltmlt.maxqt=
 5mlt.interestingTerms=detailsfl=title+author+score
 
 brings back just the doc with id=10 and nothing else. While using the
 GetMethod approach (putting /mlt explicitely into the url), I got back
 some results.
 
 I've been trying to solve this problem for more than a week with no luck.
 If anybody has any hint, please help.
 
 Below, I put logs  outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c)
 GetMethod (/select).
 
 Thanks a lot.
 
 Regards,
 Sergey Goldberg
 
 
 Here're the logs: 
 
 a) Solr (http://localhost:8080/solr/select)
 08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select
 params={fl=title+author+scoremlt.fl=content_mltq=id:10mlt=
 truemlt.interestingTerms=detailsmlt.maxqt=5wt=javabinversion=2.2}
 hits=1 status=0 QTime=172
 
 INFO MLTSearchRequestProcessor:49 - SolrServer url:
 http://localhost:8080/solr
 INFO MLTSearchRequestProcessor:67 - solrQuery
 q=id%3A10mlt=truemlt.fl=content_mltmlt.maxqt=
   5mlt.interestingTerms=detailsfl=title+author+score
 INFO MLTSearchRequestProcessor:73 - Number of docs found = 1
 INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612
 
 
 b) GetMethod (http://localhost:8080/solr/mlt)
 08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/mlt
 params={fl=title+author+scoremlt.fl=content_mltq=id:10mlt.max
 qt=5mlt.interestingTerms=details} status=0 QTime=15
 
 INFO MLT2SearchRequestProcessor:76 - ?xml version=1.0
 encoding=UTF-8?
 response
 lst name=responseHeaderint name=status0/intint
 name=QTime0/int/lstresult name=match numFound=1 start=0
 maxScore=2.098612docfloat name=score2.098612/floatarr name=
 authorstrS.G./str/arrstr
 name=titleSG_Book/str/doc/resultresult name=response n
 umFound=4 start=0 maxScore=0.28923997docfloat
 name=score0.28923997/floatarr name=authorstrO.
 Henry/strstrS.G./str/arrstr name=titleFour Million,
 The/str/docdocfloat name=score0.08667877/floatarr
 name=authorstrKatherine Mosby/str/arrstr name=titleThe Season
 of Lillian Dawes/str/docdocfloat
 name=score0.07947738/floatarr name=authorstrJerome K.
 Jerome/str/arrstr name=titleThree Men in a
 Boat/str/docdocfloat 
 name=score0.047219563/floatarr name=authorstrCharles
 Oliver/strstrS.G./str/arrstr name=titleABC's of
 Science/str/doc/resultlst name=interestingTermsfloat
 name=content_mlt:ye1.0/floatfloat
 name=content_mlt:tobin1.0/floatfloat
 name=content_mlt:a1.0/floatfloat
 name=content_mlt:i1.0/floatfloat
 name=content_mlt:his1.0/float/lst
 /response
 
 
 c) GetMethod (http://localhost:8080/solr/select)
 08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select
 params={fl=title+author+scoremlt.fl=content_mltq=id:10mlt.
 maxqt=5mlt.interestingTerms=details} hits=1 status=0 QTime=16
 
 INFO MLT2SearchRequestProcessor:80 - ?xml version=1.0
 encoding=UTF-8?
 response
 lst name=responseHeaderint name=status0/intint
 name=QTime16/intlst name=paramsstr name=fltitle author
 score/strstr name=mlt.flcontent_mlt/strstr
 name=qid:10/strstr name=mlt.maxqt5/strstr
 name=mlt.interestingTermsdetails/str/lst/lstresult
 name=response numFound=1 start=0 maxScore=2.098612docfloat
 name=score2.098612/floatarr name=authorstrS.G./str/arrstr
 name=titleSG_Book/str/doc/resultlst name=debugstr
 name=rawquerystringid:10/strstr name=querystringid:10/strstr
 name=parsedq
 ueryid:10/strstr name=parsedquery_toStringid:10/strlst
 name=explainstr name=10
 2.098612 = (MATCH) weight(id:10 in 3), product of:
   0.9994 = queryWeight(id:10), product of:
 2.0986123 = idf(docFreq=1, numDocs=5)
 0.47650534 = queryNorm
   2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of:
 1.0 = tf(termFreq(id:10)=1)
 2.0986123 = idf(docFreq=1, numDocs=5)
 1.0 = fieldNorm(field=id, doc=3)
 /str/lststr name=QParserOldLuceneQParser/strlst
 name=timingdouble name=time16.0/doublelst name=preparedouble
 name=time0.0/doublelst
 name=org.apache.solr.handler.component.QueryComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.FacetComponentdouble
 name=time0.0/double/lstlst name=org.apache.solr.handler.component
 .MoreLikeThisComponentdouble name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.HighlightComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.DebugComponentdouble
 name=time0.0/double/lst/lstlst name=processdouble
 name=time16.0/doublelst
 name=org.apache.solr.handler.component.QueryComponentdouble
 name=time0.0/double/lstlst
 

Re: about defaultSearchField

2009-07-08 Thread Yang Lin
Thanks for your reply, but it doesn't work.

Yang

2009/7/8 Yao Ge yao...@gmail.com


 Try with fl=* or fl=*,score added to your request string.
 -Yao

 Yang Lin-2 wrote:
 
  Hi,
  I have some problems.
  For my solr program, I want to type only the query string and get all
  field results that include the query string. But right now I can't get
  any result without specifying a field. For example, a query for tina gets
  nothing, but Sentence:tina does.
 
  I have adjusted the *schema.xml* like this:
 
  <fields>
     <field name="CategoryNamePolarity" type="text" indexed="true"
  stored="true" multiValued="true"/>
     <field name="CategoryNameStrenth" type="text" indexed="true"
  stored="true" multiValued="true"/>
     <field name="CategoryNameSubjectivity" type="text" indexed="true"
  stored="true" multiValued="true"/>
     <field name="Sentence" type="text" indexed="true" stored="true"
  multiValued="true"/>

     <field name="allText" type="text" indexed="true" stored="true"
  multiValued="true"/>
  </fields>

  <uniqueKey required="false">Sentence</uniqueKey>

   <!-- field for the QueryParser to use when an explicit fieldname is
  absent -->
   <defaultSearchField>allText</defaultSearchField>

   <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
   <solrQueryParser defaultOperator="OR"/>

  <copyfield source="CategoryNamePolarity" dest="allText"/>
  <copyfield source="CategoryNameStrenth" dest="allText"/>
  <copyfield source="CategoryNameSubjectivity" dest="allText"/>
  <copyfield source="Sentence" dest="allText"/>
 
 
  I think the problem is in defaultSearchField, but I don't know how to fix
  it. Could anyone help me?
 
  Thanks
  Yang
 
 

 --
 View this message in context:
 http://www.nabble.com/about-defaultSearchField-tp24382105p24384615.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Using relevance scores for psuedo-random-probabilistic ordenation

2009-07-08 Thread Raimon Bosch


Hi,

I've just implemented my PseudoRandomFieldComparator (migrated from
PseudoRandomComparatorSource). The problem I see is that I don't have
access to the relevance scores in the deprecated
PseudoRandomComparatorSource. I'm trying to fill in the scores from my
PseudoRandomComponent (in the process() method).

I don't know whether to use a PseudoRandomComparator that extends
QueryComponent and then repeats the query (or something similar, like
reordering my doclist), or to use two different components, a
QueryComponent and a PseudoComponent (extending SearchComponent), and look
for a good combination.

How can I get the relevance scores into my PseudoRandomFieldComparator? Any
ideas?


Regards,
Raimon Bosch.
-- 
View this message in context: 
http://www.nabble.com/Using-relevance-scores-for-psuedo-random-probabilistic-ordenation-tp24392432p24392432.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Adding new Fields ?

2009-07-08 Thread Saeli Mathieu
I don't really know how to solve my problem :/

On Wed, Jul 8, 2009 at 3:16 PM, Saeli Mathieu saeli.math...@gmail.comwrote:

 The search debug output is a bit weird...

 I'll give you a typical example.

 I want to find this word: Cycle

 the field in my xml file is this one

 <add>
 <doc>
 ...
 <field name="lomclassificationtaxonPathtaxonentrystring">Cycle 2</field>
 ...
 </doc>
 </add>

 This field is declared in my schema.xml this way:

 <fields>
 <field name="lomclassificationtaxonPathtaxonentrystring" type="text"
 indexed="true" stored="true" multiValued="true" omitNorms="true"
 termVectors="true" />
 </fields>

 and
  <copyfield source="lomclassificationtaxonPathtaxonentrystring"
 dest="text"/>

 Here is my search in debug mode with this request:
 http://localhost:8983/solr/select?indent=on&version=2.2&q=Cycle&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl.fl=


 
 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">0</int>
 <lst name="params">
 <str name="explainOther"/>
 <str name="fl">*,score</str>
 <str name="debugQuery">on</str>
 <str name="indent">on</str>
 <str name="start">0</str>
 <str name="q">Cycle</str>
 <str name="hl.fl"/>
 <str name="qt">standard</str>
 <str name="wt">standard</str>
 <str name="version">2.2</str>
 <str name="rows">10</str>
 </lst>
 </lst>
 <result name="response" numFound="0" start="0" maxScore="0.0"/>
 <lst name="debug">
 <str name="rawquerystring">Cycle</str>
 <str name="querystring">Cycle</str>
 <str name="parsedquery">text:cycl</str>
 <str name="parsedquery_toString">text:cycl</str>
 <lst name="explain"/>
 <str name="QParser">OldLuceneQParser</str>
 <lst name="timing">
 <double name="time">0.0</double>
 <lst name="prepare">
 <double name="time">0.0</double>
 <lst name="org.apache.solr.handler.component.QueryComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.FacetComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.HighlightComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.DebugComponent">
 <double name="time">0.0</double>
 </lst>
 </lst>
 <lst name="process">
 <double name="time">0.0</double>
 <lst name="org.apache.solr.handler.component.QueryComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.FacetComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.HighlightComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.DebugComponent">
 <double name="time">0.0</double>
 </lst>
 </lst>
 </lst>
 </lst>
 </response>
 

 I don't know what I'm missing :/

 Because I think I add all the necessary information in schema.xml.

 --
 Saeli Mathieu.




-- 
Saeli Mathieu.


Re: Adding new Fields ?

2009-07-08 Thread Jon Gorman
I think at least you need to review your import process.  If nothing
indexed, there's going to be nothing that matched.  We need a little
more information.  Stuff like a short but concise test sample of what
you're trying to index, how you're submitting the http request and the
commit request (you did commit, right?), what messages you're getting
when you do index and then commit.

I didn't look too closely at your last code example, but I would
recommend using some XML libraries to generate the files; if I remember
correctly, it didn't use any.

Most folks seem to process xml files for indexing by using the source
xml files to create new files just for indexing.  There's an
identifier, which is usually used to link back to the source xml file
in the application you design.

Jon Gorman
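A minimal sketch of the advice above, generating the add document with a standard XML library instead of hand-built strings. The field names and values are illustrative; the library handles escaping.

```python
import xml.etree.ElementTree as ET

def make_add_doc(fields):
    # Build <add><doc><field name="...">value</field>...</doc></add>
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")

doc_xml = make_add_doc({"id": "42", "title": "Three Men in a Boat"})
print(doc_xml)
```

The resulting string can then be POSTed to /solr/update, followed by a commit.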


Re: Indexing rich documents from websites using ExtractingRequestHandler

2009-07-08 Thread Glen Newton
Try putting all the PDF URLs into a file, download with something like
'wget' then index locally.

Glen Newton
http://zzzoot.blogspot.com/

2009/7/8 ahammad ahmed.ham...@gmail.com:

 Hello,

 I can index rich documents like pdf for instance that are on the filesystem.
 Can we use ExtractingRequestHandler to index files that are accessible on a
 website?

 For example, there is a file that can be reached like so:
 http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf

 How would I go about indexing that file? I tried using the following
 combinations. I will put the errors in brackets:

 stream.file=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The
 filename, directory name, or volume label syntax is incorrect)
 stream.file=www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The system
 cannot find the path specified)
 stream.file=//www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The format of
 the specified network name is invalid)
 stream.file=sub.myDomain.com/files/pdfdocs/testfile.pdf (The system cannot
 find the path specified)
 stream.file=//sub.myDomain.com/files/pdfdocs/testfile.pdf (The network path
 was not found)

 I sort of understand why I get those errors. What are the alternative
 methods of doing this? I am guessing that the stream.file attribute doesn't
 support web addresses. Is there another attribute that does?
 --
 View this message in context: 
 http://www.nabble.com/Indexing--rich-documents-from-websites-using-ExtractingRequestHandler-tp24392809p24392809.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 

-
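Glen's suggestion can be sketched as: write the PDF URLs to a file, then hand that file to wget and index the local copies. The URL below is the one from the question; the file and directory names are made up.

```python
# Write the list of PDF URLs to a file for wget to consume.
pdf_urls = [
    "http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf",
]
with open("pdf_urls.txt", "w") as f:
    f.write("\n".join(pdf_urls) + "\n")

# Shell command to run next: fetch everything into ./pdfs, then index
# those local files with ExtractingRequestHandler as before.
cmd = "wget --input-file=pdf_urls.txt --directory-prefix=pdfs"
print(cmd)
```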


Re: Adding new Fields ?

2009-07-08 Thread Erick Erickson
Have you thought about looking at your index with Luke to see if what you
expect to be there is actually there?

Best
Erick

On Wed, Jul 8, 2009 at 11:28 AM, Jon Gorman jonathan.gor...@gmail.comwrote:

 I think at least you need to review your import process.  If nothing
 indexed, there's going to be nothing that matched.  We need a little
 more information.  Stuff like a short but concise test sample of what
 you're trying to index, how you're submitting the http request and the
 commit request (you did commit, right?), what messages you're getting
 when you do index and then commit.

 I didn't look too closely at your last code example, but I would
 recommend using some XML libraries.  If I remember it didn't.

 Most folks seem to process xml files for indexing by using the source
 xml files to create new files just for indexing.  There's an
 identifier, which is usually used to link back to the source xml file
 in the application you design.

 Jon Gorman



SolrException - Lock obtain timed out, no leftover locks

2009-07-08 Thread danben

Hi,

I'm running Solr 1.3.0 in multicore mode and feeding it data from which the
core name is inferred from a specific field.  My service extracts the core
name and, if it has not seen it before, issues a create request for that
core before attempting to add the document (via SolrJ).  I have a pool of
MyIndexers that run in parallel, taking documents from a queue and adding
them via the add method on the SolrServer instance corresponding to that
core (exactly one per core exists).  Each core is in a separate data
directory.  My timeouts are set as such:

<writeLockTimeout>15000</writeLockTimeout>
<commitLockTimeout>25000</commitLockTimeout>

I remove the index directories, start the server, check that no locks exist,
and generate ~500 documents spread across 5 cores for the MyIndexers to
handle.  Each time, I see one or more exceptions with a message like 

Lock_obtain_timed_out_SimpleFSLockmulticoreNewUser3dataindexlucenebd4994617386d14e2c8c29e23bcca719writelock__orgapachelucenestoreLockObtainFailedException_Lock_obtain_timed_out_...

When the indexers have completed, no lock is left over.  There is no
discernible pattern as far as when the exception occurs (ie, it does not
tend to happen on the first or last or any particular document).

Interestingly, this problem does not happen when I have only a single
MyIndexer, or if I have a pool of MyIndexers and am running in single core
mode.  

I've looked at the other posts from users getting this exception but it
always seemed to be a different case, such as the server having crashed
previously and a lock file being left over.

-- 
View this message in context: 
http://www.nabble.com/SolrException---Lock-obtain-timed-out%2C-no-leftover-locks-tp24393255p24393255.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: about defaultSearchField

2009-07-08 Thread Jay Hill
Just to be sure: You mentioned that you adjusted schema.xml - did you
re-index after making your changes?

-Jay


On Wed, Jul 8, 2009 at 7:07 AM, Yang Lin beckl...@gmail.com wrote:

 Thanks for your reply. But it works not.

 Yang

 2009/7/8 Yao Ge yao...@gmail.com

 
  Try with fl=* or fl=*,score added to your request string.
  -Yao
 
  Yang Lin-2 wrote:
  
   Hi,
   I have some problems.
   For my solr progame, I want to type only the Query String and get all
   field
   result that includ the Query String. But now I can't get any result
   without
   specified field. For example, query with tina get nothing, but
   Sentence:tina could.
  
   I hava adjusted the *schema.xml* like this:
  
   fields
  field name=CategoryNamePolarity type=text indexed=true
   stored=true multiValued=true/
  field name=CategoryNameStrenth type=text indexed=true
   stored=true multiValued=true/
  field name=CategoryNameSubjectivity type=text indexed=true
   stored=true multiValued=true/
  field name=Sentence type=text indexed=true stored=true
   multiValued=true/
  
  field name=allText type=text indexed=true stored=true
   multiValued=true/
   /fields
  
   uniqueKey required=falseSentence/uniqueKey
  
!-- field for the QueryParser to use when an explicit fieldname is
   absent
   --
defaultSearchFieldallText/defaultSearchField
  
!-- SolrQueryParser configuration: defaultOperator=AND|OR --
solrQueryParser defaultOperator=OR/
  
   copyfield source=CategoryNamePolarity dest=allText/
   copyfield source=CategoryNameStrenth dest=allText/
   copyfield source=CategoryNameSubjectivity dest=allText/
   copyfield source=Sentence dest=allText/
  
  
   I think the problem is in defaultSearchField, but I don't know how to
   fix
   it. Could anyone help me?
  
   Thanks
   Yang
  
  
 
  --
  View this message in context:
  http://www.nabble.com/about-defaultSearchField-tp24382105p24384615.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 



Re: Adding new Fields ?

2009-07-08 Thread Saeli Mathieu
Here is my result when I'm adding a file to solr

{...@framboise.}:java -jar post.jar
FinalParsing.xml
[18:37]#25
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file FinalParsing.xml
SimplePostTool: COMMITting Solr index changes..
{...@framboise.}:


Here is my typical xml file.

<add>
<doc>
   <field name="id">0</field>
<field name="lomgeneralidentifiercatalog">TEXT</field>
<field name="lomgeneralidentifierentry">TEXT</field>
<field name="lomgeneraltitlestring">TEXT</field>
<field name="lomgenerallanguage">TEXT</field>
<field name="lomgeneraldescriptionstring">TEXT</field>
<field name="lomlifeCyclestatussource">TEXT</field>
<field name="lomlifeCyclestatusvalue">TEXT</field>
<field name="lomlifeCyclecontributerolesource">TEXT</field>
<field name="lomlifeCyclecontributerolevalue">TEXT</field>
<field name="lomlifeCyclecontributeentity">TEXT</field>
<field name="lommetaMetadataidentifiercatalog">TEXT</field>
<field name="lommetaMetadataidentifierentry">TEXT</field>
<field name="lommetaMetadatacontributerolesource">TEXT</field>
<field name="lommetaMetadatacontributerolevalue">TEXT</field>
<field name="lommetaMetadatacontributeentity">TEXT</field>
<field name="lommetaMetadatacontributedatedateTime">TEXT</field>
<field name="lommetaMetadatacontributerolesource">TEXT</field>
<field name="lommetaMetadatacontributerolevalue">TEXT</field>
<field name="lommetaMetadatacontributeentity">TEXT</field>
<field name="lommetaMetadatacontributeentity">TEXT</field>
<field name="lommetaMetadatacontributeentity">TEXT</field>
<field name="lommetaMetadatacontributedatedateTime">TEXT</field>
<field name="lommetaMetadatametadataSchema">TEXT</field>
<field name="lommetaMetadatalanguage">TEXT</field>
<field name="lomtechnicallocation">TEXT</field>
<field name="lomeducationalintendedEndUserRolesource">TEXT</field>
<field name="lomeducationalintendedEndUserRolevalue">TEXT</field>
<field name="lomeducationalcontextsource">TEXT</field>
<field name="lomeducationalcontextvalue">TEXT</field>
<field name="lomeducationaltypicalAgeRangestring">TEXT</field>
<field name="lomeducationaltypicalAgeRangestring">TEXT</field>
<field name="lomeducationaldescriptionstring">TEXT</field>
<field name="lomeducationallanguage">TEXT</field>
<field name="lomannotationentity">TEXT</field>
<field name="lomannotationdatedateTime">TEXT</field>
<field name="lomannotationdescriptionstring">TEXT</field>
<field name="lomclassificationpurposesource">TEXT</field>
<field name="lomclassificationpurposevalue">TEXT</field>
<field name="lomclassificationtaxonPathsourcestring">TEXT</field>
<field name="lomclassificationtaxonPathtaxonid">TEXT</field>
<field name="lomclassificationtaxonPathtaxonentrystring">TEXT</field>
<field name="lomclassificationpurposesource">TEXT</field>
<field name="lomclassificationpurposevalue">TEXT</field>
<field name="lomclassificationtaxonPathsourcestring">TEXT</field>
<field name="lomclassificationtaxonPathtaxonid">TEXT</field>
<field name="lomclassificationtaxonPathtaxonentrystring">TEXT</field>
<field name="lomclassificationtaxonPathsourcestring">TEXT</field>
<field name="lomclassificationtaxonPathtaxonid">TEXT</field>
<field name="lomclassificationtaxonPathtaxonentrystring">TEXT</field>
<field name="lomclassificationpurposesource">TEXT</field>
<field name="lomclassificationpurposevalue">TEXT</field>
<field name="lomclassificationtaxonPathsourcestring">TEXT</field>
<field name="lomclassificationtaxonPathtaxonid">TEXT</field>
<field name="lomclassificationtaxonPathtaxonentrystring">TEXT</field>
<field name="lomclassificationtaxonPathsourcestring">TEXT</field>
<field name="lomclassificationtaxonPathtaxonid">TEXT</field>
<field name="lomclassificationtaxonPathtaxonentrystring">TEXT</field>
<field name="lomclassificationtaxonPathsourcestring">TEXT</field>
<field name="lomclassificationtaxonPathtaxonid">TEXT</field>
<field name="lomclassificationtaxonPathtaxonentrystring">TEXT</field>
</doc>
</add>

here is my schema.xml configuration.

<fields>
 <field name="id" type="string" indexed="true" stored="true" required="true"
/>
  <field name="sku" type="textTight" indexed="true" stored="true"
omitNorms="true"/>
  <field name="name" type="text" indexed="true" stored="true"/>
  <field name="nameSort" type="string" indexed="true" stored="false"/>
  <field name="alphaNameSort" type="alphaOnlySort" indexed="true"
stored="false"/>
  <field name="manu" type="text" indexed="true" stored="true"
omitNorms="true"/>
  <field name="cat" type="text_ws" indexed="true" stored="true"
multiValued="true" omitNorms="true" termVectors="true" />
  <field name="features" type="text" indexed="true" stored="true"
multiValued="true"/>
  <field name="includes" type="text" indexed="true" stored="true"/>
  <field name="lomgeneral" type="text" indexed="true" 

expand synonyms without tokenizing stream?

2009-07-08 Thread Don Clore
I'm pretty new to solr; my apologies if this is a naive question, and my
apologies for the verbosity:
I'd like to take keywords in my documents, and expand them as synonyms; for
example, if the document gets annotated with a keyword of 'sf', I'd like
that to expand to 'San Francisco'.  (San Francisco,San Fran,SF is a line in
my synonyms.txt file).

But I also want to be able to display facets with counts for these keywords;
I'd like them to be suitable for display.

So, if I define the keywords field as 'text', I use the following pipeline
(from my schema.xml):

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


Faceting on this field, I get return values (when I query specifically
for the single document in question):

  <lst name="Keywords">
    <int name="fran">1</int>
    <int name="francisco">1</int>
    <int name="san">1</int>
    <int name="sf">1</int>
  </lst>

I've also done a copyfield to a 'KeywordsString' field, which is
defined as string. i.e.

<fieldType name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="true"/>

Faceting on *that* field (when querying for just this 1 document,
which has a keyword of 'sf'), results in:

  <lst name="KeywordsString">
    <int name="sf">1</int>
  </lst>

I guess what I'd like to see is the ability to stamp keywords like
'sf', 'san fran', 'san francisco', and 'mlb' (with a synonyms.txt file
entry of mlb => Major League Baseball), and have all the documents that
are inscribed with any of those synonym variants come back as:

  <lst name="KeywordsString">
    <int name="San Francisco">1</int>
    <int name="Major League Baseball">1</int>
  </lst>


But I don't know how to define a processing pipeline that expands
synonyms without tokenizing them, i.e. without breaking 'San Francisco'
into 'san' and 'francisco' and presenting those as separate facets.

Thanks for any help,

Don
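One possible direction, not from this thread and untested here: facet on a separate field whose analyzer never splits on whitespace, so multi-word synonyms survive as single facet values. A sketch follows; the field and type names are invented, and whether SynonymFilterFactory applied after KeywordTokenizerFactory maps your entries as intended depends on your synonyms file, so treat this only as a starting point.

```xml
<!-- Hypothetical type: the whole keyword stays one token, then explicit
     mappings such as "sf, san fran => San Francisco" rewrite it. -->
<fieldType name="keywordFacet" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="facet_synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>

<field name="KeywordsFacet" type="keywordFacet" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="Keywords" dest="KeywordsFacet"/>
```

Faceting would then target KeywordsFacet while searching still uses the tokenized Keywords field.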


Re: Multiple values for custom fields provided in SOLR query

2009-07-08 Thread Otis Gospodnetic

Suryasnat,

I suggest you go to your Solr Admin page and run a few searches from there, 
using Lucene query syntax (link on Lucene site).
e.g.
fieldID:111 AND fieldID:222 AND fieldID:333 AND foo:product

then replace the ANDs with ORs where appropriate.

That should give you an idea/feel about which query you need.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
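A small sketch of the advice above (values and host are illustrative): build an OR clause over the fileID values, AND it with the keyword, and URL-encode the result.

```python
from urllib.parse import urlencode

# Illustrative values; the generated clause restricts matches to the
# given fileIDs while still requiring the search term.
file_ids = ["111", "222", "333"]
fid_clause = "(" + " OR ".join("fileID:" + i for i in file_ids) + ")"
q = fid_clause + " AND apple"
url = "http://localhost:8983/solr/select?" + urlencode({"q": q})
print(q)   # (fileID:111 OR fileID:222 OR fileID:333) AND apple
```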



- Original Message 
 From: Suryasnat Das suryaatw...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, July 7, 2009 12:16:30 PM
 Subject: Re: Multiple values for custom fields provided in SOLR query
 
 Hi Otis,
 
 Thanks for replying to my query.
 
 My query is, if multiple values are provided for a custom field then how can
 it be represented in a SOLR query. So if my field is fileID and its values
 are 111, 222 and 333 and my search string is ‘product’ then how can this be
 represented in a SOLR query? I want to perform the search on basis of
 fileIDs *and* search string provided.
 
 If i provide the query in the format,
 q=fileID:111+fileID:222+fileID:333+product, then how will it actually
 search? Can you please provide me the correct format of the query?
 
 Regards
 
 Suryasnat Das
 
 On Mon, Jul 6, 2009 at 10:05 PM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:
 
 
  I actually don't fully understand your question.
  q=+fileID:111+fileID:222+fileID:333+apple looks like a valid query to me.
  (not sure what that space encoded as + is, though)
 
  Also not sure what you mean by:
   Basically the requirement is , if fileIDs are provided as search
  parameter
   then search should happen on the basis of fileID.
 
 
  Do you mean apple should be ignored if a term (field name:field value) is
  provided?
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
   From: Suryasnat Das 
   To: solr-user@lucene.apache.org
   Sent: Monday, July 6, 2009 11:31:10 AM
   Subject: Multiple values for custom fields provided in SOLR query
  
   Hi,
   I have a requirement in which i need to have multiple values in my custom
   fields while forming the search query to SOLR. For example,
   fileID is my custom field. I have defined the fileID in schema.xml as
   <field name="fileID" type="string" indexed="true" stored="true" required="true"
   multiValued="true"/>.
   Now fileID can have multiple values like 111,222,333 etc. So will my
  query
   be of the form,
  
   q=+fileID:111+fileID:222+fileID:333+apple
  
   where apple is my search query string. I tried with the above query but
  it
   did not work. SOLR gave invalid query error.
   Basically the requirement is , if fileIDs are provided as search
  parameter
   then search should happen on the basis of fileID.
  
    Is my approach correct or do I need to do something else? If immediate
   help is provided, that would be great.
  
   Regards
   Suryasnat Das
   Infosys.
 
 



Re: Solr's MLT query call doesn't work

2009-07-08 Thread Otis Gospodnetic

Sergey,

What about 
http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score&qt=mlt

?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: SergeyG sgoldb...@mail.ru
 To: solr-user@lucene.apache.org
 Sent: Wednesday, July 8, 2009 9:44:20 AM
 Subject: Solr's MLT query call doesn't work
 
 
 Hi,
 
 Recently, while implementing the MoreLikeThis search, I've run into the
 situation when Solr's mlt query calls don't work. 
 
 More specifically, the following query:
 
 http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
 
 brings back just the doc with id=10 and nothing else. While using the
 GetMethod approach (putting /mlt explicitly into the URL), I got back some
 results.
 
 I've been trying to solve this problem for more than a week with no luck. If
 anybody has any hint, please help.
 
 Below, I put logs & outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c)
 GetMethod (/select).
 
 Thanks a lot.
 
 Regards,
 Sergey Goldberg
 
 
 Here're the logs: 
 
 a) Solr (http://localhost:8080/solr/select)
 08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select
 params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt=true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2} hits=1
 status=0 QTime=172
 
 INFO MLTSearchRequestProcessor:49 - SolrServer url:
 http://localhost:8080/solr
 INFO MLTSearchRequestProcessor:67 - solrQuery
 q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
 INFO MLTSearchRequestProcessor:73 - Number of docs found = 1
 INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612
 
 
 b) GetMethod (http://localhost:8080/solr/mlt)
 08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/mlt
 params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} status=0 QTime=15
 
 INFO MLT2SearchRequestProcessor:76 - <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
 <result name="match" numFound="1" start="0" maxScore="2.098612">
 <doc><float name="score">2.098612</float><arr name="author"><str>S.G.</str></arr><str name="title">SG_Book</str></doc>
 </result>
 <result name="response" numFound="4" start="0" maxScore="0.28923997">
 <doc><float name="score">0.28923997</float><arr name="author"><str>O. Henry</str><str>S.G.</str></arr><str name="title">Four Million, The</str></doc>
 <doc><float name="score">0.08667877</float><arr name="author"><str>Katherine Mosby</str></arr><str name="title">The Season of Lillian Dawes</str></doc>
 <doc><float name="score">0.07947738</float><arr name="author"><str>Jerome K. Jerome</str></arr><str name="title">Three Men in a Boat</str></doc>
 <doc><float name="score">0.047219563</float><arr name="author"><str>Charles Oliver</str><str>S.G.</str></arr><str name="title">ABC's of Science</str></doc>
 </result>
 <lst name="interestingTerms">
 <float name="content_mlt:ye">1.0</float>
 <float name="content_mlt:tobin">1.0</float>
 <float name="content_mlt:a">1.0</float>
 <float name="content_mlt:i">1.0</float>
 <float name="content_mlt:his">1.0</float>
 </lst>
 </response>
 
 
 
 c) GetMethod (http://localhost:8080/solr/select)
 08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select
 params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16
 
 INFO MLT2SearchRequestProcessor:80 - 
 
 0
 name=QTime16title author
 scorecontent_mltid:10
 name=mlt.maxqt5
 name=mlt.interestingTermsdetails
 numFound=1 start=0 maxScore=2.098612
 name=score2.098612S.G.
 name=titleSG_Book
 name=rawquerystringid:10id:10
 name=parsedq
 ueryid:10id:10
 name=explain
 2.098612 = (MATCH) weight(id:10 in 3), product of:
   0.9994 = queryWeight(id:10), product of:
 2.0986123 = idf(docFreq=1, numDocs=5)
 0.47650534 = queryNorm
   2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of:
 1.0 = tf(termFreq(id:10)=1)
 2.0986123 = idf(docFreq=1, numDocs=5)
 1.0 = fieldNorm(field=id, doc=3)
 OldLuceneQParser
 name=timing16.0
 name=time0.0
 name=org.apache.solr.handler.component.QueryComponent
 name=time0.0
 name=org.apache.solr.handler.component.FacetComponent
 name=time0.00.0
 name=org.apache.solr.handler.component.HighlightComponent
 name=time0.0
 name=org.apache.solr.handler.component.DebugComponent
 name=time0.0
 name=time16.0
 name=org.apache.solr.handler.component.QueryComponent
 name=time0.0
 name=org.apache.solr.handler.component.FacetComponent
 name=time0.0
 name=org.apache.solr.handler.component.MoreLikeThisComponent
 name=time0.0
 name=org.apache.solr.handler.component.HighlightComponent
 name=time0.0
 name=org.apache.solr.handler.component.DebugComponent
 name=time16.0
 
 
 
 And here're the relevant entries from solrconfig.xml:
 
 
   
 
   explicit
   id,title,author,score
   on
 
 
 
 
 
   1
   10
 
 
 -- 
 View this message in context: 
 http://www.nabble.com/Solr%27s-MLT-query-call-doesn%27t-work-tp24391843p24391843.html
 Sent from the Solr - User mailing list archive at Nabble.com.
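
Hand-concatenated MLT URLs like the ones above are easy to mangle, so it can help to build the request programmatically. A sketch in Python, using the parameter names from this thread (the host, port, and standalone /mlt handler path are assumptions):

```python
from urllib.parse import urlencode

# Parameters for the standalone MoreLikeThis handler, as discussed above.
params = {
    "q": "id:10",
    "mlt.fl": "content_mlt",
    "mlt.maxqt": 5,
    "mlt.interestingTerms": "details",
    "fl": "title,author,score",
}
url = "http://localhost:8080/solr/mlt?" + urlencode(params)
print(url)
```

Every parameter is separated by a real "&" this way, which is exactly what gets lost when the URL is pasted by hand.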



Re: Indexing rich documents from websites using ExtractingRequestHandler

2009-07-08 Thread Jay Hill
I haven't tried this myself, but it sounds like what you're looking for is
enabling remote streaming:
http://wiki.apache.org/solr/ContentStream#head-7179a128a2fdd5dde6b1af553ed41735402aadbf

As the link above shows, you should be able to enable remote streaming like
this: <requestParsers enableRemoteStreaming="true"
multipartUploadLimitInKB="2048" /> and then something like this might work:
stream.url=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf

So you use stream.url instead of stream.file.
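
To illustrate, a stream.url request could be assembled like this (Python sketch; the Solr host, port, and literal.id value are assumptions, and remote streaming must be enabled in solrconfig.xml as described above):

```python
from urllib.parse import urlencode

# Ask Solr's ExtractingRequestHandler to fetch and index a remote PDF.
# stream.url makes Solr download the document itself.
params = {
    "stream.url": "http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf",
    "literal.id": "testfile.pdf",   # assumed unique-key value
    "commit": "true",
}
url = "http://localhost:8983/solr/update/extract?" + urlencode(params)
print(url)
```

Note that the remote URL itself gets percent-encoded when passed as a parameter value.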

Hope this helps.

-Jay


On Wed, Jul 8, 2009 at 7:40 AM, ahammad ahmed.ham...@gmail.com wrote:


 Hello,

 I can index rich documents like pdf for instance that are on the
 filesystem.
 Can we use ExtractingRequestHandler to index files that are accessible on a
 website?

 For example, there is a file that can be reached like so:
 http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf

 How would I go about indexing that file? I tried using the following
 combinations. I will put the errors in brackets:

 stream.file=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The
 filename, directory name, or volume label syntax is incorrect)
 stream.file=www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The system
 cannot find the path specified)
 stream.file=//www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The format
 of
 the specified network name is invalid)
 stream.file=sub.myDomain.com/files/pdfdocs/testfile.pdf (The system cannot
 find the path specified)
 stream.file=//sub.myDomain.com/files/pdfdocs/testfile.pdf (The network
 path
 was not found)

 I sort of understand why I get those errors. What are the alternative
 methods of doing this? I am guessing that the stream.file attribute doesn't
 support web addresses. Is there another attribute that does?
 --
 View this message in context:
 http://www.nabble.com/Indexing--rich-documents-from-websites-using-ExtractingRequestHandler-tp24392809p24392809.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Solr's MLT query call doesn't work

2009-07-08 Thread Bill Au
You definitely need mlt=true if you are not using /solr/mlt.

Bill

On Wed, Jul 8, 2009 at 2:14 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:


 Sergey,

 What about
 http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score&qt=mlt

 ?

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: SergeyG sgoldb...@mail.ru
  To: solr-user@lucene.apache.org
  Sent: Wednesday, July 8, 2009 9:44:20 AM
  Subject: Solr's MLT query call doesn't work
 
 
  Hi,
 
  Recently, while implementing the MoreLikeThis search, I've run into the
  situation when Solr's mlt query calls don't work.
 
  More specifically, the following query:
 
 
 http://localhost:8080/solr/select?q=id:10mlt=truemlt.fl=content_mltmlt.maxqt=
  5mlt.interestingTerms=detailsfl=title+author+score
 
  brings back just the doc with id=10 and nothing else. While using the
  GetMethod approach (putting /mlt explicitely into the url), I got back
 some
  results.
 
  I've been trying to solve this problem for more than a week with no luck.
 If
  anybody has any hint, please help.
 
  Below, I put logs  outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c)
  GetMethod (/select).
 
  Thanks a lot.
 
  Regards,
  Sergey Goldberg
 
 
  Here're the logs:
 
  a) Solr (http://localhost:8080/solr/select)
  08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute
  INFO: [] webapp=/solr path=/select
  params={fl=title+author+scoremlt.fl=content_mltq=id:10mlt=
  truemlt.interestingTerms=detailsmlt.maxqt=5wt=javabinversion=2.2}
 hits=1
  status=0 QTime=172
 
  INFO MLTSearchRequestProcessor:49 - SolrServer url:
  http://localhost:8080/solr
  INFO MLTSearchRequestProcessor:67 - solrQuery
  q=id%3A10mlt=truemlt.fl=content_mltmlt.maxqt=
  5mlt.interestingTerms=detailsfl=title+author+score
  INFO MLTSearchRequestProcessor:73 - Number of docs found = 1
  INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612
 
 
  b) GetMethod (http://localhost:8080/solr/mlt)
  08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute
  INFO: [] webapp=/solr path=/mlt
  params={fl=title+author+scoremlt.fl=content_mltq=id:10mlt.max
  qt=5mlt.interestingTerms=details} status=0 QTime=15
 
  INFO MLT2SearchRequestProcessor:76 -
 
  0
  name=QTime0
  maxScore=2.0986122.098612S.G.
  name=titleSG_Book
  umFound=4 start=0 maxScore=0.28923997
  name=score0.28923997O.
  HenryS.G.Four Million,
  The0.08667877
  name=authorKatherine MosbyThe Season
  of Lillian Dawes0.07947738
  name=authorJerome K. JeromeThree Men
  in a Boat
  name=score0.047219563Charles
  OliverS.G.ABC's of
  Science
  name=content_mlt:ye1.0
  name=content_mlt:tobin1.0
  name=content_mlt:a1.0
  name=content_mlt:i1.0
  name=content_mlt:his1.0
 
 
 
  c) GetMethod (http://localhost:8080/solr/select)
  08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute
  INFO: [] webapp=/solr path=/select
  params={fl=title+author+scoremlt.fl=content_mltq=id:10mlt.
  maxqt=5mlt.interestingTerms=details} hits=1 status=0 QTime=16
 
  INFO MLT2SearchRequestProcessor:80 -
 
  0
  name=QTime16title author
  scorecontent_mltid:10
  name=mlt.maxqt5
  name=mlt.interestingTermsdetails
  numFound=1 start=0 maxScore=2.098612
  name=score2.098612S.G.
  name=titleSG_Book
  name=rawquerystringid:10id:10
  name=parsedq
  ueryid:10id:10
  name=explain
  2.098612 = (MATCH) weight(id:10 in 3), product of:
0.9994 = queryWeight(id:10), product of:
  2.0986123 = idf(docFreq=1, numDocs=5)
  0.47650534 = queryNorm
2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of:
  1.0 = tf(termFreq(id:10)=1)
  2.0986123 = idf(docFreq=1, numDocs=5)
  1.0 = fieldNorm(field=id, doc=3)
  OldLuceneQParser
  name=timing16.0
  name=time0.0
  name=org.apache.solr.handler.component.QueryComponent
  name=time0.0
  name=org.apache.solr.handler.component.FacetComponent
  name=time0.00.0
  name=org.apache.solr.handler.component.HighlightComponent
  name=time0.0
  name=org.apache.solr.handler.component.DebugComponent
  name=time0.0
  name=time16.0
  name=org.apache.solr.handler.component.QueryComponent
  name=time0.0
  name=org.apache.solr.handler.component.FacetComponent
  name=time0.0
  name=org.apache.solr.handler.component.MoreLikeThisComponent
  name=time0.0
  name=org.apache.solr.handler.component.HighlightComponent
  name=time0.0
  name=org.apache.solr.handler.component.DebugComponent
  name=time16.0
 
 
 
  And here're the relevant entries from solrconfig.xml:
 
 
 
 
explicit
id,title,author,score
on
 
 
 
 
 
1
10
 
 
  --
  View this message in context:
 
 http://www.nabble.com/Solr%27s-MLT-query-call-doesn%27t-work-tp24391843p24391843.html
  Sent from the Solr - User mailing list archive at Nabble.com.




Re: about defaultSearchField

2009-07-08 Thread Yang Lin
Yes, I have deleted whole index directory and re-index after making
changes.

Yang


2009/7/8 Jay Hill jayallenh...@gmail.com

 Just to be sure: You mentioned that you adjusted schema.xml - did you
 re-index after making your changes?

 -Jay


 On Wed, Jul 8, 2009 at 7:07 AM, Yang Lin beckl...@gmail.com wrote:

  Thanks for your reply. But it doesn't work.
 
  Yang
 
  2009/7/8 Yao Ge yao...@gmail.com
 
  
   Try with fl=* or fl=*,score added to your request string.
   -Yao
  
   Yang Lin-2 wrote:
   
Hi,
I have some problems.
For my Solr program, I want to type only the query string and get all
field results that include the query string. But right now I can't get any
result without specifying a field. For example, a query for tina gets
nothing, but Sentence:tina works.
   
I have adjusted the *schema.xml* like this:
   
<fields>
   <field name="CategoryNamePolarity" type="text" indexed="true"
stored="true" multiValued="true"/>
   <field name="CategoryNameStrenth" type="text" indexed="true"
stored="true" multiValued="true"/>
   <field name="CategoryNameSubjectivity" type="text" indexed="true"
stored="true" multiValued="true"/>
   <field name="Sentence" type="text" indexed="true" stored="true"
multiValued="true"/>

   <field name="allText" type="text" indexed="true" stored="true"
multiValued="true"/>
</fields>

<uniqueKey required="false">Sentence</uniqueKey>

<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>allText</defaultSearchField>

<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>

<copyfield source="CategoryNamePolarity" dest="allText"/>
<copyfield source="CategoryNameStrenth" dest="allText"/>
<copyfield source="CategoryNameSubjectivity" dest="allText"/>
<copyfield source="Sentence" dest="allText"/>
   
   
I think the problem is in defaultSearchField, but I don't know how to
fix it. Could anyone help me?
   
Thanks
Yang
   
   
  
   --
   View this message in context:
  
 http://www.nabble.com/about-defaultSearchField-tp24382105p24384615.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
  
 



Re: Solr's MLT query call doesn't work

2009-07-08 Thread SergeyG

Many thanks to everybody who replied to my message. 

1. "A couple of things: your mlt.fl value must be part of fl. In this case,
content_mlt is not included in fl. I think the fl parameter value needs to
be comma separated. Try fl=title,author,content_mlt,score"

Yao, 

Although I don't understand why mlt.fl must be part of fl (at least, I
didn't see this mentioned anywhere), I included this field into fl. But this
didn't change anything. As to the syntax, both
fl=title,author,content_mlt,score and fl=title author content_mlt score
produced the same output (which, again, was exactly the same as the one
with fl=title author score).

2. You definitely need mlt=true if you are not using /solr/mlt.

Bill, 

mlt=true was included in the query while making the Solr call from the
very beginning.

3. What about 
http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score&qt=mlt

Otis,

I tried that too and got this:

INFO MLTSearchRequestProcessor:69 - solrQuery
q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score&qt=mlt
ERROR MLTSearchRequestProcessor:88 - Error executing query

INFO MLTSearchRequestProcessor:69 - solrQuery
q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+content_mlt+score&qt=mlt
ERROR MLTSearchRequestProcessor:88 - Error executing query


Well, I didn't expect this to be such a hurdle. And I'm sure that hundreds
of people before me have already done something similar, haven't they? This
really looks bizarre.

Thank you all. (Otis, when I saw your name I got a feeling that it was just
a matter of seconds till these stubborn calls would start doing their job. :))

Sergey 



SergeyG wrote:
 
 Hi,
 
 Recently, while implementing the MoreLikeThis search, I've run into the
 situation when Solr's mlt query calls don't work. 
 
 More specifically, the following query:
 
 http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
 
 brings back just the doc with id=10 and nothing else. While using the
 GetMethod approach (putting /mlt explicitly into the URL), I got back
 some results.
 
 I've been trying to solve this problem for more than a week with no luck.
 If anybody has any hint, please help.
 
 Below, I put logs & outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c)
 GetMethod (/select).
 
 Thanks a lot.
 
 Regards,
 Sergey Goldberg
 
 
 Here're the logs: 
 
 a) Solr (http://localhost:8080/solr/select)
 08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select
 params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt=true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2}
 hits=1 status=0 QTime=172
 
 INFO MLTSearchRequestProcessor:49 - SolrServer url:
 http://localhost:8080/solr
 INFO MLTSearchRequestProcessor:67 - solrQuery
 q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
 INFO MLTSearchRequestProcessor:73 - Number of docs found = 1
 INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612
 
 
 b) GetMethod (http://localhost:8080/solr/mlt)
 08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/mlt
 params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} status=0 QTime=15
 
 INFO MLT2SearchRequestProcessor:76 - <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
 <result name="match" numFound="1" start="0" maxScore="2.098612">
 <doc><float name="score">2.098612</float><arr name="author"><str>S.G.</str></arr><str name="title">SG_Book</str></doc>
 </result>
 <result name="response" numFound="4" start="0" maxScore="0.28923997">
 <doc><float name="score">0.28923997</float><arr name="author"><str>O. Henry</str><str>S.G.</str></arr><str name="title">Four Million, The</str></doc>
 <doc><float name="score">0.08667877</float><arr name="author"><str>Katherine Mosby</str></arr><str name="title">The Season of Lillian Dawes</str></doc>
 <doc><float name="score">0.07947738</float><arr name="author"><str>Jerome K. Jerome</str></arr><str name="title">Three Men in a Boat</str></doc>
 <doc><float name="score">0.047219563</float><arr name="author"><str>Charles Oliver</str><str>S.G.</str></arr><str name="title">ABC's of Science</str></doc>
 </result>
 <lst name="interestingTerms">
 <float name="content_mlt:ye">1.0</float>
 <float name="content_mlt:tobin">1.0</float>
 <float name="content_mlt:a">1.0</float>
 <float name="content_mlt:i">1.0</float>
 <float name="content_mlt:his">1.0</float>
 </lst>
 </response>
 
 
 c) GetMethod (http://localhost:8080/solr/select)
 08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select
 params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16
 
 INFO MLT2SearchRequestProcessor:80 - ?xml version=1.0
 encoding=UTF-8?
 response
 lst name=responseHeaderint name=status0/intint
 name=QTime16/intlst name=paramsstr 

Best way to integrate custom functionality

2009-07-08 Thread Andrew Nguyen

Hello all,

I am working on a project that involves searching through free-text
fields and would like to add the ability to filter out negative
expressions at a very simple level.  For example, the field may
contain the text "person has no cars".  If the user were to search
for "cars", I would like to be able to intercept the results and
return only those without the word "no" in front of the search term.
While this is a very simple example, it's pretty much my end goal.


I've been reading up on the various hooks provided within Solr but  
wanted to get some guidance on the best way to proceed.


Thanks!

--Andrew


Boosting for most recent documents

2009-07-08 Thread vivek sar
Hi,

  I'm trying to find a way to get the most recent entry for the
searched word. For example, if I have a document with a field named "user"
and I search for user:vivek, I want to get the document that was
indexed most recently. Two ways I could think of:

1) Sort by some time stamp field - but with millions of documents this
becomes a huge memory problem as we have seen OOM with sorting before
2) Boost the most recent document - I'm not sure how to do this.
Basically, we want to have the most recent document score higher than
any other and then we can retrieve just 10 records and sort in the
application by time stamp field to get the most recent document
matching the keyword.

Any suggestion on how can this be done?

Thanks,
-vivek
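
Option 1 above can at least be expressed compactly; whether the sort is affordable depends on heap size and field type. A sketch of the request in Python, where the field name "timestamp" is an assumption (any indexed date field would do), and only the single newest match is requested:

```python
from urllib.parse import urlencode

# Sort matches for the user on an indexed timestamp field, newest first,
# and fetch only the top row instead of pulling records back to sort
# in the application.
params = {
    "q": "user:vivek",
    "sort": "timestamp desc",
    "rows": 1,
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```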


Re: Boosting for most recent documents

2009-07-08 Thread Otis Gospodnetic

Sort by the internal Lucene document ID and pick the highest one.  That might 
do the job for you.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user solr-user@lucene.apache.org
 Sent: Wednesday, July 8, 2009 8:34:16 PM
 Subject: Boosting for most recent documents
 
 Hi,
 
   I'm trying to find a way to get the most recent entry for the
 searched word. For ex., if I have a document with field name user.
 If I search for user:vivek, I want to get the document that was
 indexed most recently. Two ways I could think of,
 
 1) Sort by some time stamp field - but with millions of documents this
 becomes a huge memory problem as we have seen OOM with sorting before
 2) Boost the most recent document - I'm not sure how to do this.
 Basically, we want to have the most recent document score higher than
 any other and then we can retrieve just 10 records and sort in the
 application by time stamp field to get the most recent document
 matching the keyword.
 
 Any suggestion on how can this be done?
 
 Thanks,
 -vivek



Re: Best way to integrate custom functionality

2009-07-08 Thread Otis Gospodnetic

How about, for example

+cars -"no cars" -"nothing cars"

 
In other words, the basic query is the original query, and then loop over all 
negative words and append exclude phrase clauses like in the above example.
That will find documents that have the word "cars" in them, but any documents
with the "no cars" phrase or the "nothing cars" phrase will be excluded.

Just make sure your negative words are not stopwords.
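
The loop Otis describes can be sketched as follows (the negative-word list is an assumption; quoting each phrase keeps the two words adjacent):

```python
def build_negation_query(term, negative_words=("no", "nothing")):
    """Require `term` but exclude docs where a negative word precedes it."""
    clauses = ["+" + term]
    # Append an excluded phrase clause for each negative word,
    # e.g. -"no cars" -"nothing cars".
    for neg in negative_words:
        clauses.append('-"%s %s"' % (neg, term))
    return " ".join(clauses)

q = build_negation_query("cars")
print(q)  # +cars -"no cars" -"nothing cars"
```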

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Andrew Nguyen andrew-lists-solr-u...@na-consulting.net
 To: solr-user@lucene.apache.org
 Sent: Wednesday, July 8, 2009 7:17:09 PM
 Subject: Best way to integrate custom functionality
 
 Hello all,
 
 I am working on a project that involves searching through free-text fields 
 and 
 would like to add the ability to filter out negative expressions at a very 
 simple level.  For example, the field may contain the text "person has no
 cars".  If the user were to search for "cars", I would like to be able to
 intercept the results and return only those without the word "no" in front of
 the search term.  While this is a very simple example, it's pretty much my end
 goal.
 
 I've been reading up on the various hooks provided within Solr but wanted to 
 get 
 some guidance on the best way to proceed.
 
 Thanks!
 
 --Andrew



Re: reindexed data on master not replicated to slave

2009-07-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Jul 8, 2009 at 10:14 PM, solr jaysolr...@gmail.com wrote:
 Thanks. The patch looks good, and I now see the new index directory and it
 is in sync with the one on master. I'll do more testing.

 It is probably not important, but I am just curious why we switch index
 directory. I thought it would be easier to just rename index to index.*, and
 rename the new index directory to index.

It is for consistency across OSes. Windows would not let me do a rename.
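
The guard the patch needs, per the delTree discussion quoted below, can be sketched like this (a Python analogue for illustration; the real fix lives in the Java replication code):

```python
import os
import shutil
import tempfile

def cleanup_tmp_index(tmp_index_dir, index_dir):
    # Only remove the temporary download directory when it is NOT the
    # directory the core has been switched to; otherwise the finally
    # block would delete the freshly installed index, as reported here.
    if os.path.realpath(tmp_index_dir) != os.path.realpath(index_dir):
        shutil.rmtree(tmp_index_dir, ignore_errors=True)

# Demo: when both names point at the same directory, nothing is deleted.
d = tempfile.mkdtemp()
cleanup_tmp_index(d, d)
print(os.path.isdir(d))  # True
```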

 2009/7/7 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 jay,
 Thanks. The testcase was not enough. I have given a new patch. I
 guess that should solve this.

 On Wed, Jul 8, 2009 at 3:48 AM, solr jaysolr...@gmail.com wrote:
  I guess in this case it doesn't matter whether the two directories
  tmpIndexDir and indexDir are the same or not. It looks that the index
  directory is switched to tmpIndexDir and then it is deleted inside
  finally.
 
  On Tue, Jul 7, 2009 at 12:31 PM, solr jay solr...@gmail.com wrote:
 
  In fact, I saw the directory was created and then deleted.
 
 
  On Tue, Jul 7, 2009 at 12:29 PM, solr jay solr...@gmail.com wrote:
 
  Ok, Here is the problem. In the function, the two directories
  tmpIndexDir
  and indexDir are the same (in this case only?), and then at the end of
  the
  function, the directory tmpIndexDir is deleted, which deletes the new
  index
  directory.
 
 
        } finally {
          delTree(tmpIndexDir);
 
        }
 
 
  On Tue, Jul 7, 2009 at 12:17 PM, solr jay solr...@gmail.com wrote:
 
  I see. So I tried it again. Now index.properties has
 
  #index properties
  #Tue Jul 07 12:13:49 PDT 2009
  index=index.20090707121349
 
  but there is no such directory index.20090707121349 under the data
  directory.
 
  Thanks,
 
  J
 
 
  On Tue, Jul 7, 2009 at 11:50 AM, Shalin Shekhar Mangar 
  shalinman...@gmail.com wrote:
 
  On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote:
 
   It seemed that the patch fixed the symptom, but not the problem
  itself.
  
   Now the log messages looks good. After one download and installed
   the
   index,
   it printed out
  
   *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller
   fetchLatestIndex
   INFO: Slave in sync with master.*
  
   but the files inside index directory did not change. Both
  index.properties
   and replication.properties were updated though.
  
 
  Note that in this case, Solr would have created a new index
  directory.
  Are
  you comparing the files on the slave in the new index directory? You
  can
  get
  the new index directory's name from index.properties.
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 
 
 
 
 
 
  --
  J
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



 --
 J




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com