Tagging and searching on tagged indexes.

2009-07-07 Thread Rakhi Khatwani
Hi,
 How do we tag Solr indexes and search on them? There is not
much information on the wiki; all I could find is this:
http://wiki.apache.org/solr/UserTagDesign

Has anyone tried it (using the Solr API)?


One more question: can we change the schema dynamically at runtime (while
the Solr instance is running)?

Regards,
Raakhi.


Re: Is there any other way to load the index beside using http connection?

2009-07-07 Thread Marcus Herou
Off the top of my head... but aren't you supposed to activate the stream
handler in Solr? I think it is documented...

Cheers
//Marcus


On Mon, Jul 6, 2009 at 8:55 PM, Francis Yakin fya...@liquid.com wrote:

 Yes, I uploaded the CSV file that I got from the database, then I ran that
 command and got the error.

 Any suggestions?

 Thanks

 Francis

 -Original Message-
 From: NitinMalik [mailto:malik.ni...@yahoo.com]
 Sent: Monday, July 06, 2009 11:32 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Is there any other way to load the index beside using http
 connection?


 Hi Francis,

 In my experience, the update stream handler (for an XML file in my case)
 worked only for Solr running on the same machine. I also got the same error
 when I tried to update documents on a remote Solr instance.

 Regards
 Nitin


 Francis Yakin wrote:
 
 
  Ok, I have a CSV file(called it test.csv) from database.
 
  When I tried to upload this file to solr using this cmd, I got
  stream.contentType=text/plain: No such file or directory error
 
  curl
 
 http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8
 
  -bash: stream.contentType=text/plain: No such file or directory
   undefined field cat
 
  What did I do wrong?
 
  Francis
 
  -Original Message-
  From: Norberto Meijome [mailto:numard...@gmail.com]
  Sent: Monday, July 06, 2009 11:01 AM
  To: Francis Yakin
  Cc: solr-user@lucene.apache.org
  Subject: Re: Is there any other way to load the index beside using http
  connection?
 
  On Mon, 6 Jul 2009 09:56:03 -0700
  Francis Yakin fya...@liquid.com wrote:
 
   Norberto,
 
  Thanks. I think my question is:
 
  why not generate your SQL output directly into your oracle server as a
  file
 
  What type of file is this?
 
 
 
  a file in a format that you can then import into SOLR.
 
  _
  {Beto|Norberto|Numard} Meijome
 
  Gravity cannot be blamed for people falling in love.
Albert Einstein
 
  I speak for myself, not my employer. Contents may be hot. Slippery when
  wet. Reading disclaimers makes you go blind. Writing them is worse. You
  have been Warned.
 
 

 --
 View this message in context:
 http://www.nabble.com/Is-there-any-other-way-to-load-the-index-beside-using-%22http%22-connection--tp24297934p24360603.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/


Re: Tagging and searching on tagged indexes.

2009-07-07 Thread Shalin Shekhar Mangar
On Tue, Jul 7, 2009 at 11:37 AM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi,
 How do we tag Solr indexes and search on them? There is not
 much information on the wiki; all I could find is this:
 http://wiki.apache.org/solr/UserTagDesign

 Has anyone tried it (using the Solr API)?


That page was created for brainstorming a possible enhancement. It is not
implemented yet.


 One more question: can we change the schema dynamically at runtime (while
 the Solr instance is running)?


You'd need to reload the core (or restart the server) and re-index all
documents for schema changes to take effect.
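For reference, a core reload can be triggered over HTTP via the CoreAdmin handler; this is only a sketch, and the core name "core0" and host are assumptions about your setup:

```shell
# Build the CoreAdmin RELOAD URL; core name and host are hypothetical.
SOLR_HOST="http://localhost:8983"
CORE="core0"
RELOAD_URL="${SOLR_HOST}/solr/admin/cores?action=RELOAD&core=${CORE}"
echo "$RELOAD_URL"
# curl "$RELOAD_URL"   # run this against a live multicore install
```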

-- 
Regards,
Shalin Shekhar Mangar.


Can´t use wildcard * on alphanumeric values?

2009-07-07 Thread gateway0

Hi,

I indexed my data and defined a defaultSearchField named "text":
(<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>).

I copied all my other field values into that field. Now my problem:

Let's say I have two values indexed:
1. ABCD
2. ABCD3456

Now when I do a wildcard search over those two values, the following happens:
- query q=AB* => both values (ABCD and ABCD3456) are returned =>
the wildcard works!
- query q=ABCD3* => no results are returned (expected: ABCD3456) =>
the wildcard does not work!

Am I doing something wrong? Is there a way to use wildcards on alphanumeric
values?

(Off-topic: how does Google, for example, deal with a problem like that? Do
they hide the wildcards from the user?)

kind regards Sebastian
-- 
View this message in context: 
http://www.nabble.com/Can%C2%B4t-use-wildcard-%22*%22-on-alphanumeric-values--tp24369209p24369209.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Filtering MoreLikeThis results

2009-07-07 Thread Marc Sturlese

With the MoreLikeThisHandler you can use fq to filter your results. As far as
I know, bq is not allowed.
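As a sketch, a MoreLikeThisHandler request with an fq filter might look like this; the handler path /mlt, the field names, and the document id are assumptions, not from this thread:

```shell
# Hypothetical MLT request: q selects the source doc, fq restricts the
# "more like this" results to a subset of the index.
SOLR="http://localhost:8983/solr"
MLT_URL="${SOLR}/mlt?q=id:SOLR1000&mlt.fl=features&fq=inStock:true"
echo "$MLT_URL"
# curl "$MLT_URL"   # run against a live instance with an /mlt handler
```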


Bill Au wrote:
 
 I have been trying to restrict MoreLikeThis results without any luck also.
 In addition to restricting the results, I am also looking to influence the
 scores, similar to the way boost query (bq) works in the
 DisMaxRequestHandler.

 I think Solr's MoreLikeThis depends on Lucene's contrib queries
 MoreLikeThis, or at least it used to.  Has anyone looked into enhancing
 Solr's MoreLikeThis to support bq and restricting mlt results?
 
 Bill
 
 On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge yao...@gmail.com wrote:
 

 I could not find any support at http://wiki.apache.org/solr/MoreLikeThis
 for restricting MLT results to certain subsets. I passed along a fq
 parameter and it was ignored. Since we cannot incorporate the filters in
 the query itself, which is used to retrieve the target for similarity
 comparison, it appears there is no way to filter MLT results. BTW, I am
 using Solr 1.3.
 Please let me know if there is a way (other than hacking the source code)
 to do this. Thanks!
 --
 View this message in context:
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Can´t use wildcard * on alphanumeric values?

2009-07-07 Thread Shalin Shekhar Mangar
On Tue, Jul 7, 2009 at 2:10 PM, gateway0 reiterwo...@yahoo.de wrote:


 I indexed my data and defined a defaultSearchField named "text":
 (<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>).

 Let's say I have two values indexed:
 1. ABCD
 2. ABCD3456

 Now when I do a wildcard search over those two values, the following happens:
 - query q=AB* => both values (ABCD and ABCD3456) are returned =>
 the wildcard works!
 - query q=ABCD3* => no results are returned (expected: ABCD3456) =>
 the wildcard does not work!

 Am I doing something wrong? Is there a way to use wildcards on alphanumeric
 values?


I think the problem is that the WordDelimiterFilter applied to the 'text' type
splits 'ABCD3456' into 'ABCD' and '3456' etc. Also, prefix queries are not
analyzed, so they don't pass through the same filters.

I guess one simple solution to your problem is to add preserveOriginal="1"
to the WordDelimiterFilterFactory definition inside the 'text' field type.
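For illustration, the index-time filter definition in the 'text' field type would then look something like this (a sketch based on the stock example schema; the other attributes shown are whatever your schema already uses):

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1" preserveOriginal="1"/>
```

After this change, re-index so 'ABCD3456' is kept as a whole token alongside the split parts.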

-- 
Regards,
Shalin Shekhar Mangar.


spell checker's collate values

2009-07-07 Thread Licinio Fernández Maurelo
Hi all,
I'm still trying to tune my spellchecker to get the results I expect.
I've created a dictionary, and now I want a particular behaviour from the
spellchecker. When I submit the query 'Fernandox Alonso' I get what I
expect:

<bool name="correctlySpelled">false</bool>
<str name="collation">Fernando Alonso</str>

but when I try 'Fernanda Alonso' it returns

<lst name="spellcheck">
  <lst name="suggestions">
    <bool name="correctlySpelled">true</bool>
  </lst>
</lst>

OK, Fernanda is a correct name, but I want to boost certain values
(Fernando Alonso, Michael Jackson) to be returned as suggestions
(as Google does).
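Not an answer from this thread, but one knob worth trying: the spellcheck.onlyMorePopular parameter asks the spellchecker for more frequent alternatives even when the query term itself exists in the index. A hypothetical request (handler path and host are assumptions):

```shell
# Hypothetical spellcheck request; onlyMorePopular=true requests more
# frequent suggestions even for correctly spelled input.
SOLR="http://localhost:8983/solr"
SC_URL="${SOLR}/select?q=Fernanda%20Alonso&spellcheck=true&spellcheck.collate=true&spellcheck.onlyMorePopular=true"
echo "$SC_URL"
# curl "$SC_URL"   # run against a live instance with spellcheck configured
```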

Any help?

regards
-- 
Lici


Re: reindexed data on master not replicated to slave

2009-07-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
Jay,
I am opening an issue, SOLR-1264:
https://issues.apache.org/jira/browse/SOLR-1264

I have attached a patch as well. I guess that is the fix; could you
please confirm?


On Tue, Jul 7, 2009 at 12:59 AM, solr jaysolr...@gmail.com wrote:
 It looks like the problem is here, or just before it, in
 SnapPuller.fetchLatestIndex():


   terminateAndWaitFsyncService();
   LOG.info(Conf files are not downloaded or are in sync);
   if (isSnapNeeded) {
     modifyIndexProps(tmpIndexDir.getName());
   } else {
     successfulInstall = copyIndexFiles(tmpIndexDir, indexDir);
   }
   if (successfulInstall) {
     logReplicationTimeAndConfFiles(modifiedConfFiles);
     doCommit();
   }


 I debugged to this point and noticed that isSnapNeeded is true, and therefore

 modifyIndexProps(tmpIndexDir.getName());

 executed; but from the function name it looks like installing the index
 actually happens in

 successfulInstall = copyIndexFiles(tmpIndexDir, indexDir);


 The function returns false, but the caller (doSnapPull) never checked the
 return value.


 Thanks,

 J


 On Mon, Jul 6, 2009 at 8:02 AM, solr jay solr...@gmail.com wrote:

 There is only one index directory: index/

 Here is the content of index.properties

 #index properties
 #Fri Jul 03 14:17:12 PDT 2009
 index=index.20090703021705


 Thanks,

 J

 2009/7/5 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 BTW, how many index dirs are there in the data dir? What is in
 datadir/index.properties?

 On Sat, Jul 4, 2009 at 12:15 AM, solr jaysolr...@gmail.com wrote:
 
 
  I tried it with the latest nightly build and got the same result.
 
   Actually, that was the symptom, and it made me look at the index
   directory.
   The same log messages repeated again and again, never ending.
 
 
 
  2009/7/2 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
  jay , I see updating index properties... twice
 
 
 
   This should happen rarely. In your case it should have happened only
   once, because you cleaned up the master only once.
 
 
  On Fri, Jul 3, 2009 at 6:09 AM, Otis
  Gospodneticotis_gospodne...@yahoo.com wrote:
  
   Jay,
  
   You didn't mention which version of Solr you are using.  It looks
   like
   some trunk or nightly version.  Maybe you can try the latest
   nightly?
  
    Otis
   --
   Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
   - Original Message 
   From: solr jay solr...@gmail.com
   To: solr-user@lucene.apache.org
   Sent: Thursday, July 2, 2009 9:14:48 PM
   Subject: reindexed data on master not replicated to slave
  
   Hi,
  
   When index data were corrupted on master instance, I wanted to wipe
   out
   all
   the index data and re-index everything. I was hoping the newly
   created
   index
   data would be replicated to slaves, but it wasn't.
  
   Here are the steps I performed:
  
   1. stop master
   2. delete the directory 'index'
   3. start master
   4. disable replication on master
   5. index all data from scratch
   6. enable replication on master
  
    It seemed from the log file that the slave instances discovered that a
    new index was available and claimed that the new index was installed,
    and then tried to update the index properties; but looking into the
    index directory on the slaves, you will find that no index data files
    were updated or added, and the slaves keep trying to fetch the new
    index. Here are some lines from the slave's log file:
  
   Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller
   fetchLatestIndex
   INFO: Starting replication process
   Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller
   fetchLatestIndex
   INFO: Number of files in latest snapshot in master: 69
   Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller
   fetchLatestIndex
   INFO: Total time taken for download : 0 secs
   Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller
   fetchLatestIndex
   INFO: Conf files are not downloaded or are in sync
   Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller
   modifyIndexProps
   INFO: New index installed. Updating index properties...
   Jul 1, 2009 4:00:33 PM org.apache.solr.handler.SnapPuller
   fetchLatestIndex
   INFO: Master's version: 1246488421310, generation: 9
   Jul 1, 2009 4:00:33 PM org.apache.solr.handler.SnapPuller
   fetchLatestIndex
   INFO: Slave's version: 1246385166228, generation: 56
   Jul 1, 2009 4:00:33 PM org.apache.solr.handler.SnapPuller
   fetchLatestIndex
   INFO: Starting replication process
   Jul 1, 2009 4:00:33 PM org.apache.solr.handler.SnapPuller
   fetchLatestIndex
   INFO: Number of files in latest snapshot in master: 69
   Jul 1, 2009 4:00:33 PM org.apache.solr.handler.SnapPuller
   fetchLatestIndex
   INFO: Total time taken for download : 0 secs
   Jul 1, 2009 4:00:33 PM org.apache.solr.handler.SnapPuller
   fetchLatestIndex
   INFO: Conf files are not downloaded or are in sync
   Jul 1, 2009 4:00:33 PM 

Can't limit return fields in custom request handler

2009-07-07 Thread Osman İZBAT
Hi.

I'm writing my custom faceted request handler.

But I have a problem: when I call
http://localhost:8983/solr/select/?qt=cfacet&q=%2BitemTitle:nokia%20%2BcategoryId:130&start=0&limit=3&fl=id,itemTitle
I'm getting all fields instead of only id and itemTitle.

Also, I get no results when I pass a non-null filter parameter to
getDocListAndSet(...).

public class MyCustomFacetRequestHandler extends StandardRequestHandler {

    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
            throws Exception {
        try {
            SolrParams solrParams = req.getParams();
            Query q = QueryParsing.parseQuery(solrParams.get("q"),
                    req.getSchema());
            DocListAndSet results = req.getSearcher().getDocListAndSet(q,
                    (Query) null, (Sort) null, solrParams.getInt("start"),
                    solrParams.getInt("limit"));
            ...

Regards.

-- 
Osman İZBAT


Re: Is there any other way to load the index beside using http connection?

2009-07-07 Thread Yonik Seeley
Look at the error - it's bash (your command-line shell) complaining.
The '&' terminates one command and puts it in the background.
Surrounding the URL with quotes will get you one step closer:

curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8'

-Yonik
http://www.lucidimagination.com



On Mon, Jul 6, 2009 at 2:11 PM, Francis Yakinfya...@liquid.com wrote:

 Ok, I have a CSV file(called it test.csv) from database.

 When I tried to upload this file to solr using this cmd, I got 
 stream.contentType=text/plain: No such file or directory error

 curl 
  http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8

 -bash: stream.contentType=text/plain: No such file or directory
  undefined field cat

 What did I do wrong?

 Francis

 -Original Message-
 From: Norberto Meijome [mailto:numard...@gmail.com]
 Sent: Monday, July 06, 2009 11:01 AM
 To: Francis Yakin
 Cc: solr-user@lucene.apache.org
 Subject: Re: Is there any other way to load the index beside using http 
 connection?

 On Mon, 6 Jul 2009 09:56:03 -0700
 Francis Yakin fya...@liquid.com wrote:

  Norberto,

 Thanks. I think my question is:

 why not generate your SQL output directly into your oracle server as a file

 What type of file is this?



 a file in a format that you can then import into SOLR.

 _
 {Beto|Norberto|Numard} Meijome

 Gravity cannot be blamed for people falling in love.
  Albert Einstein

 I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
 Reading disclaimers makes you go blind. Writing them is worse. You have been 
 Warned.



Re: Loading Data into Solr without HTTP

2009-07-07 Thread Yonik Seeley
On Tue, Jul 7, 2009 at 8:41 AM, Anand Kumar
Prabhakaranand2...@gmail.com wrote:
 Is there any way to read the data from the
 CSV file and load it into Solr without using /update/csv?

That *is* the right way to load a CSV file into Solr.
How many records are in the CSV file, and how much heap are you giving the JVM?
Try a small CSV file first to make sure that it's being parsed
correctly... for example, do a

head -1000 bigfile.csv > smallfile.csv

Now upload that and inspect the documents by querying Solr to ensure
that everything imported as expected.

-Yonik
http://www.lucidimagination.com


Re: Loading Data into Solr without HTTP

2009-07-07 Thread Anand Kumar Prabhakar

Thank you for the reply, Yonik. I have already tried with smaller CSV files;
currently we are trying to load a 400 MB CSV file, but it is taking too long
(more than half an hour). I want to know whether there is any way to do it
much faster; we have overcome the OutOfMemoryException by increasing heap
space.

Please suggest.



Yonik Seeley-2 wrote:
 
 On Tue, Jul 7, 2009 at 8:41 AM, Anand Kumar
 Prabhakaranand2...@gmail.com wrote:
 Is there any way so that we can read the data from the
 CSV file and load it into the Solr database without using /update/csv
 
 That *is* the right way to load a CSV file into Solr.
 How many records are in the CSV file, and how much heap are you giving the
 JVM?
 Try a small CSV file first to make sure that it's being parsed
 correctly... for example, do a
 
 head -1000 bigfile.csv > smallfile.csv
 
 Now upload that and inspect the documents by querying Solr to ensure
 that everything imported as expected.
 
 -Yonik
 http://www.lucidimagination.com
 
 

-- 
View this message in context: 
http://www.nabble.com/Loading-Data-into-Solr-without-HTTP-tp24372564p24373116.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Can´t use wildcard * on alphanumeric values?

2009-07-07 Thread gateway0

Thank you, that was it.

Why is the preserveOriginal="1" option not documented anywhere?




Shalin Shekhar Mangar wrote:
 
 On Tue, Jul 7, 2009 at 2:10 PM, gateway0 reiterwo...@yahoo.de wrote:
 

  I indexed my data and defined a defaultSearchField named "text":
  (<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>).

  Let's say I have two values indexed:
  1. ABCD
  2. ABCD3456

  Now when I do a wildcard search over those two values, the following
  happens:
  - query q=AB* => both values (ABCD and ABCD3456) are returned =>
  the wildcard works!
  - query q=ABCD3* => no results are returned (expected: ABCD3456) =>
  the wildcard does not work!

 Am I doing something wrong? Is there a way to use wildcards on
 alphanumeric
 values?

 
 I think the problem is that the WordDelimiterFilter applied to the 'text'
 type splits 'ABCD3456' into 'ABCD' and '3456' etc. Also, prefix queries
 are not analyzed, so they don't pass through the same filters.

 I guess one simple solution to your problem is to add preserveOriginal="1"
 to the WordDelimiterFilterFactory definition inside the 'text' field type.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/Can%C2%B4t-use-wildcard-%22*%22-on-alphanumeric-values--tp24369209p24373135.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Loading Data into Solr without HTTP

2009-07-07 Thread Yonik Seeley
On Tue, Jul 7, 2009 at 9:14 AM, Anand Kumar
Prabhakaranand2...@gmail.com wrote:
 I want to know is there any method to do
 it much faster, we have overcome the OutOfMemoryException by increasing heap
 space.

Optimize your schema - eliminate all unnecessary copyFields and
default values.  The current example schema is not good for
performance benchmarking.

-Yonik
http://www.lucidimagination.com


Re: Loading Data into Solr without HTTP

2009-07-07 Thread Yonik Seeley
Also make sure you don't have any autocommit rules enabled in solrconfig.xml
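For reference, autocommit is controlled in solrconfig.xml; leaving the block commented out (as below) disables it during bulk loads. A sketch only; the threshold numbers are placeholders:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- keep autoCommit commented out while bulk loading, then commit once -->
  <!--
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>1000</maxTime>
  </autoCommit>
  -->
</updateHandler>
```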

How many documents are in the 400MB CSV file, and how long does it
take to index now?

-Yonik
http://www.lucidimagination.com



On Tue, Jul 7, 2009 at 10:03 AM, Anand Kumar
Prabhakaranand2...@gmail.com wrote:

 Hi Yonik,

 Currently our Schema has very few fields and we don't have any copy fields
 also. Please find the below Schema.xml we are using:

 <?xml version="1.0" encoding="UTF-8" ?>
 <schema name="cmps" version="1.1">
  <!-- attribute "name" is the name of this schema and is only used for
       display purposes. Applications should change this to reflect the
       nature of the search collection.
       version="1.1" is Solr's version number for the schema syntax and
       semantics. It should not normally be changed by applications.
       1.0: multiValued attribute did not exist, all fields are multiValued
       by nature
       1.1: multiValued attribute introduced, false by default -->
  <types>

    <fieldType name="string" class="solr.StrField" sortMissingLast="true"
      omitNorms="true"/>

    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"
      omitNorms="true"/>

    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>

    <fieldType name="sint" class="solr.SortableIntField"
      sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong" class="solr.SortableLongField"
      sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat" class="solr.SortableFloatField"
      sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField"
      sortMissingLast="true" omitNorms="true"/>

    <fieldType name="date" class="solr.DateField" sortMissingLast="true"
      omitNorms="true"/>

    <fieldType name="random" class="solr.RandomSortField" indexed="true" />

    <fieldType name="text_ws" class="solr.TextField"
      positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text" class="solr.TextField"
      positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="0"
          catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="textTight" class="solr.TextField"
      positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="0" generateNumberParts="0" catenateWords="1"
          catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="textSpell" class="solr.TextField"
      positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="alphaNumericKeyword" class="solr.TextField"
      sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <fieldtype name="ignored" stored="false" indexed="false"
      class="solr.StrField" />
    <fieldType name="phNo" class="solr.TextField"
      positionIncrementGap="100" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="textStA"

Re: Loading Data into Solr without HTTP

2009-07-07 Thread Anand Kumar Prabhakar

Hi Yonik,

Currently our Schema has very few fields and we don't have any copy fields
also. Please find the below Schema.xml we are using:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="cmps" version="1.1">
  <!-- attribute "name" is the name of this schema and is only used for
       display purposes. Applications should change this to reflect the
       nature of the search collection.
       version="1.1" is Solr's version number for the schema syntax and
       semantics. It should not normally be changed by applications.
       1.0: multiValued attribute did not exist, all fields are multiValued
       by nature
       1.1: multiValued attribute introduced, false by default -->
  <types>

    <fieldType name="string" class="solr.StrField" sortMissingLast="true"
      omitNorms="true"/>

    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"
      omitNorms="true"/>

    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>

    <fieldType name="sint" class="solr.SortableIntField"
      sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong" class="solr.SortableLongField"
      sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat" class="solr.SortableFloatField"
      sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField"
      sortMissingLast="true" omitNorms="true"/>

    <fieldType name="date" class="solr.DateField" sortMissingLast="true"
      omitNorms="true"/>

    <fieldType name="random" class="solr.RandomSortField" indexed="true" />

    <fieldType name="text_ws" class="solr.TextField"
      positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text" class="solr.TextField"
      positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="0"
          catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="textTight" class="solr.TextField"
      positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="0" generateNumberParts="0" catenateWords="1"
          catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="textSpell" class="solr.TextField"
      positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="alphaNumericKeyword" class="solr.TextField"
      sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <fieldtype name="ignored" stored="false" indexed="false"
      class="solr.StrField" />
    <fieldType name="phNo" class="solr.TextField"
      positionIncrementGap="100" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="textStA" class="solr.TextField"
      positionIncrementGap="100" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>

Re: Indexing XML

2009-07-07 Thread Matt Mitchell
Saeli,

Solr expects a certain XML structure when adding documents. You'll need to
come up with a mapping, that translates the original structure to one that
solr understands. You can then search solr and get those solr documents
back. If you want to keep the original XML, you can store it in a field
within the solr document.

original data -> mapping -> Solr XML document (with a field for the original
data)
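The mapping step can be as simple as extracting the values you care about and wrapping them in Solr's add format. A minimal sketch (the field name "title" and the sample value are assumptions):

```shell
# Value extracted from the original XML (e.g. the LOM <title> element).
TITLE="Fractions en sixieme"
# Wrap it in a Solr add document; add one <field> per mapped value.
DOC="<add><doc><field name=\"title\">${TITLE}</field></doc></add>"
echo "$DOC"
```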

Does that make sense? Can you describe what it is you want to do with
results of a search?

Matt

On Tue, Jul 7, 2009 at 10:25 AM, Saeli Mathieu saeli.math...@gmail.comwrote:

 Hello.

 I'm a new user of Solr. I previously used Lucene to index files and search,
 but my program was too slow, which is why I was looking for another
 solution, and I thought I had found it.

 I say "I thought" because I don't know if it's possible to use Solr with
 this kind of XML file:

 <lom xsi:schemaLocation="http://ltsc.ieee.org/xsd/lomv1.0
 http://ltsc.ieee.org/xsd/lomv1.0/lom.xsd">
  <general>
   <identifier>
    <catalog>STRING HERE</catalog>
    <entry>STRING HERE</entry>
   </identifier>
   <title>
    <string language="fr">STRING HERE</string>
   </title>
   <language>fr</language>
   <description>
    <string language="fr">STRING HERE</string>
   </description>
  </general>
  <lifeCycle>
   <status>
    <source>STRING HERE</source>
    <value>STRING HERE</value>
   </status>
   <contribute>
    <role>
     <source>STRING HERE</source>
     <value>STRING HERE</value>
    </role>
    <entity>STRING HERE</entity>
   </contribute>
  </lifeCycle>
  <metaMetadata>
   <identifier>
    <catalog>STRING HERE</catalog>
    <entry>STRING HERE</entry>
   </identifier>
   <contribute>
    <role>
     <source>STRING HERE</source>
     <value>STRING HERE</value>
    </role>
    <entity>STRING HERE</entity>
    <date>
     <dateTime>STRING HERE</dateTime>
    </date>
   </contribute>
   <contribute>
    <role>
     <source>STRING HERE</source>
     <value>STRING HERE</value>
    </role>
    <entity>STRING HERE</entity>
    <entity>STRING HERE</entity>
    <entity>STRING HERE</entity>
    <date>
     <dateTime>STRING HERE</dateTime>
    </date>
   </contribute>
   <metadataSchema>STRING HERE</metadataSchema>
   <language>STRING HERE</language>
  </metaMetadata>
  <technical>
   <location>STRING HERE</location>
  </technical>
  <educational>
   <intendedEndUserRole>
    <source>STRING HERE</source>
    <value>STRING HERE</value>
   </intendedEndUserRole>
   <context>
    <source>STRING HERE</source>
    <value>STRING HERE</value>
   </context>
   <typicalAgeRange>
    <string language="fr">STRING HERE</string>
   </typicalAgeRange>
   <description>
    <string language="fr">STRING HERE</string>
   </description>
   <description>
    <string language="fr">STRING HERE</string>
   </description>
   <language>STRING HERE</language>
  </educational>
  <annotation>
   <entity>STRING HERE</entity>
   <date>
    <dateTime>STRING HERE</dateTime>
   </date>
  </annotation>
  <classification>
   <purpose>
    <source>STRING HERE</source>
    <value>STRING HERE</value>
   </purpose>
  </classification>
  <classification>
   <purpose>
    <source>STRING HERE</source>
    <value>STRING HERE</value>
   </purpose>
   <taxonPath>
    <source>
     <string language="fr">STRING HERE</string>
    </source>
    <taxon>
     <id>STRING HERE</id>
     <entry>
      <string language="fr">STRING HERE</string>
     </entry>
    </taxon>
   </taxonPath>
  </classification>
  <classification>
   <purpose>
    <source>STRING HERE</source>
    <value>STRING HERE</value>
   </purpose>
   <taxonPath>
    <source>
     <string language="fr">STRING HERE</string>
    </source>
    <taxon>
     <id>STRING HERE</id>
     <entry>
      <string language="fr">STRING HERE</string>
     </entry>
    </taxon>
   </taxonPath>
   <taxonPath>
    <source>
     <string language="fr">STRING HERE</string>
    </source>
    <taxon>
     <id>STRING HERE</id>
     <entry>
      <string language="fr">STRING HERE</string>
     </entry>
    </taxon>
   </taxonPath>
  </classification>
 </lom>

 I don't know how I can use this kind of file with Solr because the XML
 example are this one.

  add
  doc
  field name=idSOLR1000/field
  field name=nameSolr, the Enterprise Search Server/field
  field name=manuApache Software Foundation/field
  field name=catsoftware/field
  field name=catsearch/field
  field name=featuresAdvanced Full-Text Search Capabilities using
 Lucene/field
  field name=featuresOptimized for High Volume Web Traffic/field
  field name=featuresStandards Based Open Interfaces - XML and
 HTTP/field
  field name=featuresComprehensive HTML Administration
 Interfaces/field
  field name=featuresScalability - Efficient Replication to other Solr
 Search Servers/field
  field name=featuresFlexible and Adaptable with XML configuration and
 Schema/field
  field name=featuresGood unicode support: h#xE9;llo (hello with an
 accent over the e)/field
  field name=price0/field
 field name=popularity10/field
 field name=inStocktrue/field
 field name=incubationdate_dt2006-01-17T00:00:00.000Z/field
 /doc
 /add

 I understood Solr need this kind of architecture, by Architecture I mean
 field + name=keywordValue/field
 or as you can see I can't use this kind of architecture because I'm not
 allow to change my XML files.

 I'm looking forward to read you.

 Mathieu Saeli
 --
 Saeli Mathieu.



Re: Filtering MoreLikeThis results

2009-07-07 Thread Bill Au
I think fq only works on the main response, not the mlt matches.  I found a
couple of related JIRA issues:

http://issues.apache.org/jira/browse/SOLR-295
http://issues.apache.org/jira/browse/SOLR-281

If I am reading them correctly, I should be able to use DisMax and
MoreLikeThis together.  I will give that a try and report back.

Bill


On Tue, Jul 7, 2009 at 4:45 AM, Marc Sturlese marc.sturl...@gmail.comwrote:


 Using MoreLikeThisHandler you can use fq to filter your results. As far as
 I
 know bq are not allowed.


 Bill Au wrote:
 
  I have been trying to restrict MoreLikeThis results without any luck
 also.
  In additional to restricting the results, I am also looking to influence
  the
  scores similar to the way boost query (bq) works in the
  DisMaxRequestHandler.
 
  I think Solr's MoreLikeThis depends on Lucene's contrib queries
  MoreLikeThis, or at least it used to.  Has anyone looked into enhancing
  Solrs' MoreLikeThis to support bq and restricting mlt results?
 
  Bill
 
  On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge yao...@gmail.com wrote:
 
 
  I could not find any support from
  http://wiki.apache.org/solr/MoreLikeThison
  how to restrict MLT results to certain subsets. I passed along a fq
  parameter and it is ignored. Since we can not incorporate the filters in
  the
  query itself which is used to retrieve the target for similarity
  comparison,
  it appears there is no way to filter MLT results. BTW. I am using Solr
  1.3.
  Please let me know if there is way (other than hacking the source code)
  to
  do this. Thanks!
  --
  View this message in context:
 
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Indexing XML

2009-07-07 Thread Saeli Mathieu
Hello.

I'm a new user of Solr; I have already used Lucene to index files and search,
but my program was too slow, which is why I was looking for another solution,
and I thought I had found it.

I say "I thought" because I don't know if it's possible to use Solr with
this kind of XML file.

 <lom xsi:schemaLocation="http://ltsc.ieee.org/xsd/lomv1.0
http://ltsc.ieee.org/xsd/lomv1.0/lom.xsd">
<general>
<identifier>
<catalog>STRING HERE</catalog>
<entry>
STRING HERE
</entry>
</identifier>
<title>
<string language="fr">
STRING HERE
</string>
</title>
<language>fr</language>
<description>
<string language="fr">
STRING HERE
</string>
</description>
</general>
<lifeCycle>
<status>
<source>STRING HERE</source>
<value>STRING HERE</value>
</status>
<contribute>
<role>
<source>STRING HERE</source>
<value>STRING HERE</value>
</role>
<entity>STRING HERE
</entity>
</contribute>
</lifeCycle>
<metaMetadata>
<identifier>
<catalog>STRING HERE</catalog>
<entry>STRING HERE</entry>
</identifier>
<contribute>
<role>
<source>STRING HERE</source>
<value>STRING HERE</value>
</role>
<entity>STRING HERE
</entity>
<date>
<dateTime>STRING HERE</dateTime>
</date>
</contribute>
<contribute>
<role>
<source>STRING HERE</source>
<value>STRING HERE</value>
</role>
<entity>STRING HERE
</entity>
<entity>STRING HERE</entity>
<entity>STRING HERE
</entity>
<date>
<dateTime>STRING HERE</dateTime>
</date>
</contribute>
<metadataSchema>STRING HERE</metadataSchema>
<language>STRING HERE</language>
</metaMetadata>
<technical>
<location>STRING HERE
</location>
</technical>
<educational>
<intendedEndUserRole>
<source>STRING HERE</source>
<value>STRING HERE</value>
</intendedEndUserRole>
<context>
<source>STRING HERE</source>
<value>STRING HERE</value>
</context>
<typicalAgeRange>
<string language="fr">STRING HERE</string>
</typicalAgeRange>
<description>
<string language="fr">STRING HERE</string>
</description>
<description>
<string language="fr">STRING HERE</string>
</description>
<language>STRING HERE</language>
</educational>
<annotation>
<entity>STRING HERE
</entity>
<date>
<dateTime>STRING HERE</dateTime>
</date>
</annotation>
<classification>
<purpose>
<source>STRING HERE</source>
<value>STRING HERE</value>
</purpose>
</classification>
<classification>
<purpose>
<source>STRING HERE</source>
<value>STRING HERE</value>
</purpose>
<taxonPath>
<source>
<string language="fr">STRING HERE</string>
</source>
<taxon>
<id>STRING HERE</id>
<entry>
<string language="fr">STRING HERE</string>
</entry>
</taxon>
</taxonPath>
</classification>
<classification>
<purpose>
<source>STRING HERE</source>
<value>STRING HERE</value>
</purpose>
<taxonPath>
<source>
<string language="fr">STRING HERE </string>
</source>
<taxon>
<id>STRING HERE</id>
<entry>
<string language="fr">STRING HERE</string>
</entry>
</taxon>
</taxonPath>
<taxonPath>
<source>
<string language="fr">STRING HERE</string>
</source>
<taxon>
<id>STRING HERE</id>
<entry>
<string language="fr">STRING HERE</string>
</entry>
</taxon>
</taxonPath>
</classification>
</lom>

I don't know how I can use this kind of file with Solr, because the XML
examples look like this:

 <add>
 <doc>
  <field name="id">SOLR1000</field>
  <field name="name">Solr, the Enterprise Search Server</field>
  <field name="manu">Apache Software Foundation</field>
  <field name="cat">software</field>
  <field name="cat">search</field>
  <field name="features">Advanced Full-Text Search Capabilities using
Lucene</field>
  <field name="features">Optimized for High Volume Web Traffic</field>
  <field name="features">Standards Based Open Interfaces - XML and
HTTP</field>
  <field name="features">Comprehensive HTML Administration
Interfaces</field>
  <field name="features">Scalability - Efficient Replication to other Solr
Search Servers</field>
  <field name="features">Flexible and Adaptable with XML configuration and
Schema</field>
  <field name="features">Good unicode support: h&#xE9;llo (hello with an
accent over the e)</field>
  <field name="price">0</field>
 <field name="popularity">10</field>
 <field name="inStock">true</field>
 <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
 </doc>
 </add>

I understand Solr needs this kind of structure (by structure I mean
<field name="keyword">Value</field>), but as you can see I can't use that
structure because I'm not allowed to change my XML files.

I'm looking forward to reading from you.

Mathieu Saeli
-- 
Saeli Mathieu.


Question regarding ExtractingRequestHandler

2009-07-07 Thread ahammad

Hello,

I've recently started using this handler to index MS Word and PDF files.
When I set ext.extract.only=true, I get back all the metadata that is
associated with that file.

If I want to index, I need to set ext.extract.only=false. If I want to index
all that metadata along with the contents, what inputs do I need to pass to
the http request? Do I have to specifically define all the fields in the
schema or can Solr dynamically generate those fields?

Thanks.
-- 
View this message in context: 
http://www.nabble.com/Question-regarding-ExtractingRequestHandler-tp24374393p24374393.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SynonymFilterFactory usage

2009-07-07 Thread Mani Kumar
anyone?

PS: my apologies if you guys think it's spamming, but I really need some help
here.

thanks!
mani

On Sun, Jul 5, 2009 at 12:49 PM, Mani Kumar manikumarchau...@gmail.comwrote:

 hi all,

 i am confused a bit about how to use synonym filter configs. i am using
 solr 1.4.

 default config is like :

 for query analyzer:
 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
 ignoreCase="true" expand="true"/>
 for index analyzer:
 its commented.

 while looking @ documentation deeply on

 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
 "Keep in mind that while the SynonymFilter will happily work with
 synonyms containing multiple words (ie: sea biscuit, sea biscit,
 seabiscuit) the recommended approach for dealing with synonyms like
 this is to expand the synonym when indexing. This is because there are
 two potential issues that can arise at query time."
 Considering the above recommendation, I think the following is the best
 option for the synonym filter:

 for the query analyzer:
 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
 ignoreCase="true" expand="false"/>
 for the index analyzer:
 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
 ignoreCase="true" expand="true"/>

 am i right?

 what do you guys suggest?

 thanks!
 mani kumar
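A minimal fieldType sketch combining the two analyzers as proposed above (the type name and tokenizer are illustrative, not taken from the thread; verify against your own schema.xml):

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand multi-word synonyms at index time, per the wiki recommendation -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- no expansion at query time once synonyms are expanded in the index -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
```

Note that a re-index is needed after changing the index-time analyzer for the change to take effect on existing documents.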




Browse indexed terms in a field

2009-07-07 Thread Pierre-Yves LANDRON

Hello,

Here is what I would like to achieve : in an indexed document there's a 
fulltext indexed field ; I'd like to browse the terms in this field, ie. get 
all the terms that match the beginning of a given word, for example. 
I can get all the field's facets for this document, but that's a lot of terms 
to process ; is there a way to constrain the returned facets ? 

Thank you for your highlights.
Kind regards,
Pierre.

_
More than messages–check out the rest of the Windows Live™.
http://www.microsoft.com/windows/windowslive/

Re: Filtering MoreLikeThis results

2009-07-07 Thread Marc Sturlese

At least in trunk, if you request for:
http://localhost:8084/solr/core_A/mlt?q=id:7468365&fq=price:[100 TO 200]
It will filter the MoreLikeThis results
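The filtered MLT request above can be sketched with stdlib URL building (the mlt.fl value is an illustrative assumption, not from the thread; the fq value needs the field-name prefix and URL encoding):

```python
from urllib.parse import urlencode

params = {
    "q": "id:7468365",
    "fq": "price:[100 TO 200]",  # filter query applied to the MLT matches
    "mlt.fl": "name,features",   # illustrative fields to compute similarity on
}
# urlencode takes care of escaping the colon, brackets, and spaces.
url = "http://localhost:8084/solr/core_A/mlt?" + urlencode(params)
print(url)
```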


Bill Au wrote:
 
 I think fq only works on the main response, not the mlt matches.  I found
 a
 couple of releated jira:
 
 http://issues.apache.org/jira/browse/SOLR-295
 http://issues.apache.org/jira/browse/SOLR-281
 
 If I am reading them correctly, I should be able to use DIsMax and
 MoreLikeThis together.  I will give that a try and report back.
 
 Bill
 
 
 On Tue, Jul 7, 2009 at 4:45 AM, Marc Sturlese
 marc.sturl...@gmail.comwrote:
 

 Using MoreLikeThisHandler you can use fq to filter your results. As far
 as
 I
 know bq are not allowed.


 Bill Au wrote:
 
  I have been trying to restrict MoreLikeThis results without any luck
 also.
  In additional to restricting the results, I am also looking to
 influence
  the
  scores similar to the way boost query (bq) works in the
  DisMaxRequestHandler.
 
  I think Solr's MoreLikeThis depends on Lucene's contrib queries
  MoreLikeThis, or at least it used to.  Has anyone looked into enhancing
  Solrs' MoreLikeThis to support bq and restricting mlt results?
 
  Bill
 
  On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge yao...@gmail.com wrote:
 
 
  I could not find any support from
  http://wiki.apache.org/solr/MoreLikeThison
  how to restrict MLT results to certain subsets. I passed along a fq
  parameter and it is ignored. Since we can not incorporate the filters
 in
  the
  query itself which is used to retrieve the target for similarity
  comparison,
  it appears there is no way to filter MLT results. BTW. I am using Solr
  1.3.
  Please let me know if there is way (other than hacking the source
 code)
  to
  do this. Thanks!
  --
  View this message in context:
 
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24374996.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Browse indexed terms in a field

2009-07-07 Thread Bill Au
You can use facet.prefix to match the beginning of a given word:

http://wiki.apache.org/solr/SimpleFacetParameters#head-579914ef3a14d775a5ac64d2c17a53f3364e3cf6

Bill
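The facet.prefix approach can be sketched as a request URL built with the Python standard library (host, field name, and prefix below are illustrative assumptions, not values from the thread):

```python
from urllib.parse import urlencode

# Ask for facet counts only (rows=0) on the full-text field, keeping
# just the terms that start with a given prefix.
params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.field": "text",   # illustrative field name
    "facet.prefix": "sol",   # only return facet terms beginning with "sol"
    "facet.limit": 20,
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```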

On Tue, Jul 7, 2009 at 11:02 AM, Pierre-Yves LANDRON
pland...@hotmail.comwrote:


 Hello,

 Here is what I would like to achieve : in an indexed document there's a
 fulltext indexed field ; I'd like to browse the terms in this field, ie. get
 all the terms that match the begining of a given word, for example.
 I can get all the field's facets for this document, but that's a lot of
 terms to process ; is there a way to constraint the returned facets ?

 Thank you for your highlights.
 Kind regards,
 Pierre.

 _
 More than messages–check out the rest of the Windows Live™.
 http://www.microsoft.com/windows/windowslive/


Re: Filtering MoreLikeThis results

2009-07-07 Thread Bill Au
I have been using the StandardRequestHandler (ie /solr/select).  fq does
work with the MoreLikeThisHandler.  I will switch to use that.  Thanks.

Bill

On Tue, Jul 7, 2009 at 11:02 AM, Marc Sturlese marc.sturl...@gmail.comwrote:


 At least in trunk, if you request for:
 http://localhost:8084/solr/core_A/mlt?q=id:7468365&fq=price:[100 TO 200]
 It will filter the MoreLikeThis results


 Bill Au wrote:
 
  I think fq only works on the main response, not the mlt matches.  I found
  a
  couple of releated jira:
 
  http://issues.apache.org/jira/browse/SOLR-295
  http://issues.apache.org/jira/browse/SOLR-281
 
  If I am reading them correctly, I should be able to use DIsMax and
  MoreLikeThis together.  I will give that a try and report back.
 
  Bill
 
 
  On Tue, Jul 7, 2009 at 4:45 AM, Marc Sturlese
  marc.sturl...@gmail.comwrote:
 
 
  Using MoreLikeThisHandler you can use fq to filter your results. As far
  as
  I
  know bq are not allowed.
 
 
  Bill Au wrote:
  
   I have been trying to restrict MoreLikeThis results without any luck
  also.
   In additional to restricting the results, I am also looking to
  influence
   the
   scores similar to the way boost query (bq) works in the
   DisMaxRequestHandler.
  
   I think Solr's MoreLikeThis depends on Lucene's contrib queries
   MoreLikeThis, or at least it used to.  Has anyone looked into
 enhancing
   Solrs' MoreLikeThis to support bq and restricting mlt results?
  
   Bill
  
   On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge yao...@gmail.com wrote:
  
  
   I could not find any support from
   http://wiki.apache.org/solr/MoreLikeThison
   how to restrict MLT results to certain subsets. I passed along a fq
   parameter and it is ignored. Since we can not incorporate the filters
  in
   the
   query itself which is used to retrieve the target for similarity
   comparison,
   it appears there is no way to filter MLT results. BTW. I am using
 Solr
   1.3.
   Please let me know if there is way (other than hacking the source
  code)
   to
   do this. Thanks!
   --
   View this message in context:
  
 
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
  
  
  
 
  --
  View this message in context:
 
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24374996.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Solr set up

2009-07-07 Thread G T
Hi,

I am interested in creating a test environment where I can make use of
Solr/Lucene. My objective is to be able to test various features of Solr
(replication, performance, indexing, searching, and so on).

I would appreciate some pointers to get started with the above; I am well
versed in Lucene/Solr basics.

Gaurav


Re: solr health check

2009-07-07 Thread Koji Sekiguchi

solr jay wrote:

Hi,

I am looking at this piece of configuration in solrconfig.xml

<admin>
<defaultQuery>solr</defaultQuery>
<gettableFiles>
 solrconfig.xml
 schema.xml
</gettableFiles>
<pingQuery>q=solr&amp;version=2.0&amp;start=0&amp;rows=0</pingQuery>

<!-- configure a healthcheck file for servers behind a loadbalancer
  -->
<healthcheck type="file">server-enabled</healthcheck>
  </admin>

  

I've never used this feature before, but reading source code...


It wasn't clear to me what 'server-enabled' means here. Is it a file name?
  

Yes, it is a file name.


If it is file name, where the file should be?

  
The file name should be an absolute path, or a path relative to the Solr
working directory (if you start Solr from the example directory, create the
server-enabled file in the example directory).


I added <healthcheck type="file">server-enabled</healthcheck> and admin/ping
stopped working, which is good, but I couldn't make it work again, and the admin
UI generates an exception. Has anyone used this feature before?

  

I don't understand why you are getting the following error...
You should get HTTP ERROR: 503 Service disabled instead...

Koji


Thanks,

J


HTTP ERROR: 500

PWC6033: Unable to compile class for JSP

PWC6197: An error occurred at line: 28 in the jsp file: /admin/action.jsp
PWC6199: Generated servlet error:
Type mismatch: cannot convert from Logger to Logger

PWC6197: An error occurred at line: 28 in the jsp file: /admin/action.jsp
PWC6199: Generated servlet error:
The method log(Level, String) is undefined for the type Logger



org.apache.jasper.JasperException: PWC6033: Unable to compile class for JSP

PWC6197: An error occurred at line: 28 in the jsp file: /admin/action.jsp
PWC6199: Generated servlet error:
Type mismatch: cannot convert from Logger to Logger

PWC6197: An error occurred at line: 28 in the jsp file: /admin/action.jsp
PWC6199: Generated servlet error:
The method log(Level, String) is undefined for the type Logger


at
org.apache.jasper.compiler.DefaultErrorHandler.javacError(DefaultErrorHandler.java:94)
at
org.apache.jasper.compiler.ErrorDispatcher.javacError(ErrorDispatcher.java:267)
at org.apache.jasper.compiler.Compiler.generateClass(Compiler.java:332)
at org.apache.jasper.compiler.Compiler.compile(Compiler.java:389)
at
org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:579)
at
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:344)
at
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
at
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268)
at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:295)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:503)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:827)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:511)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:210)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:379)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
RequestURI=/solr/admin/action.jsp

  




posting binary file and metadata in two separate documents

2009-07-07 Thread rossputin

Hi.

I am currently using Solr Cell to extract content from binary files, and I
am passing along some additional metadata with ext.literal params. Sample
below:

curl
"http://localhost:8983/solr/update/extract?ext.literal.id=2&ext.literal.some_code1=code1&ext.literal.some_code2=code2&ext.idx.attr=true&ext.def.fl=text"
-F myfi...@myfile.pdf

Where I have large numbers of ext.literal params this becomes a bit of a
chore.. and it would be the same case in an html form with many params... 
can I pass both files to '/update/extract' as documents, (files) linked
together?  Or are there any other options like this?  Perhaps something I
can do with Solrj.

Thanks in advance for your help,

regards,

Ross.


-- 
View this message in context: 
http://www.nabble.com/posting-binary-file-and-metadata-in-two-separate-documents-tp24375649p24375649.html
Sent from the Solr - User mailing list archive at Nabble.com.
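When the number of ext.literal.* parameters grows, building the request programmatically keeps it manageable. A sketch with the Python standard library (the parameter names follow the ext.literal convention used above; the metadata keys and values are illustrative):

```python
from urllib.parse import urlencode

# Metadata to attach to the extracted document; keys/values are illustrative.
metadata = {"id": "2", "some_code1": "code1", "some_code2": "code2"}

# Prefix every metadata key with "ext.literal." as the handler expects.
params = {"ext.literal." + key: value for key, value in metadata.items()}
params["ext.idx.attr"] = "true"
params["ext.def.fl"] = "text"

url = "http://localhost:8983/solr/update/extract?" + urlencode(params)
print(url)  # POST the binary file to this URL with curl or your HTTP client
```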



KStem download

2009-07-07 Thread Pascal Dimassimo

Hi,

I want to try KStem. I'm following the instructions on this page:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

... but the download link doesn't work.

Is anyone know the new location to download KStem?
-- 
View this message in context: 
http://www.nabble.com/KStem-download-tp24375856p24375856.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Query on the updation of synonym and stopword file.

2009-07-07 Thread Koji Sekiguchi

Sagar,

 I am facing a problem here that even after the core reload and 
re-indexing

 the documents the new updated synonym or stop words are not loaded.
 Seems so the filters are not aware that these files are updated so 
the solution

 to me is to restart the whole container in which I have embedded
 the Solr server; it is not feasible in production.

I am not a multicore user, but I can see the synonyms.txt updated
after reloading the core (I verified it via analysis.jsp, not by re-indexing),
without restarting the Solr server. I'm using 1.4. What version are you using?

Koji
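For reference, a multicore reload without restarting the container goes through the CoreAdminHandler. A sketch of building that request with the Python standard library (the host and core name are illustrative assumptions):

```python
from urllib.parse import urlencode

# CoreAdminHandler request; "core0" is an illustrative core name.
params = {"action": "RELOAD", "core": "core0"}
url = "http://localhost:8983/solr/admin/cores?" + urlencode(params)
print(url)
```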


Sagar Khetkade wrote:

Hello All,
 
I was figuring out the issue with the synonym.txt and stopword.txt files being
updated at regular intervals. Here in my case I am updating the synonym.txt and
stopword.txt files as the synonym and stop word dictionaries are updated. I am
facing a problem here: even after the core reload and re-indexing the documents,
the newly updated synonyms or stop words are not loaded. It seems the filters
are not aware that these files were updated, so the only solution for me is to
restart the whole container in which I have embedded the Solr server; that is
not feasible in production.

I  came across the discussion with subject “ synonyms.txt file updated 
frequently” in which Grant had a view to write a new logic in 
SynonymFilterFactory which would take care of this issue.  Is there any 
possible solution to this or is this the solution.
Thanks in advance!
 
Regards,

Sagar Khetkade
 
 
_

Missed any of the IPL matches ? Catch a recap of all the action on MSN Videos
http://msnvideos.in/iplt20/msnvideoplayer.aspx
  




Re: Multiple values for custom fields provided in SOLR query

2009-07-07 Thread Suryasnat Das
Hi Otis,

Thanks for replying to my query.

My query is, if multiple values are provided for a custom field then how can
it be represented in a SOLR query. So if my field is fileID and its values
are 111, 222 and 333 and my search string is ‘product’ then how can this be
represented in a SOLR query? I want to perform the search on basis of
fileIDs *and* search string provided.

If i provide the query in the format,
q=fileID:111+fileID:222+fileID:333+product, then how will it actually
search? Can you please provide me the correct format of the query?

Regards

Suryasnat Das

On Mon, Jul 6, 2009 at 10:05 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:


 I actually don't fully understand your question.
 q=+fileID:111+fileID:222+fileID:333+apple looks like a valid query to me.
 (not sure what that space encoded as + is, though)

 Also not sure what you mean by:
  Basically the requirement is , if fileIDs are provided as search
 parameter
  then search should happen on the basis of fileID.


 Do you mean apple should be ignored if a term (field name:field value) is
 provided?

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Suryasnat Das suryaatw...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Monday, July 6, 2009 11:31:10 AM
  Subject: Multiple values for custom fields provided in SOLR query
 
  Hi,
  I have a requirement in which i need to have multiple values in my custom
  fields while forming the search query to SOLR. For example,
  fileID is my custom field. I have defined the fileID in schema.xml as
  name=fileID type=string indexed=true stored=true required=true
  multiValued=true/.
  Now fileID can have multiple values like 111,222,333 etc. So will my
 query
  be of the form,
 
  q=+fileID:111+fileID:222+fileID:333+apple
 
  where apple is my search query string. I tried with the above query but
 it
  did not work. SOLR gave invalid query error.
  Basically the requirement is , if fileIDs are provided as search
 parameter
  then search should happen on the basis of fileID.
 
  Is my approach correct or i need to do something else? Please, if
 immediate
  help is provided then that would be great.
 
  Regards
  Suryasnat Das
  Infosys.
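One way to express the query asked about above is to keep the user's text in q and put the ID restriction in a filter query, ORing the values within the fileID field. A sketch under that assumption (host name is illustrative; field and values are from the thread):

```python
from urllib.parse import urlencode

file_ids = ["111", "222", "333"]
params = {
    "q": "product",  # the user's search string
    # Match any of the given IDs; fq restricts results without affecting scores.
    "fq": "fileID:(" + " OR ".join(file_ids) + ")",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```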




Re: Filtering MoreLikeThis results

2009-07-07 Thread Yao Ge

I am not sure about the parameters for MLT the requestHandler plugin. Can one
of you share the solrconfig.xml entry for MLT? Thanks in advance.
-Yao
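A minimal sketch of such a solrconfig.xml entry, with illustrative defaults (not a tested configuration; check the MoreLikeThis wiki page for the exact parameters your version supports):

```xml
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <!-- fields used to compute similarity; illustrative names -->
    <str name="mlt.fl">name,features</str>
    <int name="mlt.mintf">1</int>
    <int name="mlt.mindf">1</int>
  </lst>
</requestHandler>
```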


Bill Au wrote:
 
 I have been using the StandardRequestHandler (ie /solr/select).  fq does
 work with the MoreLikeThisHandler.  I will switch to use that.  Thanks.
 
 Bill
 
 On Tue, Jul 7, 2009 at 11:02 AM, Marc Sturlese
 marc.sturl...@gmail.comwrote:
 

 At least in trunk, if you request for:
 http://localhost:8084/solr/core_A/mlt?q=id:7468365&fq=price:[100 TO 200]
 It will filter the MoreLikeThis results


 Bill Au wrote:
 
  I think fq only works on the main response, not the mlt matches.  I
 found
  a
  couple of releated jira:
 
  http://issues.apache.org/jira/browse/SOLR-295
  http://issues.apache.org/jira/browse/SOLR-281
 
  If I am reading them correctly, I should be able to use DIsMax and
  MoreLikeThis together.  I will give that a try and report back.
 
  Bill
 
 
  On Tue, Jul 7, 2009 at 4:45 AM, Marc Sturlese
  marc.sturl...@gmail.comwrote:
 
 
  Using MoreLikeThisHandler you can use fq to filter your results. As
 far
  as
  I
  know bq are not allowed.
 
 
  Bill Au wrote:
  
   I have been trying to restrict MoreLikeThis results without any luck
  also.
   In additional to restricting the results, I am also looking to
  influence
   the
   scores similar to the way boost query (bq) works in the
   DisMaxRequestHandler.
  
   I think Solr's MoreLikeThis depends on Lucene's contrib queries
   MoreLikeThis, or at least it used to.  Has anyone looked into
 enhancing
   Solrs' MoreLikeThis to support bq and restricting mlt results?
  
   Bill
  
   On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge yao...@gmail.com wrote:
  
  
   I could not find any support from
   http://wiki.apache.org/solr/MoreLikeThison
   how to restrict MLT results to certain subsets. I passed along a fq
   parameter and it is ignored. Since we can not incorporate the
 filters
  in
   the
   query itself which is used to retrieve the target for similarity
   comparison,
   it appears there is no way to filter MLT results. BTW. I am using
 Solr
   1.3.
   Please let me know if there is way (other than hacking the source
  code)
   to
   do this. Thanks!
   --
   View this message in context:
  
 
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
  
  
  
 
  --
  View this message in context:
 
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24374996.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24377360.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Indexing XML

2009-07-07 Thread Jay Hill
Mathieu, have a look at Solr's DataImportHandler. It provides a
configuration-based approach to index different types of datasources
including relational databases and XML files. In particular have a look at
the XpathEntityProcessor (
http://wiki.apache.org/solr/DataImportHandler#head-f1502b1ed71d98ef0120671db5762e137e63f9d2)
which allows you to use xpath syntax to map xml data to index fields.

-Jay
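As a hedged illustration of the XPathEntityProcessor approach for the LOM file quoted below (the file path, column names, and xpaths are made-up examples, not a tested configuration):

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="lom" processor="XPathEntityProcessor"
            url="/path/to/lom-record.xml" forEach="/lom">
      <!-- map LOM elements onto index fields via xpath -->
      <field column="id"          xpath="/lom/general/identifier/entry"/>
      <field column="title"       xpath="/lom/general/title/string"/>
      <field column="description" xpath="/lom/general/description/string"/>
      <field column="language"    xpath="/lom/general/language"/>
    </entity>
  </document>
</dataConfig>
```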


On Tue, Jul 7, 2009 at 7:25 AM, Saeli Mathieu saeli.math...@gmail.com wrote:

 Hello.

 I'm a new user of Solr; I have already used Lucene to index files and search,
 but my program was too slow, which is why I was looking for another solution,
 and I thought I had found it.

 I say "I thought" because I don't know if it's possible to use Solr with
 this kind of XML file.

  <lom xsi:schemaLocation="http://ltsc.ieee.org/xsd/lomv1.0
 http://ltsc.ieee.org/xsd/lomv1.0/lom.xsd">
 <general>
 <identifier>
 <catalog>STRING HERE</catalog>
 <entry>
 STRING HERE
 </entry>
 </identifier>
 <title>
 <string language="fr">
 STRING HERE
 </string>
 </title>
 <language>fr</language>
 <description>
 <string language="fr">
 STRING HERE
 </string>
 </description>
 </general>
 <lifeCycle>
 <status>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </status>
 <contribute>
 <role>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </role>
 <entity>STRING HERE
 </entity>
 </contribute>
 </lifeCycle>
 <metaMetadata>
 <identifier>
 <catalog>STRING HERE</catalog>
 <entry>STRING HERE</entry>
 </identifier>
 <contribute>
 <role>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </role>
 <entity>STRING HERE
 </entity>
 <date>
 <dateTime>STRING HERE</dateTime>
 </date>
 </contribute>
 <contribute>
 <role>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </role>
 <entity>STRING HERE
 </entity>
 <entity>STRING HERE</entity>
 <entity>STRING HERE
 </entity>
 <date>
 <dateTime>STRING HERE</dateTime>
 </date>
 </contribute>
 <metadataSchema>STRING HERE</metadataSchema>
 <language>STRING HERE</language>
 </metaMetadata>
 <technical>
 <location>STRING HERE
 </location>
 </technical>
 <educational>
 <intendedEndUserRole>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </intendedEndUserRole>
 <context>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </context>
 <typicalAgeRange>
 <string language="fr">STRING HERE</string>
 </typicalAgeRange>
 <description>
 <string language="fr">STRING HERE</string>
 </description>
 <description>
 <string language="fr">STRING HERE</string>
 </description>
 <language>STRING HERE</language>
 </educational>
 <annotation>
 <entity>STRING HERE
 </entity>
 <date>
 <dateTime>STRING HERE</dateTime>
 </date>
 </annotation>
 <classification>
 <purpose>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </purpose>
 </classification>
 <classification>
 <purpose>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </purpose>
 <taxonPath>
 <source>
 <string language="fr">STRING HERE</string>
 </source>
 <taxon>
 <id>STRING HERE</id>
 <entry>
 <string language="fr">STRING HERE</string>
 </entry>
 </taxon>
 </taxonPath>
 </classification>
 <classification>
 <purpose>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </purpose>
 <taxonPath>
 <source>
 <string language="fr">STRING HERE </string>
 </source>
 <taxon>
 <id>STRING HERE</id>
 <entry>
 <string language="fr">STRING HERE</string>
 </entry>
 </taxon>
 </taxonPath>
 <taxonPath>
 <source>
 <string language="fr">STRING HERE</string>
 </source>
 <taxon>
 <id>STRING HERE</id>
 <entry>
 <string language="fr">STRING HERE</string>
 </entry>
 </taxon>
 </taxonPath>
 </classification>
 </lom>

 I don't know how I can use this kind of file with Solr because the XML
 example are this one.

  <add>
  <doc>
  <field name="id">SOLR1000</field>
  <field name="name">Solr, the Enterprise Search Server</field>
  <field name="manu">Apache Software Foundation</field>
  <field name="cat">software</field>
  <field name="cat">search</field>
  <field name="features">Advanced Full-Text Search Capabilities using
 Lucene</field>
  <field name="features">Optimized for High Volume Web Traffic</field>
  <field name="features">Standards Based Open Interfaces - XML and
 HTTP</field>
  <field name="features">Comprehensive HTML Administration
 Interfaces</field>
  <field name="features">Scalability - Efficient Replication to other Solr
 Search Servers</field>
  <field name="features">Flexible and Adaptable with XML configuration and
 Schema</field>
  <field name="features">Good unicode support: h&#xE9;llo (hello with an
 accent over the e)</field>
  <field name="price">0</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
  <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
  </doc>
  </add>

 I understand Solr needs this kind of structure, by structure I mean
 <field name="keyword">Value</field>
 and as you can see I can't use that structure because I'm not
 allowed to change my XML files.

 I'm looking forward to read you.

 Mathieu Saeli
 --
 Saeli Mathieu.
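
For readers with the same question: a DataImportHandler configuration for XML like the lom document above would live in a data-config.xml referenced from solrconfig.xml. The sketch below is illustrative only — the file path, entity name, and the choice of fields to extract are assumptions, not something from this thread:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- One Solr document per <lom> element; xpath attributes map
         XML nodes to index fields declared in schema.xml -->
    <entity name="lom"
            processor="XPathEntityProcessor"
            url="/path/to/lom-file.xml"
            forEach="/lom">
      <field column="title"       xpath="/lom/general/title/string"/>
      <field column="description" xpath="/lom/general/description/string"/>
      <field column="language"    xpath="/lom/general/language"/>
      <field column="location"    xpath="/lom/technical/location"/>
    </entity>
  </document>
</dataConfig>
```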



Solr Set Up

2009-07-07 Thread G T
Hi,

I was interested in creating a test environment where I can make use of
Solr/Lucene. My objective is to be able to test various features of Solr
(replication, performance, indexing, searching, and so on).

I wanted someone to give me a start on the above. I am well versed with
Lucene/Solr basics.

Gaurav


How to get various records in the result set

2009-07-07 Thread fei dong
Hi buddy,

I am working on a music search project and I have a special requirement
about the ranking when querying the artist name.

Ex: When I query the artist ne yo, there are 500 results and maybe 100 song
names are repeated. So the ideal thing is to let users get more different
songs on one page, and results that have lyrics must be shown first. My
current solr query is:

?q=ne+yo&qf=artist&defType=dismax&sort=lyric%20desc,links%20desc&start=0&rows=20&indent=on

then the results show the same song names together because those
records always get the same score.
How can I implement that effect? Thanks.


RE: Is there any other way to load the index beside using http connection?

2009-07-07 Thread Francis Yakin

I did try:

curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv"&"stream.contentType=text/plain;charset=utf-8'

It doesn't work

Francis

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Tuesday, July 07, 2009 4:59 AM
To: solr-user@lucene.apache.org
Cc: Norberto Meijome
Subject: Re: Is there any other way to load the index beside using http 
connection?

Look at the error - it's bash (your command line shell) complaining.
The '&' terminates one command and puts it in the background.
Surrounding the command with quotes will get you one step closer:

curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8'

-Yonik
http://www.lucidimagination.com



On Mon, Jul 6, 2009 at 2:11 PM, Francis Yakinfya...@liquid.com wrote:

 Ok, I have a CSV file(called it test.csv) from database.

 When I tried to upload this file to solr using this cmd, I got 
 stream.contentType=text/plain: No such file or directory error

 curl 
 http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8

 -bash: stream.contentType=text/plain: No such file or directory
  undefined field cat

 What did I do wrong?

 Francis

 -Original Message-
 From: Norberto Meijome [mailto:numard...@gmail.com]
 Sent: Monday, July 06, 2009 11:01 AM
 To: Francis Yakin
 Cc: solr-user@lucene.apache.org
 Subject: Re: Is there any other way to load the index beside using http 
 connection?

 On Mon, 6 Jul 2009 09:56:03 -0700
 Francis Yakin fya...@liquid.com wrote:

  Norberto,

 Thanks, I think my questions is:

 why not generate your SQL output directly into your oracle server as a file

 What type of file is this?



 a file in a format that you can then import into SOLR.

 _
 {Beto|Norberto|Numard} Meijome

 Gravity cannot be blamed for people falling in love.
  Albert Einstein

 I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
 Reading disclaimers makes you go blind. Writing them is worse. You have been 
 Warned.



RE: Is there any other way to load the index beside using http connection?

2009-07-07 Thread Francis Yakin
 With
curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv"&"stream.contentType=text/plain;charset=utf-8'

No errors now.

But how can I verify that the update happened?

Thanks

Francis

-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: Tuesday, July 07, 2009 10:37 AM
To: 'solr-user@lucene.apache.org'; 'yo...@lucidimagination.com'
Cc: Norberto Meijome
Subject: RE: Is there any other way to load the index beside using http 
connection?


I did try:

curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv"&"stream.contentType=text/plain;charset=utf-8'

It doesn't work

Francis

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Tuesday, July 07, 2009 4:59 AM
To: solr-user@lucene.apache.org
Cc: Norberto Meijome
Subject: Re: Is there any other way to load the index beside using http 
connection?

Look at the error - it's bash (your command line shell) complaining.
The '&' terminates one command and puts it in the background.
Surrounding the command with quotes will get you one step closer:

curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8'

-Yonik
http://www.lucidimagination.com



On Mon, Jul 6, 2009 at 2:11 PM, Francis Yakinfya...@liquid.com wrote:

 Ok, I have a CSV file(called it test.csv) from database.

 When I tried to upload this file to solr using this cmd, I got 
 stream.contentType=text/plain: No such file or directory error

 curl 
 http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8

 -bash: stream.contentType=text/plain: No such file or directory
  undefined field cat

 What did I do wrong?

 Francis

 -Original Message-
 From: Norberto Meijome [mailto:numard...@gmail.com]
 Sent: Monday, July 06, 2009 11:01 AM
 To: Francis Yakin
 Cc: solr-user@lucene.apache.org
 Subject: Re: Is there any other way to load the index beside using http 
 connection?

 On Mon, 6 Jul 2009 09:56:03 -0700
 Francis Yakin fya...@liquid.com wrote:

  Norberto,

 Thanks, I think my questions is:

 why not generate your SQL output directly into your oracle server as a file

 What type of file is this?



 a file in a format that you can then import into SOLR.

 _
 {Beto|Norberto|Numard} Meijome

 Gravity cannot be blamed for people falling in love.
  Albert Einstein

 I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
 Reading disclaimers makes you go blind. Writing them is worse. You have been 
 Warned.



Re: Is there any other way to load the index beside using http connection?

2009-07-07 Thread Yonik Seeley
The double quotes around the ampersand don't belong there.
I think that UTF8 should also be the default, so the following should also work:

curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv'

-Yonik
http://www.lucidimagination.com

On Tue, Jul 7, 2009 at 1:37 PM, Francis Yakinfya...@liquid.com wrote:

 I did try:

 curl 
 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv"&"stream.contentType=text/plain;charset=utf-8'

 It doesn't work

 Francis

 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
 Sent: Tuesday, July 07, 2009 4:59 AM
 To: solr-user@lucene.apache.org
 Cc: Norberto Meijome
 Subject: Re: Is there any other way to load the index beside using http 
 connection?

 Look at the error - it's bash (your command line shell) complaining.
 The '&' terminates one command and puts it in the background.
 Surrounding the command with quotes will get you one step closer:

 curl 
 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8'

 -Yonik
 http://www.lucidimagination.com
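
For anyone scripting these uploads, building the query string programmatically sidesteps the shell-quoting problem entirely. A minimal sketch — the Solr URL and CSV path are just the examples from this thread, not verified against a running server:

```python
from urllib.parse import urlencode

# Percent-encode the parameters so '&' and ';' never reach the shell
# (or the URL) unescaped. The path is the thread's example path.
params = {
    "stream.file": "/opt/apache-1.2.0/example/exampledocs/test.csv",
    "stream.contentType": "text/plain;charset=utf-8",
}
url = "http://localhost:8983/solr/update/csv?" + urlencode(params)
print(url)
# To actually send it: urllib.request.urlopen(url)
```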


RE: Is there any other way to load the index beside using http connection?

2009-07-07 Thread Francis Yakin
 Yeah, it works now.

How can I verify that the new CSV file got uploaded?

Thanks

Francis

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Tuesday, July 07, 2009 10:49 AM
To: solr-user@lucene.apache.org
Cc: Norberto Meijome
Subject: Re: Is there any other way to load the index beside using http 
connection?

The double quotes around the ampersand don't belong there.
I think that UTF8 should also be the default, so the following should also work:

curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv'

-Yonik
http://www.lucidimagination.com

On Tue, Jul 7, 2009 at 1:37 PM, Francis Yakinfya...@liquid.com wrote:

 I did try:

 curl 
 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv"&"stream.contentType=text/plain;charset=utf-8'

 It doesn't work

 Francis

 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
 Sent: Tuesday, July 07, 2009 4:59 AM
 To: solr-user@lucene.apache.org
 Cc: Norberto Meijome
 Subject: Re: Is there any other way to load the index beside using http 
 connection?

 Look at the error - it's bash (your command line shell) complaining.
 The '&' terminates one command and puts it in the background.
 Surrounding the command with quotes will get you one step closer:

 curl 
 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8'

 -Yonik
 http://www.lucidimagination.com


Re: Is there any other way to load the index beside using http connection?

2009-07-07 Thread Yonik Seeley
On Tue, Jul 7, 2009 at 1:50 PM, Francis Yakinfya...@liquid.com wrote:
  yeah, It works now.

 How can I verify if the new CSV file get uploaded?

point your browser at
http://localhost:8983/solr/admin/stats.jsp

Check out the UPDATE HANDLERS section

-Yonik
http://www.lucidimagination.com
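
A complementary sanity check, once a commit has gone through, is to compare numDocs from the stats page with the number of rows in the CSV you posted. A hedged sketch, assuming one document per row plus a header line:

```python
import csv

def count_csv_rows(path):
    """Count data rows (excluding the header line) in a CSV file."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    return max(len(rows) - 1, 0)  # subtract the header row

# After 'curl .../update/csv' and a commit, numDocs shown on
# /solr/admin/stats.jsp should match this count.
```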


Re: reindexed data on master not replicated to slave

2009-07-07 Thread solr jay
It seems that the patch fixed the symptom, but not the problem itself.

Now the log messages look good. After it downloaded and installed the index,
it printed out

*Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.*

but the files inside index directory did not change. Both index.properties
and replication.properties were updated though.


Just a couple of files:

from master instance:

-rw-r--r--  1 worun  wheel 181 Jul  7 09:28 _6.fdt
-rw-r--r--  1 worun  wheel  12 Jul  7 09:28 _6.fdx
-rw-r--r--  1 worun  wheel 131 Jul  7 09:28 _6.fnm
-rw-r--r--  1 worun  wheel  27 Jul  7 09:28 _6.frq
-rw-r--r--  1 worun  wheel  11 Jul  7 09:28 _6.nrm


from slave instance:

-rw-r--r--  1 jianhanguo  admin  70 Jul  6 18:55 _14_5.del
-rw-r--r--  1 jianhanguo  admin4016 Jul  6 18:55 _15.fdt
-rw-r--r--  1 jianhanguo  admin 268 Jul  6 18:55 _15.fdx
-rw-r--r--  1 jianhanguo  admin 131 Jul  6 18:55 _15.fnm
-rw-r--r--  1 jianhanguo  admin 726 Jul  6 18:55 _15.frq


Thanks,

J

2009/7/7 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 Jay ,
 I am opening an issue SOLR-1264
 https://issues.apache.org/jira/browse/SOLR-1264

 I have attached a patch as well . I guess that is the fix. could you
 please confirm that.


 On Tue, Jul 7, 2009 at 12:59 AM, solr jaysolr...@gmail.com wrote:
  It looks that the problem is here or before that in
  SnapPuller.fetchLatestIndex():
 
 
terminateAndWaitFsyncService();
      LOG.info("Conf files are not downloaded or are in sync");
if (isSnapNeeded) {
  modifyIndexProps(tmpIndexDir.getName());
} else {
  successfulInstall = copyIndexFiles(tmpIndexDir, indexDir);
}
if (successfulInstall) {
  logReplicationTimeAndConfFiles(modifiedConfFiles);
  doCommit();
}
 
 
  Debugged into the place, and noticed that isSnapNeeded is true and
 therefore
 
  modifyIndexProps(tmpIndexDir.getName());
 
  executed, but from the function name it looks that installing index
 actually
  happens in
 
  successfulInstall = copyIndexFiles(tmpIndexDir, indexDir);
 
 
  The function returns false, but the caller (doSnapPull) never checked the
  return value.
 
 
  Thanks,
 
  J
 
 
  On Mon, Jul 6, 2009 at 8:02 AM, solr jay solr...@gmail.com wrote:
 
  There is only one index directory: index/
 
  Here is the content of index.properties
 
  #index properties
  #Fri Jul 03 14:17:12 PDT 2009
  index=index.20090703021705
 
 
  Thanks,
 
  J
 
  2009/7/5 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
  BTW , how many index dirs are there in the data dir ? what is there in
  the datadir/index.properties ?
 
  On Sat, Jul 4, 2009 at 12:15 AM, solr jaysolr...@gmail.com wrote:
  
  
   I tried it with the latest nightly build and got the same result.
  
   Actually that was the symptom and it made me looking at the index
   directory.
   The same log messages repeated again and again, never end.
  
  
  
   2009/7/2 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
  
    jay, I see "updating index properties..." twice
  
  
  
   this should happen rarely. in your case it should have happened only
   once. because you cleaned up the master only once
  
  
   On Fri, Jul 3, 2009 at 6:09 AM, Otis
   Gospodneticotis_gospodne...@yahoo.com wrote:
   
Jay,
   
You didn't mention which version of Solr you are using.  It looks
like
some trunk or nightly version.  Maybe you can try the latest
nightly?
   
 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
   
   
   
- Original Message 
From: solr jay solr...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, July 2, 2009 9:14:48 PM
Subject: reindexed data on master not replicated to slave
   
Hi,
   
When index data were corrupted on master instance, I wanted to
 wipe
out
all
the index data and re-index everything. I was hoping the newly
created
index
data would be replicated to slaves, but it wasn't.
   
Here are the steps I performed:
   
1. stop master
2. delete the directory 'index'
3. start master
4. disable replication on master
5. index all data from scratch
6. enable replication on master
   
It seemed from log file that the slave instances discovered that
new
index
are available and claimed that new index installed, and then
 trying
to
update index properties, but looking into the index directory on
slaves, you
will find that no index data files were updated or added, plus
slaves
keep
trying to get new index. Here are some from slave's log file:
   
Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller
fetchLatestIndex
INFO: Starting replication process
Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller
fetchLatestIndex
INFO: Number 

Re: Solr slave Heap space error and index size issue

2009-07-07 Thread Chris Hostetter

: 5-6 days after fresh index index size suddenly increased (no optimization in
: between) by 150GB and then query takes long time and java heap error comes.
: I run optimize in this index Its takes long time and result it increase
: index size more more then 200GB and it didn't show about optimize completed.
: merge factor is default as given in solr build.

did you check your logs?  this smells like maybe a failure during commit 
or optimize (OOM maybe?) that resulted in old files not being cleaned up 
on disk ... particularly the "it didn't show about optimize completed" 
comment.

There is a CheckIndex tool that you can use (google for details in 
lucene-java mailing list) which *should* tell you if there are extra 
segments (i don't remember the details to be certain).




-Hoss



Re: Can´t use wildcard * on alphanumeric values?

2009-07-07 Thread Shalin Shekhar Mangar
On Tue, Jul 7, 2009 at 6:45 PM, gateway0 reiterwo...@yahoo.de wrote:


 Thank you, that was it.

 Why is the preserveOriginal=1 option nowhere documented?


A simple case of oversight :)

I've added a note on preserveOriginal and splitOnNumerics (another omission)
to the wiki page http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

-- 
Regards,
Shalin Shekhar Mangar.
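
For reference, both options are attributes on WordDelimiterFilterFactory in schema.xml. A sketch of where they go — the surrounding fieldType name and the other attribute values are illustrative, not from this thread:

```xml
<fieldType name="text_wd" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal="1" keeps the unsplit token alongside the
         generated parts, so a wildcard like abc123* can still match -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            splitOnNumerics="1" preserveOriginal="1"/>
  </analyzer>
</fieldType>
```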


Re: how to shuffle the result while follow some priority rules at the same time

2009-07-07 Thread Chris Hostetter

: I want to implement that effect that the results had better differ from each
: other in one page, but I want to show some results first like those contains
: more attributes.

there is a RandomSortField that you can use as a tie breaker when all 
other fields are equal.  info about using that can be found in the example 
schema.xml

you could also test drive the FieldCollapsing patch (in Jira, not yet 
committed) which would let you collapse the results based on a common 
field name (ie: if the song title name was identical)

: pages. My current solr query is:
: 
: 
?q=ne+yo&qf=artist&defType=dismax&sort=lyric%20desc,links%20desc&start=0&rows=20&indent=on
: So the results will shows some same song names together cause their
: scores are totally the same.
: How to modify to support random, hash effect?

you aren't using score in your sort at all -- so score isn't influencing 
your result order at all.  

assuming lyric is a boolean indicating you have lyrics, you might want 
something like sort=lyric+desc,+score+desc,+links+desc ... so it will 
make sure things with lyrics appear first, but all songs with lyrics will 
be in score order; if and only if two docs have identical scores (and both 
have lyrics) will it then do a secondary sort on links

: BTW: I find the sorting with multi conditions does not work well. I
: want to sort the second attribute
: (links desc)based on the first condition. ( lyric desc) . The results
: with lyric shows really in
: the front, but the links attribute seems not in order.

you haven't explained what links is so it's hard to guess what might be 
happening here.  if you give contrete examples (ie: show us your schema, 
show us a real query, show us real results) then people might be able to 
help you.



-Hoss
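
The tie-breaking behaviour of that suggested sort can be illustrated with plain Python tuples (all the documents below are made up for the illustration):

```python
# Toy model of sort=lyric desc, score desc, links desc: docs with
# lyrics come first, ties on score are broken by links.
docs = [
    {"title": "A", "lyric": True,  "score": 1.2, "links": 3},
    {"title": "B", "lyric": False, "score": 2.0, "links": 9},
    {"title": "C", "lyric": True,  "score": 1.2, "links": 7},
]
ranked = sorted(docs,
                key=lambda d: (d["lyric"], d["score"], d["links"]),
                reverse=True)
print([d["title"] for d in ranked])  # C before A (same score, more links); B last
```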



Re: reindexed data on master not replicated to slave

2009-07-07 Thread Shalin Shekhar Mangar
On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote:

 It seemed that the patch fixed the symptom, but not the problem itself.

 Now the log messages looks good. After one download and installed the
 index,
 it printed out

 *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller
 fetchLatestIndex
 INFO: Slave in sync with master.*

 but the files inside index directory did not change. Both index.properties
 and replication.properties were updated though.


Note that in this case, Solr would have created a new index directory. Are
you comparing the files on the slave in the new index directory? You can get
the new index directory's name from index.properties.

-- 
Regards,
Shalin Shekhar Mangar.


Re: reindexed data on master not replicated to slave

2009-07-07 Thread solr jay
I see. So I tried it again. Now index.properties has

#index properties
#Tue Jul 07 12:13:49 PDT 2009
index=index.20090707121349

but there is no such directory index.20090707121349 under the data
directory.

Thanks,

J

On Tue, Jul 7, 2009 at 11:50 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote:

  It seemed that the patch fixed the symptom, but not the problem itself.
 
  Now the log messages looks good. After one download and installed the
  index,
  it printed out
 
  *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller
  fetchLatestIndex
  INFO: Slave in sync with master.*
 
  but the files inside index directory did not change. Both
 index.properties
  and replication.properties were updated though.
 

 Note that in this case, Solr would have created a new index directory. Are
 you comparing the files on the slave in the new index directory? You can
 get
 the new index directory's name from index.properties.

 --
 Regards,
 Shalin Shekhar Mangar.



Re: facets and stopwords

2009-07-07 Thread Chris Hostetter

: http://projecte01.development.barcelonamedia.org/fonetic/
: you will see a Top Words list (in Spanish and stemmed) in the list there
: is the word si which is in  20649 documents.
: If you click at this word, the system will perform the query 
:   (x) content:si, with no answers at all
: The same for la it is in 17881 documents, but the query  content:la will
: give no answers at all
...
: To see what's going on on the index I have tested with the analyzer
: http://projecte01.development.barcelonamedia.org/solr/admin/analysis.jsp
...
: las cosas que si no pasan la proxima vez si que no veràs

but are you sure that example would actually cause a problem?
i suspect if you index that exact sentence as is you wouldn't see the 
facet count for si or que increase at all.

If you do a query for {!raw field=content}que you bypass the query 
parsers (which is respecting your stopwords file) and see all docs that 
contain the raw term que in the content field.

if you look at some of the docs that match, and paste their content field 
into the analysis tool, i think you'll see that the problem comes from 
using the whitespace tokenizer, and is masked by using the WDF 
after the stop filter ... things like Que? are getting ignored by the 
stopfilter, but ultimately winding up in your index as que


-Hoss
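
The masking effect described above can be sketched in a few lines — a toy whitespace tokenizer and stopword set, not Solr's actual analysis chain:

```python
# With a whitespace tokenizer, punctuation stays attached to the token,
# so "si?" is not equal to the stopword "si" when the stop filter runs.
stopwords = {"si", "que", "la"}
text = "las cosas que si no pasan, si? que!"
tokens = text.lower().split()                 # whitespace tokenization
kept = [t for t in tokens if t not in stopwords]
print(kept)  # "si?" and "que!" survive even though "si" and "que" are stopped
```

If a WordDelimiterFilter then splits "si?" into "si" after the stop filter has already run, the stopword ends up in the index anyway — which matches the facet counts observed.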


Re: Indexing XML

2009-07-07 Thread Saeli Mathieu
I'm sorry, I've almost finished my script to convert my XML into Solr's XML.
I'll share it with you later; I think it can help some people like me in the
future :)

I just need to format my output text and everything will be fine :)

Cheers for your help guys ;)

On Tue, Jul 7, 2009 at 7:06 PM, Jay Hill jayallenh...@gmail.com wrote:

 Mathieu, have a look at Solr's DataImportHandler. It provides a
 configuration-based approach to index different types of datasources
 including relational databases and XML files. In particular have a look at
 the XpathEntityProcessor (

 http://wiki.apache.org/solr/DataImportHandler#head-f1502b1ed71d98ef0120671db5762e137e63f9d2
 )
 which allows you to use xpath syntax to map xml data to index fields.

 -Jay


 On Tue, Jul 7, 2009 at 7:25 AM, Saeli Mathieu saeli.math...@gmail.com
 wrote:

  Hello.
 
  I'm a new user of Solr, I already used Lucene to index files and search.
  But my programme was too slow, it's why I was looking for another
 solution,
  and I thought I found it.
 
   I said I thought because I don't know if it's possible to use Solr with
  this kind of XML files.
 
   <lom xsi:schemaLocation="http://ltsc.ieee.org/xsd/lomv1.0
  http://ltsc.ieee.org/xsd/lomv1.0/lom.xsd">
  <general>
  <identifier>
  <catalog>STRING HERE</catalog>
  <entry>
  STRING HERE
  </entry>
  </identifier>
  <title>
  <string language="fr">
  STRING HERE
  </string>
  </title>
  <language>fr</language>
  <description>
  <string language="fr">
  STRING HERE
  </string>
  </description>
  </general>
  <lifeCycle>
  <status>
  <source>STRING HERE</source>
  <value>STRING HERE</value>
  </status>
  <contribute>
  <role>
  <source>STRING HERE</source>
  <value>STRING HERE</value>
  </role>
  <entity>STRING HERE
  </entity>
  </contribute>
  </lifeCycle>
  <metaMetadata>
  <identifier>
  <catalog>STRING HERE</catalog>
  <entry>STRING HERE</entry>
  </identifier>
  <contribute>
  <role>
  <source>STRING HERE</source>
  <value>STRING HERE</value>
  </role>
  <entity>STRING HERE
  </entity>
  <date>
  <dateTime>STRING HERE</dateTime>
  </date>
  </contribute>
  <contribute>
  <role>
  <source>STRING HERE</source>
  <value>STRING HERE</value>
  </role>
  <entity>STRING HERE
  </entity>
  <entity>STRING HERE</entity>
  <entity>STRING HERE
  </entity>
  <date>
  <dateTime>STRING HERE</dateTime>
  </date>
  </contribute>
  <metadataSchema>STRING HERE</metadataSchema>
  <language>STRING HERE</language>
  </metaMetadata>
  <technical>
  <location>STRING HERE
  </location>
  </technical>
  <educational>
  <intendedEndUserRole>
  <source>STRING HERE</source>
  <value>STRING HERE</value>
  </intendedEndUserRole>
  <context>
  <source>STRING HERE</source>
  <value>STRING HERE</value>
  </context>
  <typicalAgeRange>
  <string language="fr">STRING HERE</string>
  </typicalAgeRange>
  <description>
  <string language="fr">STRING HERE</string>
  </description>
  <description>
  <string language="fr">STRING HERE</string>
  </description>
  <language>STRING HERE</language>
  </educational>
  <annotation>
  <entity>STRING HERE
  </entity>
  <date>
  <dateTime>STRING HERE</dateTime>
  </date>
  </annotation>
  <classification>
  <purpose>
  <source>STRING HERE</source>
  <value>STRING HERE</value>
  </purpose>
  </classification>
  <classification>
  <purpose>
  <source>STRING HERE</source>
  <value>STRING HERE</value>
  </purpose>
  <taxonPath>
  <source>
  <string language="fr">STRING HERE</string>
  </source>
  <taxon>
  <id>STRING HERE</id>
  <entry>
  <string language="fr">STRING HERE</string>
  </entry>
  </taxon>
  </taxonPath>
  </classification>
  <classification>
  <purpose>
  <source>STRING HERE</source>
  <value>STRING HERE</value>
  </purpose>
  <taxonPath>
  <source>
  <string language="fr">STRING HERE</string>
  </source>
  <taxon>
  <id>STRING HERE</id>
  <entry>
  <string language="fr">STRING HERE</string>
  </entry>
  </taxon>
  </taxonPath>
  <taxonPath>
  <source>
  <string language="fr">STRING HERE</string>
  </source>
  <taxon>
  <id>STRING HERE</id>
  <entry>
  <string language="fr">STRING HERE</string>
  </entry>
  </taxon>
  </taxonPath>
  </classification>
  </lom>
 
  I don't know how I can use this kind of file with Solr because the XML
  example are this one.
 
   <add>
   <doc>
   <field name="id">SOLR1000</field>
   <field name="name">Solr, the Enterprise Search Server</field>
   <field name="manu">Apache Software Foundation</field>
   <field name="cat">software</field>
   <field name="cat">search</field>
   <field name="features">Advanced Full-Text Search Capabilities using
  Lucene</field>
   <field name="features">Optimized for High Volume Web Traffic</field>
   <field name="features">Standards Based Open Interfaces - XML and
  HTTP</field>
   <field name="features">Comprehensive HTML Administration
  Interfaces</field>
   <field name="features">Scalability - Efficient Replication to other Solr
  Search Servers</field>
   <field name="features">Flexible and Adaptable with XML configuration and
  Schema</field>
   <field name="features">Good unicode support: h&#xE9;llo (hello with an
  accent over the e)</field>
   <field name="price">0</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
  <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
  </doc>
  </add>
 
  I understand Solr needs this kind of structure, by structure I mean
  <field name="keyword">Value</field>
  and as you can see I can't use that structure because I'm 

Re: How to get various records in the result set

2009-07-07 Thread Chris Hostetter

duplicate post?
http://www.nabble.com/how-to-shuffle-the-result-while-follow-some-priority-rules-at-the--same-time-to24282025.html#a24282025

FYI: reposting the same question twice doesn't tend to get responses 
faster, it just increases the total volume of mail and slows down 
everyone's ability to read/reply to messages. 

what can help get a response: Replying to your own question with 
additional details like configs, concrete examples, debugging output, log 
messages, things you've tried to solve the problem, etc...



-Hoss



Re: reindexed data on master not replicated to slave

2009-07-07 Thread solr jay
Ok, Here is the problem. In the function, the two directories tmpIndexDir
and indexDir are the same (in this case only?), and then at the end of the
function, the directory tmpIndexDir is deleted, which deletes the new index
directory.


  } finally {
delTree(tmpIndexDir);
  }


On Tue, Jul 7, 2009 at 12:17 PM, solr jay solr...@gmail.com wrote:

 I see. So I tried it again. Now index.properties has

 #index properties
 #Tue Jul 07 12:13:49 PDT 2009
 index=index.20090707121349

 but there is no such directory index.20090707121349 under the data
 directory.

 Thanks,

 J


 On Tue, Jul 7, 2009 at 11:50 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote:

  It seemed that the patch fixed the symptom, but not the problem itself.
 
  Now the log messages looks good. After one download and installed the
  index,
  it printed out
 
  *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller
  fetchLatestIndex
  INFO: Slave in sync with master.*
 
  but the files inside index directory did not change. Both
 index.properties
  and replication.properties were updated though.
 

 Note that in this case, Solr would have created a new index directory. Are
 you comparing the files on the slave in the new index directory? You can
 get
 the new index directory's name from index.properties.

 --
 Regards,
 Shalin Shekhar Mangar.





Re: reindexed data on master not replicated to slave

2009-07-07 Thread solr jay
In fact, I saw the directory was created and then deleted.

On Tue, Jul 7, 2009 at 12:29 PM, solr jay solr...@gmail.com wrote:

 Ok, Here is the problem. In the function, the two directories tmpIndexDir
 and indexDir are the same (in this case only?), and then at the end of the
 function, the directory tmpIndexDir is deleted, which deletes the new index
 directory.


   } finally {
 delTree(tmpIndexDir);

   }


 On Tue, Jul 7, 2009 at 12:17 PM, solr jay solr...@gmail.com wrote:

 I see. So I tried it again. Now index.properties has

 #index properties
 #Tue Jul 07 12:13:49 PDT 2009
 index=index.20090707121349

 but there is no such directory index.20090707121349 under the data
 directory.

 Thanks,

 J


 On Tue, Jul 7, 2009 at 11:50 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote:

  It seemed that the patch fixed the symptom, but not the problem itself.
 
  Now the log messages looks good. After one download and installed the
  index,
  it printed out
 
  *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller
  fetchLatestIndex
  INFO: Slave in sync with master.*
 
  but the files inside index directory did not change. Both
 index.properties
  and replication.properties were updated though.
 

 Note that in this case, Solr would have created a new index directory.
 Are
 you comparing the files on the slave in the new index directory? You can
 get
 the new index directory's name from index.properties.

 --
 Regards,
 Shalin Shekhar Mangar.






Re: Filtering MoreLikeThis results

2009-07-07 Thread Yao Ge

The answer to my own question:
  ...
  <requestHandler name="mlt" class="solr.MoreLikeThisHandler">
    <lst name="defaults"/>
  </requestHandler>
  ...

would work.
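As a concrete illustration, a filtered request against that handler might look like the sketch below. The field names id and inStock and the stock Jetty port are assumptions, and the command is only echoed, since actually running it needs a live Solr instance:

```shell
# Hypothetical filtered MoreLikeThis request against the /mlt handler.
# "id" and "inStock" are example field names; localhost:8983 is the
# stock Jetty port. Echoed as a dry run -- drop the echo to send it.
MLT_URL="http://localhost:8983/solr/mlt?q=id:1234&mlt.fl=title&fq=inStock:true"
echo curl "'$MLT_URL'"
```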
-Yao


Yao Ge wrote:
 
 I am not sure about the parameters for the MLT requestHandler plugin. Can
 one of you share the solrconfig.xml entry for MLT? Thanks in advance.
 -Yao
 
 
 Bill Au wrote:
 
 I have been using the StandardRequestHandler (ie /solr/select).  fq does
 work with the MoreLikeThisHandler.  I will switch to use that.  Thanks.
 
 Bill
 
 On Tue, Jul 7, 2009 at 11:02 AM, Marc Sturlese
 marc.sturl...@gmail.comwrote:
 

 At least in trunk, if you request:
 http://localhost:8084/solr/core_A/mlt?q=id:7468365&fq=price:[100 TO 200]
 it will filter the MoreLikeThis results.


 Bill Au wrote:
 
  I think fq only works on the main response, not the mlt matches.  I
 found
  a
  couple of releated jira:
 
  http://issues.apache.org/jira/browse/SOLR-295
  http://issues.apache.org/jira/browse/SOLR-281
 
  If I am reading them correctly, I should be able to use DIsMax and
  MoreLikeThis together.  I will give that a try and report back.
 
  Bill
 
 
  On Tue, Jul 7, 2009 at 4:45 AM, Marc Sturlese
  marc.sturl...@gmail.comwrote:
 
 
  Using MoreLikeThisHandler you can use fq to filter your results. As
 far
  as
  I
  know bq are not allowed.
 
 
  Bill Au wrote:
  
   I have been trying to restrict MoreLikeThis results without any
 luck
  also.
   In additional to restricting the results, I am also looking to
  influence
   the
   scores similar to the way boost query (bq) works in the
   DisMaxRequestHandler.
  
   I think Solr's MoreLikeThis depends on Lucene's contrib queries
   MoreLikeThis, or at least it used to.  Has anyone looked into
 enhancing
   Solrs' MoreLikeThis to support bq and restricting mlt results?
  
   Bill
  
   On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge yao...@gmail.com wrote:
  
  
   I could not find any support from
   http://wiki.apache.org/solr/MoreLikeThison
   how to restrict MLT results to certain subsets. I passed along a
 fq
   parameter and it is ignored. Since we can not incorporate the
 filters
  in
   the
   query itself which is used to retrieve the target for similarity
   comparison,
   it appears there is no way to filter MLT results. BTW. I am using
 Solr
   1.3.
   Please let me know if there is way (other than hacking the source
  code)
   to
   do this. Thanks!
   --
   View this message in context:
  
 
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
  
  
  
 
  --
  View this message in context:
 
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24374996.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24380408.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Faceting with MoreLikeThis

2009-07-07 Thread Yao Ge

Faceting on MLT requires the use of MoreLikeThisHandler. The standard request
handler, while providing support for MLT via a search component, does not
return facets on MLT results. To enable the MLT handler, add an entry like the
one below to your solrconfig.xml:

  <requestHandler name="mlt" class="solr.MoreLikeThisHandler">
    <lst name="defaults"/>
  </requestHandler>

The query parameter syntax for faceting remains the same as for the standard
request handler.
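For instance, a faceted MLT request could be sketched like this (the field name category and the stock Jetty port are assumptions; the command is echoed as a dry run since it needs a running Solr instance):

```shell
# Hypothetical MLT request with facets; facet parameters are identical
# to the standard handler's. "category" is an example field name and
# localhost:8983 the stock Jetty port. Echoed as a dry run.
MLT_FACET_URL="http://localhost:8983/solr/mlt?q=id:1234&mlt.fl=title&facet=true&facet.field=category"
echo curl "'$MLT_FACET_URL'"
```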

-Yao


Yao Ge wrote:
 
 Does Solr support faceting on MoreLikeThis search results?
 

-- 
View this message in context: 
http://www.nabble.com/Faceting-with-MoreLikeThis-tp24356166p24380459.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Indexing XML

2009-07-07 Thread Saeli Mathieu
And here is my code :)

If you need some explanation feel free to ask :)
You can test it on the first test file I gave you when I opened the thread.

At the moment it works only on one file; I have to change it a bit to make
it work on a directory with lots of XML files.

See you later guys :-)

$repertory = "0.xml";
$BaseObject = simplexml_load_file($repertory);
$Prefix = $BaseObject->getName();
$Final = recu($BaseObject);
format($Final);

function OpenFile()
{
  // "w+" creates the file if needed and truncates any previous run's output
  $fd = fopen("FinalParsing.xml", "w+");
  if ($fd === false)
  {
    echo "Fatal Error: Couldn't create and open the temporary file.\n";
    exit(-1);
  }
  return $fd;
}

function Xfwrite($fd, $String)
{
  if (!fwrite($fd, $String))
  {
    echo "Fatal Error: Couldn't write to the temporary file.\n";
    exit(-1);
  }
}

function format($String)
{
  $fd = OpenFile();
  Xfwrite($fd, "<add>\n<doc>\n");
  $Lines = explode("\n", $String);
  for ($i = 0; isset($Lines[$i]) && $Lines[$i] !== ''; ++$i)
  {
    // each line is "path::to::node=value"
    $Parsing = explode("=", $Lines[$i], 2);
    Xfwrite($fd, "\t".'<field name="'.$Parsing[0].'">'
                 .htmlspecialchars($Parsing[1]).'</field>'."\n");
  }
  Xfwrite($fd, "</doc>\n</add>");
  fclose($fd);
}

function recu($Object, $Prefix = null)
{
  if ($Prefix === null)
    $Prefix = $Object->getName();
  else
    $Prefix .= '::'.$Object->getName();
  if (count($Object->children()) < 1)
    return $Prefix.'='.str_replace("\n", " ", (string)$Object)."\n";
  $Save = '';
  foreach ($Object->children() as $Child)
    $Save .= recu($Child, $Prefix);
  return $Save;
}

-- 
Saeli Mathieu.


Re: Can't limit return fields in custom request handler

2009-07-07 Thread Chris Hostetter

: But I have a problem like this;  when i call
: 
http://localhost:8983/solr/select/?qt=cfacet&q=%2BitemTitle:nokia%20%2BcategoryId:130&start=0&limit=3&fl=id,itemTitle
: i'm getting all fields instead of only id and itemTitle.

Your custom handler is responsible for checking the fl and setting what 
you want the response fields to be on the response object.

SolrPluginUtils.setReturnFields can be used if you want this to be done in 
the normal way.

: Also i'm gettting no result when i give none null filter parameter in
: getDocListAndSet(...).
...
: DocListAndSet results = req.getSearcher().getDocListAndSet(q,
: (Query)null, (Sort)null, solrParams.getInt("start"),
: solrParams.getInt("limit"));

...that should work.  What does your query look like?  What are you 
passing for the start and limit params (is it possible you are getting 
results, but limit=0 so there aren't any results on the current page of 
pagination?)  What does the debug output look like?


-Hoss



RE: Is there any other way to load the index beside using http connection?

2009-07-07 Thread Francis Yakin
 Norberto,

You said last week:

why not generate your SQL output directly into your oracle server as a file,
upload the file to your SOLR server? Then the data file is local to your SOLR
server, you will bypass any WAN and firewall you may be having. (or some
variation of it, sql -> SOLR server as file, etc.)

I think this is the best solution we are going to use, without changing too
much of our setup.

As you said, we have a file named test.xml which comes from the SQL output; we
put it locally on the Solr server as /opt/test.xml.

So I need to execute commands on the Solr system to add and update this in the
Solr data/indexes.

What commands do I have to use, for example for the XML file named /opt/test.xml?


Thanks

Francis
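A minimal sketch of what those two commands could look like, assuming the stock Jetty port 8983 and that /opt/test.xml already contains documents in Solr's <add><doc>...</doc></add> update format (echoed as a dry run; remove the echo to actually send the requests):

```shell
# Post the local XML file, then issue a commit. Port 8983 is the stock
# Jetty example port -- adjust to whatever your container listens on.
# Echoed as a dry run; remove the echo to actually send the requests.
SOLR_URL="http://localhost:8983/solr/update"
POST_CMD="curl $SOLR_URL --data-binary @/opt/test.xml -H 'Content-type:text/xml; charset=utf-8'"
COMMIT_CMD="curl $SOLR_URL --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'"
echo "$POST_CMD"
echo "$COMMIT_CMD"
```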


-Original Message-
From: Norberto Meijome [mailto:numard...@gmail.com]
Sent: Sunday, July 05, 2009 3:57 AM
To: Francis Yakin
Cc: solr-user@lucene.apache.org
Subject: Re: Is there any other way to load the index beside using http 
connection?

On Thu, 2 Jul 2009 11:02:28 -0700
Francis Yakin fya...@liquid.com wrote:

 Norberto, Thanks for your input.

 What do you mean with "Have you tried connecting to SOLR over HTTP from
 localhost, therefore avoiding any firewall issues and network latency? It
 should work a LOT faster than from a remote site."?


 Here are how our servers lay out:

 1) Database ( Oracle ) is running on separate machine
 2) Solr master is running on separate machine by itself
 3) 6 solr slaves ( these 6 pulll the index from master using rsync)

 We have a SQL(Oracle) script to post the data/index from Oracle Database
 machine to Solr Master over http. We wrote those script(Someone in Oracle
 Database administrator write it).

You said in your other email you are having issues with slow transfers between
1) and 2). Your subject relates to the data transfer between 1) and 2); 2) and
3) is irrelevant to this part.

My question (what you quoted above) relates to the point you made about it
being slow (WHY is it slow?), and issues with opening so many connections
through the firewall. So, I'll rephrase my question (see below...)

[]

 We can not do localhost since it's solr is not running on Oracle machine.

why not generate your SQL output directly into your oracle server as a file,
upload the file to your SOLR server? Then the data file is local to your SOLR
server, you will bypass any WAN and firewall you may be having. (or some
variation of it, sql -> SOLR server as file, etc.)

Any speed issues that are rooted in the fact that you are posting via
HTTP (vs embedded solr or DIH) aren't going to go away. But it's the simpler
approach without changing too much of your current setup.


 Another alternative that we think of is to transform XML into CSV and
 import/export it.

 How about if LUSQL, some mentioned about this? Is this apps free(open source)
 application? Do you have any experience with this apps?

Not i, sorry.

Have you looked into DIH? It's designed for this kind of work.

B
_
{Beto|Norberto|Numard} Meijome

Great spirits have often encountered violent opposition from mediocre minds.
  Albert Einstein

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


about defaultSearchField

2009-07-07 Thread Yang Lin
Hi,
I have some problems.
For my Solr program, I want to type only the query string and get results from
all fields that include it. But right now I can't get any results without
specifying a field. For example, a query for tina returns nothing, but
Sentence:tina works.

I have adjusted the *schema.xml* like this:

<fields>
  <field name="CategoryNamePolarity" type="text" indexed="true"
    stored="true" multiValued="true"/>
  <field name="CategoryNameStrenth" type="text" indexed="true"
    stored="true" multiValued="true"/>
  <field name="CategoryNameSubjectivity" type="text" indexed="true"
    stored="true" multiValued="true"/>
  <field name="Sentence" type="text" indexed="true" stored="true"
    multiValued="true"/>

  <field name="allText" type="text" indexed="true" stored="true"
    multiValued="true"/>
</fields>

<uniqueKey required="false">Sentence</uniqueKey>

<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>allText</defaultSearchField>

<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>

<copyField source="CategoryNamePolarity" dest="allText"/>
<copyField source="CategoryNameStrenth" dest="allText"/>
<copyField source="CategoryNameSubjectivity" dest="allText"/>
<copyField source="Sentence" dest="allText"/>


I think the problem is in defaultSearchField, but I don't know how to fix
it. Could anyone help me?

Thanks
Yang


Re: reindexed data on master not replicated to slave

2009-07-07 Thread solr jay
I guess in this case it doesn't matter whether the two directories
tmpIndexDir and indexDir are the same or not. It looks like the index
directory is switched to tmpIndexDir and then it is deleted inside the
finally block.

On Tue, Jul 7, 2009 at 12:31 PM, solr jay solr...@gmail.com wrote:

 In fact, I saw the directory was created and then deleted.


 On Tue, Jul 7, 2009 at 12:29 PM, solr jay solr...@gmail.com wrote:

 Ok, Here is the problem. In the function, the two directories tmpIndexDir
 and indexDir are the same (in this case only?), and then at the end of the
 function, the directory tmpIndexDir is deleted, which deletes the new index
 directory.


   } finally {
 delTree(tmpIndexDir);

   }


 On Tue, Jul 7, 2009 at 12:17 PM, solr jay solr...@gmail.com wrote:

 I see. So I tried it again. Now index.properties has

 #index properties
 #Tue Jul 07 12:13:49 PDT 2009
 index=index.20090707121349

 but there is no such directory index.20090707121349 under the data
 directory.

 Thanks,

 J


 On Tue, Jul 7, 2009 at 11:50 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote:

  It seemed that the patch fixed the symptom, but not the problem
 itself.
 
  Now the log messages looks good. After one download and installed the
  index,
  it printed out
 
  *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller
  fetchLatestIndex
  INFO: Slave in sync with master.*
 
  but the files inside index directory did not change. Both
 index.properties
  and replication.properties were updated though.
 

 Note that in this case, Solr would have created a new index directory.
 Are
 you comparing the files on the slave in the new index directory? You can
 get
 the new index directory's name from index.properties.

 --
 Regards,
 Shalin Shekhar Mangar.







-- 
J


Re: Stopwords when facetting

2009-07-07 Thread Chris Hostetter

: When indexing or querying text, i'm using the solr.StopFilterFactory ; it 
seems to works just fine...
: 
: But I want to use the text field as a facet, and get all the commonly 
: used words in a set of results, without the stopwords. As far as I 
: tried, I always get stopwords, and numerical terms, that pollute my 
: facets results. How can I perform this ?

perhaps you have the same problem as described here...

http://www.nabble.com/facets-and-stopwords-to23952823.html#a24379679


...it's hard to be certain without any actual concrete examples (what does 
your schema.xml look like, what are your stopwords, what terms are still 
showing up in your facet list even though they are stop words, what 
documents contain those terms (the raw parser can help you find them)...

q={!raw field=yourFieldName}wordYouDoNotExpect




-Hoss



Re: Preparing the ground for a real multilang index

2009-07-07 Thread Jan Høydahl

When using stemming, you have to know the query language.
For your project, perhaps you should look into switching to a  
lemmatizer instead. I believe Lucid can provide integration with a  
commercial lemmatizer. This way you can expand the document field  
itself and do not need to know the query language. You may then want  
to do a copyField from all your text_<lang> fields to text for a convenient
one-field-to-rule-them-all search.


--
Jan Høydahl
Gründer  senior architect
Cominvent AS, Stabekk, Norway
www.cominvent.com
+20 100930908

On 3. juli. 2009, at 08.43, Michael Lackhoff wrote:


On 03.07.2009 00:49 Paul Libbrecht wrote:

[I'll try to address the other responses as well]


I believe the proper way is for the server to compute a list of
accepted languages in order of preferences.
The web-platform language (e.g. the user-setting), and the values in
the Accept-Language http header (which are from the browser or
platform).


All this is not going to help much because the main application is a
scientific search portal for books and articles with many users
searching cross-language. The most typical use case is a German user
searching multilingual. So we might even get the search multilingual,
e.g. TITLE:cancer OR TITLE:krebs. No way here to watch out for
Accept-headers or a language select field (would be left on any in
most cases). Other popular use cases are citations (in whatever
language) cut and pasted into the search field.


Then you expand your query for "surfing waves" (say) to:
- phrase query: "surfing waves" exactly (^2.0)
- two terms, no stemming: surfing waves (^1.5)
- iterate through the languages and query for stemmed variants:
  - english: surf wav ^1.0
  - german: surfing wave ^0.9
  - ...
- then maybe even try the phonetic analyzer (matched in a separate
field probably)


This is an even more sophisticated variant of the multiple OR I came
up with. Oh well...

I think this is a common pattern on the web where the users,  
browsers,

and servers are all somewhat multilingual.


indeed and often users are not even aware of it, especially in a
scientific context they use their native tongue and English almost
interchangably -- and they expect the search engine to cope with it.

I think the best would be to process the data according to its  
language
but don't make any assumptions about the query language and I am  
totally

lost how to get a clever schema.xml out of all this.

Thanks everyone for listening and I am still open for good suggestions
to deal with this problem!

-Michael




Re: Preparing the ground for a real multilang index

2009-07-07 Thread Benson Margulies
There is an alternative to knowing the language at query:
multiply-process for stems or lemmas of all the possible languages.
This may well be a cure much worse than the disease.

Yes, LI can sell you our lemma-production capability.

--benson margulies
basis technology




On Tue, Jul 7, 2009 at 6:50 PM, Jan Høydahlj...@cominvent.com wrote:
 When using stemming, you have to know the query language.
 For your project, perhaps you should look into switching to a lemmatizer
 instead. I believe Lucid can provide integration with a commercial
 lemmatizer. This way you can expand the document field itself and do not
 need to know the query language. You may then want to do a copyField from
 all your text_<lang> fields to text for a convenient
 one-field-to-rule-them-all search.

 --
 Jan Høydahl
 Gründer  senior architect
 Cominvent AS, Stabekk, Norway
 www.cominvent.com
 +20 100930908

 On 3. juli. 2009, at 08.43, Michael Lackhoff wrote:

 On 03.07.2009 00:49 Paul Libbrecht wrote:

 [I'll try to address the other responses as well]

 I believe the proper way is for the server to compute a list of
 accepted languages in order of preferences.
 The web-platform language (e.g. the user-setting), and the values in
 the Accept-Language http header (which are from the browser or
 platform).

 All this is not going to help much because the main application is a
 scientific search portal for books and articles with many users
 searching cross-language. The most typical use case is a German user
 searching multilingual. So we might even get the search multilingual,
 e.g. TITLE:cancer OR TITLE:krebs. No way here to watch out for
 Accept-headers or a language select field (would be left on any in
 most cases). Other popular use cases are citations (in whatever
 language) cut and pasted into the search field.

 Then you expand your query for "surfing waves" (say) to:
 - phrase query: "surfing waves" exactly (^2.0)
 - two terms, no stemming: surfing waves (^1.5)
 - iterate through the languages and query for stemmed variants:
   - english: surf wav ^1.0
   - german: surfing wave ^0.9
   - ...
 - then maybe even try the phonetic analyzer (matched in a separate
 field probably)

 This is an even more sophisticated variant of the multiple OR I came
 up with. Oh well...

 I think this is a common pattern on the web where the users, browsers,
 and servers are all somewhat multilingual.

 indeed and often users are not even aware of it, especially in a
 scientific context they use their native tongue and English almost
 interchangably -- and they expect the search engine to cope with it.

 I think the best would be to process the data according to its language
 but don't make any assumptions about the query language and I am totally
 lost how to get a clever schema.xml out of all this.

 Thanks everyone for listening and I am still open for good suggestions
 to deal with this problem!

 -Michael




A big question about Solr and SolrJ range query ?

2009-07-07 Thread huenzhao

Hi all:

Suppose that my index has 3 fields: title, x and y.

I know one range (10 < x < 100) can be queried like this:

http://localhost:8983/solr/select?q=x:[10 TO 100]&fl=title

If I want a two-range query (10 < x < 100 AND 20 < y < 300), like

SQL (select title where x > 10 and x < 100 and y > 20 and y < 300),

by using a Solr range query or SolrJ, I don't know how to implement it.
Anybody know? Thanks

Email: enzhao...@gmail.com

-- 
View this message in context: 
http://www.nabble.com/A-big-question-about-Solr-and-SolrJ-range-query---tp24384416p24384416.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: A big question about Solr and SolrJ range query ?

2009-07-07 Thread Yao Ge

use Solr's Filter Query parameter fq:
fq=x:[10 TO 100]&fq=y:[20 TO 300]&fl=title
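Since the original question asked for strict inequalities (10 < x < 100), it may be worth noting that Lucene range syntax also supports exclusive bounds with curly braces. A sketch of the full request (stock host and port assumed; shown unencoded for readability and echoed as a dry run):

```shell
# Two exclusive-range filter queries; {a TO b} excludes the endpoints,
# [a TO b] includes them. Shown unencoded for readability -- spaces and
# braces must be URL-escaped when actually sent. Echoed as a dry run.
RANGE_URL='http://localhost:8983/solr/select?q=*:*&fq=x:{10 TO 100}&fq=y:{20 TO 300}&fl=title'
echo curl "'$RANGE_URL'"
```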

-Yao

huenzhao wrote:
 
 Hi all:
 
 Suppose that my index has 3 fields: title, x and y.
 
 I know one range (10 < x < 100) can be queried like this:
 
 http://localhost:8983/solr/select?q=x:[10 TO 100]&fl=title
 
 If I want a two-range query (10 < x < 100 AND 20 < y < 300), like
 
 SQL (select title where x > 10 and x < 100 and y > 20 and y < 300),
 
 by using a Solr range query or SolrJ, I don't know how to implement it.
 Anybody know? Thanks
 
 Email: enzhao...@gmail.com
 
 

-- 
View this message in context: 
http://www.nabble.com/A-big-question-about-Solr-and-SolrJ-range-query---tp24384416p24384540.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: about defaultSearchField

2009-07-07 Thread Yao Ge

Try with fl=* or fl=*,score added to your request string.
-Yao

Yang Lin-2 wrote:
 
 Hi,
 I have some problems.
 For my Solr program, I want to type only the query string and get results
 from all fields that include it. But right now I can't get any results
 without specifying a field. For example, a query for tina returns nothing,
 but Sentence:tina works.
 
 I have adjusted the *schema.xml* like this:
 
 <fields>
   <field name="CategoryNamePolarity" type="text" indexed="true"
     stored="true" multiValued="true"/>
   <field name="CategoryNameStrenth" type="text" indexed="true"
     stored="true" multiValued="true"/>
   <field name="CategoryNameSubjectivity" type="text" indexed="true"
     stored="true" multiValued="true"/>
   <field name="Sentence" type="text" indexed="true" stored="true"
     multiValued="true"/>
 
   <field name="allText" type="text" indexed="true" stored="true"
     multiValued="true"/>
 </fields>
 
 <uniqueKey required="false">Sentence</uniqueKey>
 
 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>allText</defaultSearchField>
 
 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="OR"/>
 
 <copyField source="CategoryNamePolarity" dest="allText"/>
 <copyField source="CategoryNameStrenth" dest="allText"/>
 <copyField source="CategoryNameSubjectivity" dest="allText"/>
 <copyField source="Sentence" dest="allText"/>
 
 
 I think the problem is in defaultSearchField, but I don't know how to
 fix
 it. Could anyone help me?
 
 Thanks
 Yang
 
 

-- 
View this message in context: 
http://www.nabble.com/about-defaultSearchField-tp24382105p24384615.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Query on the updation of synonym and stopword file.

2009-07-07 Thread Sagar Khetkade

I am using the Solr 1.3 version.
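For reference, the core reload Koji mentions below can be triggered over HTTP in a multicore setup. A sketch of such a request (the core name core0 and host/port are assumptions; echoed as a dry run since it needs a running Solr instance):

```shell
# Hypothetical multicore RELOAD request; "core0" is an example core
# name. Echoed as a dry run -- drop the echo to issue the reload.
RELOAD_URL="http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0"
echo curl "'$RELOAD_URL'"
```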
 
 Date: Wed, 8 Jul 2009 01:12:02 +0900
 From: k...@r.email.ne.jp
 To: solr-user@lucene.apache.org
 Subject: Re: Query on the updation of synonym and stopword file.
 
 Sagar,
 
  I am facing a problem here that even after the core reload and 
 re-indexing
  the documents the new updated synonym or stop words are not loaded.
  Seems so the filters are not aware that these files are updated so 
 the solution
  to me is to restart the whole container in which I have embedded
  the Solr server; it is not feasible in production.
 
 I am not a multicore user, but I can see the synonyms.txt updated
 after reloading the core (I verified it via analysis.jsp, not re-indexing),
 without restarting the solr server. I'm using 1.4. What version are you using?
 
 Koji
 
 
 Sagar Khetkade wrote:
  Hello All,
  
  I was figuring out the issue with the synonym.txt and stopword.txt files 
  being updated on regular interval. 
  Here in my case I am updating the synonym.txt and stopword.txt files as the 
  synonym and stop word dictionary is updated. I am facing a problem here that 
  even after the core reload and re-indexing the documents the new updated 
  synonym or stop words are not loaded. Seems so the filters are not aware 
  that these files are updated so the solution to me is to restart the whole 
  container in which I have embedded the Solr server; it is not feasible in 
  production.
  I came across the discussion with subject “ synonyms.txt file updated 
  frequently” in which Grant had a view to write a new logic in 
  SynonymFilterFactory which would take care of this issue. Is there any 
  possible solution to this or is this the solution.
  Thanks in advance!
  
  Regards,
  Sagar Khetkade
  
  
  
 


Updating Solr index from XML files

2009-07-07 Thread Francis Yakin

I have the following curl commands to update and commit to Solr (I have 10
xml files just for testing):

 curl http://solr00:7001/solr/update --data-binary @xml_Artist-100170.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101062.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101238.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101400.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101513.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101517.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101572.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101691.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101694.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101698.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @commit.txt -H 
'Content-type:text/plain; charset=utf-8'

It works so far. But I will have > 3 xml files.

What's the most efficient way to do this? I can script it with a for loop
using a regular shell script or perl.

I am also looking into solr.pm from this:

http://wiki.apache.org/solr/IntegratingSolr

BTW: We are using WebLogic to deploy solr.war, and by default Solr in
WebLogic uses port 7001, not 8983.

Thanks

Francis
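A minimal sketch of such a loop, assuming the same host, port, and Content-type as the curl commands above (echoed as a dry run, with two sample files standing in for the full set; in practice the list would be a glob such as xml_Artist-*.txt):

```shell
# Post every artist file in a loop, then send a single commit at the
# end (one commit total is much cheaper than one commit per file).
# Echoed as a dry run -- remove the echo to actually post.
SOLR_URL="http://solr00:7001/solr/update"
posted=0
for f in xml_Artist-100170.txt xml_Artist-101062.txt; do
  echo curl "$SOLR_URL" --data-binary "@$f" -H 'Content-type:text/plain; charset=utf-8'
  posted=$((posted + 1))
done
echo curl "$SOLR_URL" --data-binary @commit.txt -H 'Content-type:text/plain; charset=utf-8'
```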




Re: reindexed data on master not replicated to slave

2009-07-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
jay,
Thanks. The testcase was not enough. I have given a new patch. I
guess that should solve this.

On Wed, Jul 8, 2009 at 3:48 AM, solr jaysolr...@gmail.com wrote:
 I guess in this case it doesn't matter whether the two directories
 tmpIndexDir and indexDir are the same or not. It looks that the index
 directory is switched to tmpIndexDir and then it is deleted inside
 finally.

 On Tue, Jul 7, 2009 at 12:31 PM, solr jay solr...@gmail.com wrote:

 In fact, I saw the directory was created and then deleted.


 On Tue, Jul 7, 2009 at 12:29 PM, solr jay solr...@gmail.com wrote:

 Ok, Here is the problem. In the function, the two directories tmpIndexDir
 and indexDir are the same (in this case only?), and then at the end of the
 function, the directory tmpIndexDir is deleted, which deletes the new index
 directory.


       } finally {
         delTree(tmpIndexDir);

       }


 On Tue, Jul 7, 2009 at 12:17 PM, solr jay solr...@gmail.com wrote:

 I see. So I tried it again. Now index.properties has

 #index properties
 #Tue Jul 07 12:13:49 PDT 2009
 index=index.20090707121349

 but there is no such directory index.20090707121349 under the data
 directory.

 Thanks,

 J


 On Tue, Jul 7, 2009 at 11:50 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote:

  It seemed that the patch fixed the symptom, but not the problem
 itself.
 
  Now the log messages looks good. After one download and installed the
  index,
  it printed out
 
  *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller
  fetchLatestIndex
  INFO: Slave in sync with master.*
 
  but the files inside index directory did not change. Both
 index.properties
  and replication.properties were updated though.
 

 Note that in this case, Solr would have created a new index directory.
 Are
 you comparing the files on the slave in the new index directory? You can
 get
 the new index directory's name from index.properties.

 --
 Regards,
 Shalin Shekhar Mangar.







 --
 J




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Updating Solr index from XML files

2009-07-07 Thread Otis Gospodnetic

If Perl is you choice:
http://search.cpan.org/~bricas/WebService-Solr-0.07/lib/WebService/Solr.pm

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Francis Yakin fya...@liquid.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Wednesday, July 8, 2009 1:16:04 AM
 Subject: Updating Solr index from XML files
 
 
 I have the following curl cmd to update and doing commit to Solr ( I have 
 10 
 xml files just for testing)
 
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-100170.txt -H 
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101062.txt -H 
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101238.txt -H 
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101400.txt -H 
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101513.txt -H 
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101517.txt -H 
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101572.txt -H 
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101691.txt -H 
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101694.txt -H 
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101698.txt -H 
 'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @commit.txt -H 
 'Content-type:text/plain; charset=utf-8'
 
 It works so far. But I will have > 3 xml files.
 
 What's the efficient way to do these things? I can script it with for loop 
 using 
 regular shell script or perl.
 
 I am also looking into solr.pm from this:
 
 http://wiki.apache.org/solr/IntegratingSolr
 
 BTW: We are using weblogic to deploy the solr.war and by default solr in 
 weblogic using port 7001, but not 8983.
 
 Thanks
 
 Francis