Re: Is this DIH entity forEach expression OK? ... yes

2009-02-13 Thread Fergus McMenemie
Hello,

I am having bother with forEach. I have XML source documents containing
many embedded images within mediaBlock elements. Each image has an
associated caption. I want to implement a separate image search function
which searches the captions and brings back the associated image.

 <entity name="x"
         dataSource="myfilereader"
         processor="XPathEntityProcessor"
         url="${jc.fileAbsolutePath}"
         stream="false"
         forEach="/record | /record/mediaBlock"
         >

   <field column="vurl"       xpath="/record/mediaBlock/mediaObject/@vurl" />
   <field column="imgCaption" xpath="/record/mediaBlock/caption" />

Is it OK to have an xpath expression within forEach which is a child
of another of the forEach xpath expressions?

Yes. It works fine, duplicate uniqueKeys were making it appear otherwise.

But
-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: spellcheck.onlyMorePopular

2009-02-13 Thread Marcus Stratmann

Grant Ingersoll wrote:
I believe the reason is b/c when onlyMP is false, if the word itself is 
already in the index, it short circuits out.  When onlyMP is true, it 
checks to see if there are more frequently occurring variations.
This would mean that onlyMorePopular=false isn't useful at all. If the
word is in the index it will not find less frequent words, and if it is
not in the index onlyMorePopular=false isn't useful either, since there
are no less popular words.

So if you are right, this is a bug, isn't it?

Thanks,
Marcus


abbreviation problem

2009-02-13 Thread 李学健
hi, all

For an abbreviation, for example 'US', how can I get results containing
'United States' in Solr or Lucene?
In Solr, the synonyms filter seems only to handle one-word-to-one-word
mappings, but for abbreviation queries the words should be expanded.

Does anybody have a good solution for this?

--steven.li



commit error which kill my dataimport.properties file

2009-02-13 Thread sunnyfr

Hi, 

Last night I got an error during the import and I don't understand what it
means. It even killed my dataimport.properties (left an empty file), so
nothing was written to the file and the next delta-import started again from
the very beginning, I guess.

Thanks a lot for your help,
I wish you guys a lovely day,



Here is the error:


2009/02/12 23:45:01 commit request to Solr at
http://books.com:8180/solr/books/update failed:
2009/02/12 23:45:01 HTTP Status 500 - No space left on device

java.io.IOException: No space left on device
    at java.io.RandomAccessFile.writeBytes(Native Method)
    at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
    at org.apache.lucene.store.FSDirectory$FSIndexOutput.flushBuffer(FSDirectory.java:679)
    at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
    at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
    at org.apache.lucene.store.BufferedIndexOutput.close(BufferedIndexOutput.java:109)
    at org.apache.lucene.store.FSDirectory$FSIndexOutput.close(FSDirectory.java:686)
    at org.apache.lucene.index.FieldsWriter.close(FieldsWriter.java:145)
    at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:83)
    at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
    at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
    at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:373)
    at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:562)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3803)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3712)
    at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1752)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1716)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1687)
    at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:214)
    at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:172)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:341)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:78)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
    at java.lang.Thread.run(Thread.java:619)

Re: commit error which kill my dataimport.properties file

2009-02-13 Thread sunnyfr

It's actually the disk space, sorry.
But yes, my snapshots look huge, around 3G every 20 minutes; should I clean
them up more often, like every 4 hours?



sunnyfr wrote:
 
 Hi, 
 
 Last night I got an error during the import and I don't understand what
 it means. It even killed my dataimport.properties (left an empty file), so
 nothing was written to the file and the next delta-import started again
 from the very beginning, I guess.
 
 Thanks a lot for your help,
 I wish you guys a lovely day,
 
 
 

Re: spellcheck.onlyMorePopular

2009-02-13 Thread Marcus Stratmann

Shalin Shekhar Mangar wrote:

The end goal is to give spelling suggestions. Even if it gave less
frequently occurring spelling suggestions, what would you do with it?

To give you an example:
We have an index for computer games. One title is gran turismo. The 
word gran is less frequent in the index than grand. So if someone 
searches for grand turismo there will be no suggestion gran.


And to come back to my last question: There seems to be no case in which 
onlyMorePopular=false makes sense (provided Grant's assumption is 
correct). Do you see one?


Thanks,
Marcus


Re: several snapshot ...

2009-02-13 Thread sunnyfr

Hi Hoss,

Thanks a lot for your clear answer.
It's very clear.

Thanks


hossman wrote:
 
 
 : I would like to get how is a snapshot really. It's obviously a hard link
 to
 : the files.
 : But it just contain the last update ?? 
 
 the nature of lucene indexes is that files are never modified -- only 
 created, or deleted.
 
 this makes rsyncing very efficient when updates have been made to an 
 index, because only new files exist, and only those new files need to 
 be synced.
 
 : My problem is ... I've a cronjob to commit and auto-start snapshooter
 : every 5 minutes, which works properly.
 : And on my slaves I've a cronjob every 5 minutes to snapshoot. But I don't
 : get why it doesn't take every snapshot file ... maybe bad synchronisation,
 : or it's too long to install?
 
 in your case, it looks like two things are happening...  note that the 
 snapinstaller command run at 16:35:36 doesn't seem to finish until 
 16:41:37 (361 seconds later), and it encountered an error that it couldn't 
 connect to your solr port (do the scripts have the correct host:port 
 configuration?)
 
 the second thing to notice is that every time snapinstaller runs after 
 that, it says the most current snapshot is snapshot.20090205162502 ... 
 which means either snappuller isn't running often enough, or snapshooter 
 isn't producing snapshots as frequently as you think it is.
 
 FYI: typically people cron snappuller; snapinstaller together as a 
 single crontab entry ...
 
 http://wiki.apache.org/solr/CollectionDistribution
 
 
 : 2009/02/05 16:35:36 started by root
 : 2009/02/05 16:35:36 command: /data/solr/books/bin/snapinstaller
 : 2009/02/05 16:35:37 installing snapshot
 : /data/solr/books/data/snapshot.20090205162502
 : 2009/02/05 16:35:38 notifing Solr to open a new Searcher
 : 2009/02/05 16:40:04 started by root
 : 2009/02/05 16:40:04 command: /data/solr/books/bin/snapinstaller
 : 2009/02/05 16:40:04 latest snapshot
 : /data/solr/books/data/snapshot.20090205162502 already installed
 : 2009/02/05 16:40:04 ended (elapsed time: 0 sec)
 : 2009/02/05 16:41:37 failed to connect to Solr server
 : 2009/02/05 16:41:37 snapshot installed but Solr server has not open a
 new
 : Searcher
 : 2009/02/05 16:41:37 failed (elapsed time: 361 sec)
 : 2009/02/05 16:54:40 started by root
 : 2009/02/05 16:54:40 started by root
 : 2009/02/05 16:54:40 command: /data/solr/books/bin/snapinstaller
 : 2009/02/05 16:54:40 command: /data/solr/books/bin/snapinstaller
 : 2009/02/05 16:54:40 latest snapshot
 : /data/solr/books/data/snapshot.20090205162502 already installed
 : 2009/02/05 16:54:40 latest snapshot
 : /data/solr/books/data/snapshot.20090205162502 already installed
 : 2009/02/05 16:54:40 ended (elapsed time: 0 sec)
 : 2009/02/05 16:54:40 ended (elapsed time: 0 sec)
 
 
 -Hoss
 
 
 

-- 
View this message in context: 
http://www.nabble.com/several-snapshot-...-tp21855239p21992862.html
Sent from the Solr - User mailing list archive at Nabble.com.
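Hoss's "single crontab entry" advice could be sketched like this (the paths are taken from the log above; the 5-minute schedule is an assumption matching the poster's setup):

```shell
# pull the newest snapshot and install it together, every 5 minutes
*/5 * * * * /data/solr/books/bin/snappuller && /data/solr/books/bin/snapinstaller
```

Running them in one entry avoids the race where snapinstaller fires before snappuller has brought over a new snapshot.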



Re: abbreviation problem

2009-02-13 Thread Koji Sekiguchi
李学健 wrote:
 hi, all

 For an abbreviation, for example 'US', how can I get results containing
 'United States' in Solr or Lucene?
 In Solr, the synonyms filter seems only to handle one-word-to-one-word
 mappings, but for abbreviation queries the words should be expanded.

   
SynonymFilter should support one word to phrase (two words or more),
phrase to one word and phrase to phrase. For example:

US, United States

or

US => United States

or

United States => US

Cheers,

Koji
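To make Koji's examples concrete: the mappings above live in synonyms.txt, wired into an analyzer via SynonymFilterFactory. A minimal sketch (the field-type name and choice of tokenizer are assumptions, not from the thread):

```xml
<!-- synonyms.txt:
US, United States
-->
<fieldType name="textSyn" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand="true" maps every entry on the line to all the others,
         so a query for US also matches United States -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```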




Problem using DIH templatetransformer to create uniqueKey

2009-02-13 Thread Fergus McMenemie
Hello,

TemplateTransformer behaves rather ungracefully if one of the replacement
fields is missing.

I am parsing a single XML document into multiple separate solr documents.
It turns out that none of the source document's fields can be used to create
a uniqueKey alone. I need to combine two, using TemplateTransformer as
follows:

<entity name="x"
        dataSource="myfilereader"
        processor="XPathEntityProcessor"
        url="${jc.fileAbsolutePath}"
        rootEntity="true"
        stream="false"
        forEach="/record | /record/mediaBlock"
        transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer"
        >

  <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
  <field column="fileWebPath"
         regex="${dataimporter.request.installdir}(.*)" replaceWith="/ford$1"
         sourceColName="fileAbsolutePath"/>
  <field column="id" template="${jc.fileAbsolutePath}${x.vurl}" />
  <field column="vurl" xpath="/record/mediaBlock/mediaObject/@vurl" />

The trouble is that vurl is only defined as a child of /record/mediaBlock,
so my attempt to create id, the uniqueKey, fails for the parent document
/record.

I am hacking around with TemplateTransformer.java to sort this but was
wondering if there was a good reason for this behavior.

Regards.
-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: spellcheck.onlyMorePopular

2009-02-13 Thread Shalin Shekhar Mangar
On Fri, Feb 13, 2009 at 2:51 PM, Marcus Stratmann stratm...@gmx.de wrote:

 Shalin Shekhar Mangar wrote:

 The end goal is to give spelling suggestions. Even if it gave less
 frequently occurring spelling suggestions, what would you do with it?

 To give you an example:
 We have an index for computer games. One title is gran turismo. The word
 gran is less frequent in the index than grand. So if someone searches
 for grand turismo there will be no suggestion gran.


Unless I'm misunderstanding something, you need phrase suggestions and not
individual word suggestions. I mean that you need suggestions for gran turismo
and not gran and turismo separately. Did you try using KeywordTokenizer
for this spell check field?



 And to come back to my last question: There seems to be no case in which
 onlyMorePopular=false makes sense (provided Grant's assumption is
 correct). Do you see one?


Here's a use-case -- you provide a mis-spelled word and you want the closest
suggestion by edit distance (frequency does not matter).

-- 
Regards,
Shalin Shekhar Mangar.


Re: Problem using DIH templatetransformer to create uniqueKey

2009-02-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
the intent was to not make a partial string if some of the variables
are missing.

probably we can enhance TemplateTransformer by using an extra
attribute on the field:

 <field column="id" template="${jc.fileAbsolutePath}${x.vurl}"
        ignoreMissingVariables="true"/>

then it can just resolve with whatever is available...



On Fri, Feb 13, 2009 at 3:17 PM, Fergus McMenemie fer...@twig.me.uk wrote:
 Hello,

 templatetransformer behaves rather ungracefully if one of the replacement
 fields is missing.

 I am parsing a single XML document into multiple separate solr documents.
 It turns out that none of the source documents fields can be used to create
 a uniqueKey alone. I need to combine two, using template transformer as
 follows:

 <entity name="x"
         dataSource="myfilereader"
         processor="XPathEntityProcessor"
         url="${jc.fileAbsolutePath}"
         rootEntity="true"
         stream="false"
         forEach="/record | /record/mediaBlock"
         transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer"
         >

   <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
   <field column="fileWebPath"
          regex="${dataimporter.request.installdir}(.*)" replaceWith="/ford$1"
          sourceColName="fileAbsolutePath"/>
   <field column="id" template="${jc.fileAbsolutePath}${x.vurl}" />
   <field column="vurl" xpath="/record/mediaBlock/mediaObject/@vurl" />

 The trouble is that vurl is only defined as a child of /record/mediaBlock
 so my attempt to create id, the uniqueKey fails for the parent document 
 /record

 I am hacking around with TemplateTransformer.java to sort this but was
 wondering if there was a good reason for this behavior.

 Regards.
 --

 ===
 Fergus McMenemie   Email:fer...@twig.me.uk
 Techmore Ltd   Phone:(UK) 07721 376021

 Unix/Mac/Intranets Analyst Programmer
 ===




-- 
--Noble Paul


facet count on partial results

2009-02-13 Thread Karl Wettin

Hi Solr,

I pass a rather large number of OR clauses to Solr, ending up with
lots and lots of results. However, only the results above a
certain score threshold are interesting for me, thus I'd like to
get facet counts only for the results within the threshold. How can I
do that?




  karl 


Re: spellcheck.onlyMorePopular

2009-02-13 Thread Marcus Stratmann

Shalin Shekhar Mangar wrote:

And to come back to my last question: There seems to be no case in which
onlyMorePopular=false makes sense (provided Grant's assumption is
correct). Do you see one?


Here's a use-case -- you provide a mis-spelled word and you want the closest
suggestion by edit distance (frequency does not matter).


Hm, when I try searching for grand using onlyMorePopular=false I do 
not get any results. Same when trying gran. It seems that there will 
be no results at all when using onlyMorePopular=false. Without 
onlyMorePopular there are suggestions for both terms, so there are 
suggestions close enough to the original word(s). Have you tested your 
example case?


Anyway, if you look at it from the user's point of view: The wiki says 
spellcheck.onlyMorePopular -- Only return suggestions that result in 
more hits for the query than the existing query. This implies that if 
onlyMorePopular=false I will get even results with less hits. So when 
I'm checking grand I would expect to get the suggestion gran which 
is less frequent in the index. But it seems this is not the case.


But even if just the documentation is wrong or unclear:
1) I could not find a case in which onlyMorePopular=false works at all.
2) It would be nice if one could get suggestions with lower frequency 
than the checked word (which is, to me, what onlyMorePopular=false implies).


Thanks,
Marcus



Re: spellcheck.onlyMorePopular

2009-02-13 Thread Shalin Shekhar Mangar
On Fri, Feb 13, 2009 at 5:05 PM, Marcus Stratmann stratm...@gmx.de wrote:

 Hm, when I try searching for grand using onlyMorePopular=false I do not
 get any results. Same when trying gran. It seems that there will be no
 results at all when using onlyMorePopular=false.


When onlyMorePopular is false and the word you searched exists in the index,
it is returned as-is. Therefore if gran and grand are both present in
the index, they will be returned as is.


 Without onlyMorePopular there are suggestions for both terms, so there are
 suggestions close enough to the original word(s). Have you tested your
 example case?


I am confused by this. Did you mean "with onlyMorePopular=true there are
suggestions for both terms"?


 Anyway, if you look at it from the user's point of view: The wiki says
 spellcheck.onlyMorePopular -- Only return suggestions that result in more
 hits for the query than the existing query. This implies that if
 onlyMorePopular=false I will get even results with less hits. So when I'm
 checking grand I would expect to get the suggestion gran which is less
 frequent in the index. But it seems this is not the case.


If onlyMorePopular=true, then the algorithm finds tokens which have greater
frequency than the searched term. Among these terms, the one which is
closest (by edit distance) is returned.

I think I now understand the source of the confusion. onlyMorePopular=true
is a special behavior which uses *only* those tokens which have higher
frequency than the searched term. onlyMorePopular=false just switches off
this special behavior. It does *not* limit suggestions to tokens which have
lesser frequency than the searched term. In fact, onlyMorePopular=false does
not use frequency of tokens at all. We should document this clearly to avoid
such confusions in the future.
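Shalin's description of the two modes can be sketched roughly as follows. This is a hypothetical Python approximation, not Solr's implementation, and the in-memory frequency map stands in for the spell-check index:

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def suggest(term, index_freqs, only_more_popular, count=1):
    # index_freqs maps token -> frequency (illustrative data, not Solr's API)
    base = index_freqs.get(term, 0)
    candidates = [t for t in index_freqs if t != term]
    if only_more_popular:
        # onlyMorePopular=true: consider *only* tokens more frequent than the term
        candidates = [t for t in candidates if index_freqs[t] > base]
    # in either mode the ranking itself is by edit distance, not frequency
    candidates.sort(key=lambda t: levenshtein(term, t))
    return candidates[:count]
```

With freqs = {"gran": 10, "grand": 17, "granny": 2}, suggest("gran", freqs, True) yields ["grand"], while suggest("grand", freqs, True) yields nothing (no token is more popular), and suggest("grand", freqs, False) can return the less frequent "gran" — matching the behavior discussed in this thread.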


 2) It would be nice if one could get suggestion with lower frequency than
 the checked word (which is, to me, what onlyMorePopular=false implies).


We could enhance spell checker to do that. But can you please explain your
use-case for limiting suggestions to tokens which have lesser frequency? The
goal of spell checker is to give suggestions of wrongly spelled words. It
was neither designed nor intended to give any other sort of query
suggestions.

-- 
Regards,
Shalin Shekhar Mangar.


Trouble with solr IndexbasedSpellChecker and FilebasedSpellChecker

2009-02-13 Thread Kartik Desikan
Hi folks,

I'm using solr 1.3
Here is the relevant section from my solrconfig.xml

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <!-- <str name="queryAnalyzerFieldType">textSpell</str> -->

    <lst name="spellchecker">
        <str name="name">default</str>
        <!-- <str name="classname">solr.IndexBasedSpellChecker</str> -->
        <!-- <str name="field">DESC</str> -->
        <!-- <str name="spellcheckIndexDir">./spellchecker</str> -->
        <str name="classname">solr.FileBasedSpellChecker</str>
        <str name="sourceLocation">/tmp/dct.txt</str>
        <str name="spellcheckIndexDir">./filespellchecker</str>
        <str name="accuracy">0.7</str>
    </lst>
</searchComponent>

Neither the IndexBasedSpellChecker nor the FileBasedSpellChecker works for me.
What happens is that after I add the spellcheck component to solrconfig.xml
and restart my Resin (3.16) server, my data index gets removed.

Before:
47Mwebapps/index/solr/data/index
47Mwebapps/index/solr/data/
After:
12Kwebapps/index/solr/data/index
12Kwebapps/index/solr/data/filespellchecker
28Kwebapps/index/solr/data/

Can anyone clue me into why this is happening?

Thanks a lot!
Kartik


Get # of docs pending commit

2009-02-13 Thread Jacob Singh
Hi,

Is there a way to retrieve the # of documents which are pending commit
(when using autocommit)?

Thanks,
Jacob

-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


Re: Problem using DIH templatetransformer to create uniqueKey

2009-02-13 Thread Fergus McMenemie
Hello,

TemplateTransformer behaves rather ungracefully if one of the replacement
fields is missing.

Looking at TemplateString.java I see that, left to itself, fillTokens would
replace a missing variable with "". It is an extra check in TemplateTransformer
that throws the warning and stops the row being returned. Commenting
out the check seems to solve my problem.

Having done this, an undefined replacement string in TemplateTransformer
is replaced with "". However, a neater fix would probably involve making 
use of the default value which can be assigned to a row(?) in schema.xml. 
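The resolution behavior being described — substitute what is available and turn a missing variable into an empty string instead of rejecting the row — can be sketched like this. This is a hedged Python approximation for illustration, not the actual DIH code; the row keys mirror the config above:

```python
import re

def fill_template(template, row):
    # resolve each ${...} variable from the row; a variable with no value
    # becomes "" rather than aborting the whole row
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: str(row.get(m.group(1), "")),
                  template)
```

For a /record/mediaBlock row both variables resolve, giving a unique id; for the parent /record row, where vurl is absent, the id simply degrades to the file path alone instead of the row being dropped.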

I am parsing a single XML document into multiple separate solr documents.
It turns out that none of the source documents fields can be used to create
a uniqueKey alone. I need to combine two, using template transformer as
follows:

entity name=x
  dataSource=myfilereader
  processor=XPathEntityProcessor
  url=${jc.fileAbsolutePath}
  rootEntity=true
  stream=false
  forEach=/record | /record/mediaBlock
  transformer=DateFormatTransformer,TemplateTransformer,RegexTransformer
   

  field column=fileAbsolutePathtemplate=${jc.fileAbsolutePath} /
  field column=fileWebPath 
 regex=${dataimporter.request.installdir}(.*) replaceWith=/ford$1 
 sourceColName=fileAbsolutePath/
  field column=id  
 template=${jc.fileAbsolutePath}${x.vurl} /
  field column=vurl
 xpath=/record/mediaBlock/mediaObject/@vurl /

The trouble is that vurl is only defined as a child of /record/mediaBlock
so my attempt to create id, the uniqueKey fails for the parent document 
/record

I am hacking around with TemplateTransformer.java to sort this but was
wondering if there was a good reason for this behavior.


-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Get # of docs pending commit

2009-02-13 Thread Koji Sekiguchi

Jacob,

Regardless of whether you are using autocommit or manual commit,
look at Admin > statistics > Update Handlers > status > docsPending.
Koji

Jacob Singh wrote:

Hi,

Is there a way to retrieve the # of documents which are pending commit
(when using autocommit)?

Thanks,
Jacob

  




delete snapshot??

2009-02-13 Thread sunnyfr

root 26834 16.2  0.0  19412   824 ?S16:05   0:08 rsync -Wa
--delete rsync://##.##.##.##:18180/solr/snapshot.20090213160051/
/data/solr/books/data/snapshot.20090213160051-wip

Hi, obviously it can't delete them because the address is wrong: it shouldn't be
rsync://##.##.##.##:18180/solr/snapshot.20090213160051/
but
rsync://##.##.##.##:18180/solr/books/snapshot.20090213160051/

Where should I change this? I checked my script.conf on the slave server but
it seems good.

The files can be very big, and my server gets full within a few hours.

So actually, is snapcleaner not necessary on the master? What about the
slave?

Thanks a lot,
Sunny
-- 
View this message in context: 
http://www.nabble.com/delete-snapshot---tp21998333p21998333.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellcheck.onlyMorePopular

2009-02-13 Thread Marcus Stratmann

Shalin Shekhar Mangar wrote:

If onlyMorePopular=true, then the algorithm finds tokens which have greater
frequency than the searched term. Among these terms, the one which is
closest (by edit distance) is returned.


Okay, this is a bit weird, but I think I got it now. Let me try to 
explain it using my example. When I search for gran (frequency 10) I 
get the suggestion grand (frequency 17) when using 
onlyMorePopular=true. When I use onlyMorePopular=false there are no 
suggestions at all. This is because there are some (rare) terms which 
are closer to gran than grand, but none of them are considered 
because their frequency is below 10. Is that correct?
But then, why isn't grand promoted to first place and returned as a 
valid suggestion?




I think I now understand the source of the confusion. onlyMorePopular=true
is a special behavior which uses *only* those tokens which have higher
frequency than the searched term. onlyMorePopular=false just switches off
this special behavior. It does *not* limit suggestions to tokens which have
lesser frequency than the searched term. In fact, onlyMorePopular=false does
not use frequency of tokens at all. We should document this clearly to avoid
such confusions in the future.


I'm still missing the two parameters accuracy and spellcheck.count. Let 
me try to explain how I (now) think the algorithm works:

1) Take all terms from the index as a basic set.
2) If onlyMorePopular=true, remove all terms from the basic set which 
have a frequency below the frequency of the search term.
3) Sort the basic set by distance to the search term and keep the 
spellcheck.count terms with the smallest distance which are within 
accuracy.
4) In the case onlyMorePopular=false, remove terms which have a lower 
frequency than the search term.
5) Return the remaining terms as suggestions.

Point 3 would explain why I do not get any suggestions for gran with
onlyMorePopular=false. Nevertheless I think this is a bug, since point 3 
should take the frequency into account as well and promote suggestions 
with high enough frequency if suggestions with low frequency are deleted.

But this is just my assumption on how the algorithm works, which explains 
why there are no suggestions using onlyMorePopular=false. Maybe I am 
wrong, but somewhere in the process grand is deleted from the result set.




2) It would be nice if one could get suggestions with lower frequency than
the checked word (which is, to me, what onlyMorePopular=false implies).


We could enhance spell checker to do that. But can you please explain your
use-case for limiting suggestions to tokens which have lesser frequency? The
goal of spell checker is to give suggestions of wrongly spelled words. It
was neither designed nor intended to give any other sort of query
suggestions.


An example would be the mentioned "grand turismo" (note that in the 
example above I was searching for "gran" whereas now I am searching for 
"grand"). "gran" would not be returned as a suggestion because "grand" 
is more frequent in the index. And yes, I know, returning a suggestion 
in this case will be only useful if there is more than one word in the 
search term. You proposed to use KeywordTokenizer for this case but a) I 
(again) was not able to find any documentation for this and b) we are 
working on a different solution for this case using stored search 
queries. If you are interested, it works like this: For every word in 
the query get some spell checking suggestions. Combine these and find 
out if any of these combinations has been searched for (successfully) 
before. Propose the one with the highest (search) frequency. Looks 
promising so far, but the gran turismo example won't work, since there 
are too many grands in the index.
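A minimal sketch of that stored-queries approach (all names and data below are hypothetical; a real implementation would also need to cap the cross-product of suggestion lists):

```python
from itertools import product

def best_stored_correction(per_word_suggestions, stored_queries):
    """Combine per-word spellcheck suggestions and return the combination
    that was searched for (successfully) most often in the past.

    per_word_suggestions: one candidate list per query word
    stored_queries: past successful query -> how often it was searched
    """
    combos = (" ".join(words) for words in product(*per_word_suggestions))
    seen = [(stored_queries[c], c) for c in combos if c in stored_queries]
    return max(seen)[1] if seen else None

stored = {"gran turismo": 42, "grand canyon": 7}
print(best_stored_correction([["gran", "grand"], ["turismo"]], stored))
# -> gran turismo
```

Bounding the number of suggestions per word (say, five) keeps the cross-product small enough to look up against the stored-query table.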


Thanks,
Marcus


Re: Get # of docs pending commit

2009-02-13 Thread Jacob Singh
Hi Koji,

Thanks, but I'm trying to get it via a web service, not via the admin interface.

Best,
Jacob

On Fri, Feb 13, 2009 at 8:20 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:
 Jacob,

 Regardless of whether you are using autocommit or manual commit,
 look at Admin > statistics > Update Handlers > status > docsPending.

 Koji

 Jacob Singh wrote:

 Hi,

 Is there a way to retrieve the # of documents which are pending commit
 (when using autocommit)?

 Thanks,
 Jacob







-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


Re: Get # of docs pending commit

2009-02-13 Thread Shalin Shekhar Mangar
Jacob, the output of stats.jsp is an XML which you can consume in your
program. It is transformed to html using XSL.
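For example, a client can fetch stats.jsp over HTTP and pick out docsPending with a few lines of code. The element layout in the sample below is an assumption (stats.jsp markup differs between Solr versions), so adjust the lookup to the XML you actually receive:

```python
import xml.etree.ElementTree as ET

def docs_pending(stats_xml: str) -> int:
    """Find the docsPending stat anywhere in a stats.jsp response.

    Assumes the page contains a <stat name="docsPending"> element;
    the surrounding structure does not matter to this lookup.
    """
    root = ET.fromstring(stats_xml)
    for stat in root.iter("stat"):
        if stat.get("name") == "docsPending":
            return int(stat.text.strip())
    raise KeyError("docsPending not found")

# In practice the XML would come from something like:
#   urllib.request.urlopen("http://localhost:8983/solr/admin/stats.jsp").read()
sample = """<solr><entry><name>updateHandler</name>
<stats><stat name="docsPending"> 42 </stat></stats></entry></solr>"""
print(docs_pending(sample))  # -> 42
```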

On Fri, Feb 13, 2009 at 9:09 PM, Jacob Singh jacobsi...@gmail.com wrote:

 Hi Koji,

 Thanks, but I'm trying to get it via a web service, not via the admin
 interface.

 Best,
 Jacob

 On Fri, Feb 13, 2009 at 8:20 PM, Koji Sekiguchi k...@r.email.ne.jp
 wrote:
  Jacob,
 
  Regardless of whether you are using autocommit or manual commit,
  look at Admin > statistics > Update Handlers > status > docsPending.
 
  Koji
 
  Jacob Singh wrote:
 
  Hi,
 
  Is there a way to retrieve the # of documents which are pending commit
  (when using autocommit)?
 
  Thanks,
  Jacob
 
 
 
 



 --

 +1 510 277-0891 (o)
 +91  33 7458 (m)

 web: http://pajamadesign.com

 Skype: pajamadesign
 Yahoo: jacobsingh
 AIM: jacobsingh
 gTalk: jacobsi...@gmail.com




-- 
Regards,
Shalin Shekhar Mangar.


Re: Get # of docs pending commit

2009-02-13 Thread Jacob Singh
*Jacob Singh feels dumb*

Thanks!

On Fri, Feb 13, 2009 at 9:14 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 Jacob, the output of stats.jsp is an XML which you can consume in your
 program. It is transformed to html using XSL.

 On Fri, Feb 13, 2009 at 9:09 PM, Jacob Singh jacobsi...@gmail.com wrote:

 Hi Koji,

 Thanks, but I'm trying to get it via a web service, not via the admin
 interface.

 Best,
 Jacob

 On Fri, Feb 13, 2009 at 8:20 PM, Koji Sekiguchi k...@r.email.ne.jp
 wrote:
  Jacob,
 
  Regardless of whether you are using autocommit or manual commit,
  look at Admin > statistics > Update Handlers > status > docsPending.
 
  Koji
 
  Jacob Singh wrote:
 
  Hi,
 
  Is there a way to retrieve the # of documents which are pending commit
  (when using autocommit)?
 
  Thanks,
  Jacob
 
 
 
 



 --

 +1 510 277-0891 (o)
 +91  33 7458 (m)

 web: http://pajamadesign.com

 Skype: pajamadesign
 Yahoo: jacobsingh
 AIM: jacobsingh
 gTalk: jacobsi...@gmail.com




 --
 Regards,
 Shalin Shekhar Mangar.




-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


Re: spellcheck.onlyMorePopular

2009-02-13 Thread Walter Underwood
Fuzzy search should match grand turismo to gran turismo without
using spelling suggestions. At Netflix, the first hit for the
query grand turismo is the movie Gran Torino and we use fuzzy
with Solr.
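A sketch of how a client could rewrite the query into Lucene's fuzzy syntax before sending it to Solr; term~0.7 uses the pre-Lucene-4 minimum-similarity float, and both the helper name and the threshold are illustrative:

```python
def fuzzify(query: str, min_similarity: float = 0.7) -> str:
    # Append Lucene fuzzy syntax (term~similarity) to every whitespace-
    # separated term; tune min_similarity per corpus.
    return " ".join(f"{term}~{min_similarity}" for term in query.split())

print(fuzzify("grand turismo"))  # -> grand~0.7 turismo~0.7
```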

wunder

On 2/13/09 3:35 AM, Marcus Stratmann stratm...@gmx.de wrote:

 Shalin Shekhar Mangar wrote:
 And to come back to my last question: There seems to be no case in which
 onlyMorePopular=false makes sense (provided Grant's assumption is
 correct). Do you see one?
 
 Here's a use-case -- you provide a mis-spelled word and you want the closest
 suggestion by edit distance (frequency does not matter).
 
 Hm, when I try searching for grand using onlyMorePopular=false I do
 not get any results. Same when trying gran. It seems that there will
 be no results at all when using onlyMorePopular=false. Without
 onlyMorePopular there are suggestions for both terms, so there are
 suggestions close enough to the original word(s). Have you tested your
 example case?
 
 Anyway, if you look at it from the user's point of view: The wiki says
 spellcheck.onlyMorePopular -- Only return suggestions that result in
 more hits for the query than the existing query. This implies that if
 onlyMorePopular=false I will get even results with less hits. So when
 I'm checking grand I would expect to get the suggestion gran which
 is less frequent in the index. But it seems this is not the case.
 
 But even if just the documentation is wrong or unclear:
 1) I could not find a case in which onlyMorePopular=false works at all.
 2) It would be nice if one could get suggestion with lower frequency
 than the checked word (which is, to me, what onlyMorePopular=false implies).
 
 Thanks,
 Marcus
 



Re: Problem using DIH templatetransformer to create uniqueKey

2009-02-13 Thread Fergus McMenemie
Paul,

Following up your usenet suggestion:

 <field column="id" template="${jc.fileAbsolutePath}${x.vurl}"
        ignoreMissingVariables="true"/>

and to add more to what I was thinking...

if the field is undefined in the input document, but the schema.xml
does allow a default value, then TemplateTransformer can use the 
default value. If there is no default value defined in schema.xml 
then it can fail as at present. This would allow "" or any other
value to be fed into TemplateTransformer, and still enable avoidance
of the partial strings you referred to.

Regards Fergus.

Hello,

templatetransformer behaves rather ungracefully if one of the replacement
fields is missing.

Looking at TemplateString.java I see that left to itself fillTokens would 
replace a missing variable with "". It is an extra check in TemplateTransformer
that is throwing the warning and stopping the row being returned. Commenting
out the check seems to solve my problem.

Having done this, an undefined replacement string in TemplateTransformer
is replaced with "". However a neater fix would probably involve making 
use of the default value which can be assigned to a row? in schema.xml. 

I am parsing a single XML document into multiple separate solr documents.
It turns out that none of the source documents fields can be used to create
a uniqueKey alone. I need to combine two, using template transformer as
follows:

<entity name="x"
        dataSource="myfilereader"
        processor="XPathEntityProcessor"
        url="${jc.fileAbsolutePath}"
        rootEntity="true"
        stream="false"
        forEach="/record | /record/mediaBlock"
        transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer"
        >

  <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
  <field column="fileWebPath" regex="${dataimporter.request.installdir}(.*)"
         replaceWith="/ford$1" sourceColName="fileAbsolutePath"/>
  <field column="id" template="${jc.fileAbsolutePath}${x.vurl}" />
  <field column="vurl" xpath="/record/mediaBlock/mediaObject/@vurl" />

The trouble is that vurl is only defined as a child of /record/mediaBlock
so my attempt to create "id", the uniqueKey, fails for the parent document 
/record.

I am hacking around with TemplateTransformer.java to sort this but was
wondering if there was a good reason for this behavior.


-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: spellcheck.onlyMorePopular

2009-02-13 Thread Shalin Shekhar Mangar
On Fri, Feb 13, 2009 at 8:46 PM, Marcus Stratmann stratm...@gmx.de wrote:


 Okay, this is a bit weird, but I think I got it now. Let me try to explain
 it using my example. When I search for gran (frequency 10) I get the
 suggestion grand (frequency 17) when using onlyMorePopular=true. When I
 use onlyMorePopular=false there are no suggestions at all. This is because
 there are some (rare) terms which are closer to "gran" than "grand", but
 all of them are not considered, because their frequency is below 10. Is that
 correct?


No. Think of onlyMorePopular as a toggle between whether to consider
frequency or not. When you say onlyMorePopular=true, higher frequency terms
are considered. When you say onlyMorePopular=false, frequency plays no role
at all and gran is returned because according to the spell checker, it
exists in the index and is therefore a correctly spelled term.


 I'm still missing the two parameters accuracy and spellcheck.count. Let me
 try to explain how I (now) think the algorithm works:

 1) Take all terms from the index as a basic set.
 2) If onlyMorePopular=true remove all terms from the basic set which have a
 frequency below the frequency of the search term.
 3) Sort the basic set in respect of distance to the search term and keep
 the spellcheck.count terms with the smallest distance and which are
 within accuracy.
 4) Remove terms which have a lower frequency than the search term in the
 case onlyMorePopular=false.
 5) Return the remaining terms as suggestions.

 Point 3 would explain why I do not get any suggestions for gran having
 onlyMorePopular=false. Nevertheless I think this is a bug since point 3
 should take into account the frequency as well and promote suggestions with
 high enough frequency if suggestion with low frequency are deleted.

 But this is just my assumption on how the algorithm works which explains
 why there are no suggestions using onlyMorePopular=false. Maybe I am wrong,
 but somewhere in the process grand is deleted from the result set.


Point #4 is incorrect. As I said earlier, when onlyMorePopular=false,
frequency information is not used and there is no filtering of tokens with
respect to frequency.

The implementation is a bit more complicated.

1. Read all tokens from the specified field in the solr index.
2. Create n-grams of the terms read in #1 and index them into a separate
Lucene index (spellcheck index).
3. When asked for suggestions, create n-grams of the query terms, search the
spellcheck index and collect the top (by lucene score) 10*spellcheck.count
results.
4. If onlyMorePopular=true, determine frequency of each result in the solr
index and remove terms which have lesser frequency.
5. Compute the edit distance between the result and the query token.
6. Return the top spellcheck.count results (sorted by edit distance
descending) which are greater than specified accuracy.
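Those six steps can be sketched roughly as follows; this is an illustration only, with a bigram-overlap score standing in for Lucene's n-gram scoring and every name invented for the example:

```python
def levenshtein(a: str, b: str) -> int:
    # textbook dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def bigrams(term: str) -> set:
    return {term[i:i + 2] for i in range(len(term) - 1)}

def suggest(query, index_freq, count=1, accuracy=0.5, only_more_popular=False):
    # steps 1-3: rank index terms by n-gram overlap, keep the top 10*count
    candidates = sorted(index_freq,
                        key=lambda t: len(bigrams(query) & bigrams(t)),
                        reverse=True)[:10 * count]
    # step 4: frequency is consulted *only* when onlyMorePopular=true
    if only_more_popular:
        candidates = [t for t in candidates
                      if index_freq[t] > index_freq.get(query, 0)]
    # steps 5-6: keep terms whose similarity clears `accuracy`, closest first
    def similarity(t):
        return 1.0 - levenshtein(query, t) / max(len(query), len(t))
    return sorted((t for t in candidates if similarity(t) >= accuracy),
                  key=similarity, reverse=True)[:count]

freq = {"grand": 17, "gran": 10, "grain": 5}
print(suggest("gran", freq, only_more_popular=True))   # -> ['grand']
print(suggest("gran", freq, only_more_popular=False))  # -> ['gran']
```

The two calls at the end reproduce the behaviour discussed in this thread: with onlyMorePopular=true the higher-frequency "grand" wins; with false, "gran" itself is the closest term and comes back unchanged.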


 An example would be the mentioned grand turismo (regard that in the
 example above I was searching for gran whereas now I am searching for
 grand). gran would not be returned as a suggestion because grand is
 more frequent in the index. And yes, I know, returning a suggestion in this
 case will be only useful if there is more than one word in the search term.
 You proposed to use KeywordTokenizer for this case but a) I (again) was not
 able to find any documentation for this and b) we are working on a different
 solution for this case using stored search queries. If you are interested,
 it works like this: For every word in the query get some spell checking
 suggestions. Combine these and find out if any of these combinations has
 been search for (successfully) before. Propose the one with the highest
 (search) frequency. Looks promising so far, but the gran turismo example
 won't work, since there are too many grands in the index.


Your primary use-case is not spellcheck at all but this might work with some
hacking. Fuzzy queries may be a better solution as Walter said. Storing all
successful search queries may be hard to scale.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Get # of docs pending commit

2009-02-13 Thread Erik Hatcher
Jacob - note that the results from stats.jsp come back in XML format -  
which could be used programmatically from a client.  Unfortunately the  
JSP pages don't follow the wt (writer type) parameter that standard  
request handlers use, but at least it's structured data and not HTML  
to be scraped.


Erik

On Feb 13, 2009, at 6:50 AM, Koji Sekiguchi wrote:


Jacob,

Regardless of whether you are using autocommit or manual commit,
look at Admin > statistics > Update Handlers > status > docsPending.

Koji

Jacob Singh wrote:

Hi,

Is there a way to retrieve the # of documents which are pending  
commit

(when using autocommit)?

Thanks,
Jacob






Re: Problem using DIH templatetransformer to create uniqueKey

2009-02-13 Thread Erik Hatcher
What about having the template transformer support ${field:default}  
syntax?  I'm assuming it doesn't support that currently right?  The  
replace stuff in the config files does though.
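A sketch of the semantics being proposed, in illustrative Python (this is not what TemplateTransformer does today): ${name} resolves from the row, and ${name:default} falls back to the default when the variable is missing.

```python
import re

def fill_template(template: str, values: dict) -> str:
    # Replace ${name} with values[name]; ${name:default} falls back to the
    # default. A plain ${name} that is missing becomes "", matching the
    # fillTokens behaviour described earlier in the thread.
    def replace(match):
        name, _, default = match.group(1).partition(":")
        return str(values.get(name, default))
    return re.sub(r"\$\{([^}]+)\}", replace, template)

print(fill_template("${jc.fileAbsolutePath}${x.vurl:}",
                    {"jc.fileAbsolutePath": "/data/a.xml"}))  # -> /data/a.xml
```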


Erik


On Feb 13, 2009, at 8:17 AM, Fergus McMenemie wrote:


Paul,

Following up your usenet suggestion:

field column=id template=${jc.fileAbsolutePath}${x.vurl}
ignoreMissingVariables=true/

and to add more to what I was thinking...

if the field is undefined in the input document, but the schema.xml
does allow a default value, then TemplateTransformer can use the
default value. If there is no default value defined in schema.xml
then it can fail as at present. This would allow  or any other
value to be fed into TemplateTransformer, and still enable avoidance
of the partial strings you referred to.

Regards Fergus.


Hello,

templatetransformer behaves rather ungracefully if one of the  
replacement

fields is missing.


Looking at TemplateString.java I see that left to itself fillTokens  
would
replace a missing variable with . It is an extra check in  
TemplateTransformer
that is throwing the warning and stopping the row being returned.  
Commenting

out the check seems to solve my problem.

Having done this, an undefined replacement string in  
TemplateTransformer
is replaced with . However a neater fix would probably involve  
making
use of the default value which can be assigned to a row? in  
schema.xml.


I am parsing a single XML document into multiple separate solr  
documents.
It turns out that none of the source documents fields can be used  
to create
a uniqueKey alone. I need to combine two, using template  
transformer as

follows:

<entity name="x"
        dataSource="myfilereader"
        processor="XPathEntityProcessor"
        url="${jc.fileAbsolutePath}"
        rootEntity="true"
        stream="false"
        forEach="/record | /record/mediaBlock"
        transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer"
        >

  <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
  <field column="fileWebPath" regex="${dataimporter.request.installdir}(.*)"
         replaceWith="/ford$1" sourceColName="fileAbsolutePath"/>
  <field column="id" template="${jc.fileAbsolutePath}${x.vurl}" />
  <field column="vurl" xpath="/record/mediaBlock/mediaObject/@vurl" />


The trouble is that vurl is only defined as a child of /record/ 
mediaBlock
so my attempt to create id, the uniqueKey fails for the parent  
document /record


I am hacking around with TemplateTransformer.java to sort this  
but was

wondering if there was a good reason for this behavior.



--

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===




Wildcard query case problem

2009-02-13 Thread Alexander Ramos Jardim
Hey guys,

I'm getting problems making wildcard queries of the form nameSort:Arlin*. If I
do such a query, I get 0 results, but when I do nameSort:arlin* I get 310
results from my index. Are wildcard queries case sensitive?

This is the searched field config.

<fieldType name="string_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
  </analyzer>
</fieldType>

-- 
Alexander Ramos Jardim


Re: Wildcard query case problem

2009-02-13 Thread Marc Sturlese

Are you using the same analyzer to query and to index?

zayhen wrote:
 
 Hey guys,
 
 I getting problems making wildcard query in the form nameSort:Arlin*. If
 I
 do such a query, I get 0 results, but when I do nameSort:arlin* I get
 310
 results from my index. Are wildcard queries case sensitive?
 
 This is the searched field config.
 
 fieldType name=string_lc class=solr.TextField
 analyzer
 tokenizer class=solr.KeywordTokenizerFactory /
 filter class=solr.LowerCaseFilterFactory /
 filter class=solr.TrimFilterFactory /
 filter class=solr.ISOLatin1AccentFilterFactory /
 /analyzer
 /fieldType
 
 -- 
 Alexander Ramos Jardim
 
 
 -
 RPG da Ilha 
 

-- 
View this message in context: 
http://www.nabble.com/Wildcard-query-case-problem-tp22000692p22001259.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Wildcard query case problem

2009-02-13 Thread Erick Erickson
From a post in the archives:


Wildcard searches are case-sensitive in Solr. I faced the same issue and
handled converting the query string to lower case in my code itself. The
filters and analyzers are not applicable for wildcard queries.

The searchable mail archive is wonderful <G>.
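A minimal client-side workaround, assuming the simple field:term query shape used in this thread (anything more complex needs real query parsing):

```python
def lowercase_wildcard(query: str) -> str:
    # Lowercase only the term; Solr field names are case sensitive,
    # so "nameSort" must keep its capital S.
    field, sep, term = query.partition(":")
    if not sep:  # no explicit field, lowercase the whole query
        return query.lower()
    return field + sep + term.lower()

print(lowercase_wildcard("nameSort:Arlin*"))  # -> nameSort:arlin*
```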


Best
Erick

On Fri, Feb 13, 2009 at 12:36 PM, Marc Sturlese marc.sturl...@gmail.comwrote:


  Are you using the same analyzer to query and to index?

 zayhen wrote:
 
  Hey guys,
 
  I getting problems making wildcard query in the form nameSort:Arlin*.
 If
  I
  do such a query, I get 0 results, but when I do nameSort:arlin* I get
  310
  results from my index. Are wildcard queries case sensitive?
 
  This is the searched field config.
 
  fieldType name=string_lc class=solr.TextField
  analyzer
  tokenizer class=solr.KeywordTokenizerFactory /
  filter class=solr.LowerCaseFilterFactory /
  filter class=solr.TrimFilterFactory /
  filter class=solr.ISOLatin1AccentFilterFactory /
  /analyzer
  /fieldType
 
  --
  Alexander Ramos Jardim
 
 
  -
  RPG da Ilha
 

 --
 View this message in context:
 http://www.nabble.com/Wildcard-query-case-problem-tp22000692p22001259.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Problem using DIH templatetransformer to create uniqueKey

2009-02-13 Thread Fergus McMenemie

Hmmm. Just gave that a go! No luck.
But how many layers of defaults do we need?


Rgds Fergus

What about having the template transformer support ${field:default}  
syntax?  I'm assuming it doesn't support that currently right?  The  
replace stuff in the config files does though.

   Erik


On Feb 13, 2009, at 8:17 AM, Fergus McMenemie wrote:

 Paul,

 Following up your usenet suggestion:

 field column=id template=${jc.fileAbsolutePath}${x.vurl}
 ignoreMissingVariables=true/

 and to add more to what I was thinking...

 if the field is undefined in the input document, but the schema.xml
 does allow a default value, then TemplateTransformer can use the
 default value. If there is no default value defined in schema.xml
 then it can fail as at present. This would allow  or any other
 value to be fed into TemplateTransformer, and still enable avoidance
 of the partial strings you referred to.

 Regards Fergus.

 Hello,

 templatetransformer behaves rather ungracefully if one of the  
 replacement
 fields is missing.

 Looking at TemplateString.java I see that left to itself fillTokens  
 would
 replace a missing variable with . It is an extra check in  
 TemplateTransformer
 that is throwing the warning and stopping the row being returned.  
 Commenting
 out the check seems to solve my problem.

 Having done this, an undefined replacement string in  
 TemplateTransformer
 is replaced with . However a neater fix would probably involve  
 making
 use of the default value which can be assigned to a row? in  
 schema.xml.

 I am parsing a single XML document into multiple separate solr  
 documents.
 It turns out that none of the source documents fields can be used  
 to create
 a uniqueKey alone. I need to combine two, using template  
 transformer as
 follows:

 <entity name="x"
         dataSource="myfilereader"
         processor="XPathEntityProcessor"
         url="${jc.fileAbsolutePath}"
         rootEntity="true"
         stream="false"
         forEach="/record | /record/mediaBlock"
         transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer"
         >

 <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
 <field column="fileWebPath" regex="${dataimporter.request.installdir}(.*)"
        replaceWith="/ford$1" sourceColName="fileAbsolutePath"/>
 <field column="id" template="${jc.fileAbsolutePath}${x.vurl}" />
 <field column="vurl" xpath="/record/mediaBlock/mediaObject/@vurl" />

 The trouble is that vurl is only defined as a child of /record/ 
 mediaBlock
 so my attempt to create id, the uniqueKey fails for the parent  
 document /record

 I am hacking around with TemplateTransformer.java to sort this  
 but was
 wondering if there was a good reason for this behavior.


 -- 

 ===
 Fergus McMenemie   Email:fer...@twig.me.uk
 Techmore Ltd   Phone:(UK) 07721 376021

 Unix/Mac/Intranets Analyst Programmer
 ===

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: delete snapshot??

2009-02-13 Thread Bill Au
The --delete option of the rsync command deletes extraneous files from the
destination directory.  It does not delete Solr snapshots.  To do that you
can use the snapcleaner on the master and/or slave.

Bill

On Fri, Feb 13, 2009 at 10:15 AM, sunnyfr johanna...@gmail.com wrote:


 root 26834 16.2  0.0  19412   824 ?S16:05   0:08 rsync -Wa
 --delete rsync://##.##.##.##:18180/solr/snapshot.20090213160051/
 /data/solr/books/data/snapshot.20090213160051-wip

 Hi, obviously it can't delete them because the address is bad. It shouldn't
 be:
 rsync://##.##.##.##:18180/solr/snapshot.20090213160051/
 but:
 rsync://##.##.##.##:18180/solr/books/snapshot.20090213160051/

 Where should I change this, I checked my script.conf on the slave server
 but
 it seems good.

 Because files can be very big and my server gets full within a few hours.

 So actually snapcleaner is not necessary on the master? What about the
 slave?

 Thanks a lot,
 Sunny
 --
 View this message in context:
 http://www.nabble.com/delete-snapshot---tp21998333p21998333.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Wildcard query case problem

2009-02-13 Thread Alexander Ramos Jardim
Thanks for pointing this out to me Erick.

2009/2/13 Erick Erickson erickerick...@gmail.com

 From a post in the archives:


 Wildcard searches are case-sensitive in Solr. I faced the same issue and
 handled converting the query string to lower case in my code itself. The
 filters and analyzers are not applicable for wildcard queries.

 The searchable mail archive is wonderful G.



 Best
 Erick

 On Fri, Feb 13, 2009 at 12:36 PM, Marc Sturlese marc.sturl...@gmail.com
 wrote:

 
  Are you using the same analyzer to queue and index?
 
  zayhen wrote:
  
   Hey guys,
  
   I getting problems making wildcard query in the form nameSort:Arlin*.
  If
   I
   do such a query, I get 0 results, but when I do nameSort:arlin* I get
   310
   results from my index. Are wildcard queries case sensitive?
  
   This is the searched field config.
  
   fieldType name=string_lc class=solr.TextField
   analyzer
   tokenizer class=solr.KeywordTokenizerFactory /
   filter class=solr.LowerCaseFilterFactory /
   filter class=solr.TrimFilterFactory /
   filter class=solr.ISOLatin1AccentFilterFactory /
   /analyzer
   /fieldType
  
   --
   Alexander Ramos Jardim
  
  
   -
   RPG da Ilha
  
 
  --
  View this message in context:
 
 http://www.nabble.com/Wildcard-query-case-problem-tp22000692p22001259.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 




-- 
Alexander Ramos Jardim


Re: SolrJ API and XMLResponseParser

2009-02-13 Thread Amit Nithian
Hi Noble,
According to the wiki, the following should work:
server.setParser(new XMLResponseParser());

However, I don't see that method. The only place I see that method even
being declared is in the SolrRequest class but then wiring that up with the
SolrServer and getting results wasn't overly obvious to me.

Thanks
Amit

On Fri, Feb 13, 2009 at 12:31 AM, Noble Paul നോബിള്‍ नोब्ळ् 
noble.p...@gmail.com wrote:

 On Fri, Feb 13, 2009 at 1:16 PM, Amit Nithian anith...@gmail.com wrote:
  I am using SolrJ from trunk and according to
  http://wiki.apache.org/solr/Solrj you should be able to set the response
  parser in the SolrServer interface layer; however, I am unable to do so
 and
  I need the XML response support for querying and adding documents to a
 Solr
  1.2 instance. Also, I have changed the XML response for a particular
 query
  handler and hence may need to alter the response parser to accommodate
 these
  changes.
setting a response parser should work, how do you know it does not work?
  I know that the API is experimental but has trunk's version of the SolrJ
 API
  changed with respect to SolrJ wiki?
 No. the trunk should still work w/ Solr 1.2
 
  Thanks
  Amit
 



 --
 --Noble Paul



Re: Problem using DIH templatetransformer to create uniqueKey

2009-02-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Feb 13, 2009 at 10:17 AM, Fergus McMenemie fer...@twig.me.uk wrote:
 Paul,

 Following up your usenet suggestion:

  field column=id template=${jc.fileAbsolutePath}${x.vurl}
 ignoreMissingVariables=true/

 and to add more to what I was thinking...

 if the field is undefined in the input document, but the schema.xml
 does allow a default value, then TemplateTransformer can use the
 default value. If there is no default value defined in schema.xml
it is not really useful. Solr would automatically fill in the default values.
 then it can fail as at present. This would allow  or any other
 value to be fed into TemplateTransformer, and still enable avoidance
 of the partial strings you referred to.

 Regards Fergus.

Hello,

templatetransformer behaves rather ungracefully if one of the replacement
fields is missing.

Looking at TemplateString.java I see that left to itself fillTokens would
replace a missing variable with . It is an extra check in 
TemplateTransformer
that is throwing the warning and stopping the row being returned. Commenting
out the check seems to solve my problem.

Having done this, an undefined replacement string in TemplateTransformer
is replaced with . However a neater fix would probably involve making
use of the default value which can be assigned to a row? in schema.xml.

I am parsing a single XML document into multiple separate solr documents.
It turns out that none of the source documents fields can be used to create
a uniqueKey alone. I need to combine two, using template transformer as
follows:

entity name=x
  dataSource=myfilereader
  processor=XPathEntityProcessor
  url=${jc.fileAbsolutePath}
  rootEntity=true
  stream=false
  forEach=/record | /record/mediaBlock
  transformer=DateFormatTransformer,TemplateTransformer,RegexTransformer
   

  field column=fileAbsolutePathtemplate=${jc.fileAbsolutePath} /
  field column=fileWebPath 
 regex=${dataimporter.request.installdir}(.*) replaceWith=/ford$1 
 sourceColName=fileAbsolutePath/
  field column=id  
 template=${jc.fileAbsolutePath}${x.vurl} /
  field column=vurl
 xpath=/record/mediaBlock/mediaObject/@vurl /

The trouble is that vurl is only defined as a child of /record/mediaBlock
so my attempt to create id, the uniqueKey fails for the parent document 
/record

I am hacking around with TemplateTransformer.java to sort this but was
wondering if there was a good reason for this behavior.


 --

 ===
 Fergus McMenemie   Email:fer...@twig.me.uk
 Techmore Ltd   Phone:(UK) 07721 376021

 Unix/Mac/Intranets Analyst Programmer
 ===




-- 
--Noble Paul


Re: Problem using DIH templatetransformer to create uniqueKey

2009-02-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Feb 13, 2009 at 11:04 AM, Erik Hatcher
e...@ehatchersolutions.com wrote:
 What about having the template transformer support ${field:default} syntax?

this is the only use-case for this. This can be easily achieved with a
custom Transformer.
  I'm assuming it doesn't support that currently right?  The replace stuff in
 the config files does though.

Erik


 On Feb 13, 2009, at 8:17 AM, Fergus McMenemie wrote:

 Paul,

 Following up your usenet suggestion:

 field column=id template=${jc.fileAbsolutePath}${x.vurl}
 ignoreMissingVariables=true/

 and to add more to what I was thinking...

 if the field is undefined in the input document, but the schema.xml
 does allow a default value, then TemplateTransformer can use the
 default value. If there is no default value defined in schema.xml
 then it can fail as at present. This would allow  or any other
 value to be fed into TemplateTransformer, and still enable avoidance
 of the partial strings you referred to.

 Regards Fergus.

 Hello,

 templatetransformer behaves rather ungracefully if one of the
 replacement
 fields is missing.

 Looking at TemplateString.java I see that left to itself fillTokens would
 replace a missing variable with . It is an extra check in
 TemplateTransformer
 that is throwing the warning and stopping the row being returned.
 Commenting
 out the check seems to solve my problem.

 Having done this, an undefined replacement string in TemplateTransformer
 is replaced with . However a neater fix would probably involve making
 use of the default value which can be assigned to a row? in schema.xml.

 I am parsing a single XML document into multiple separate solr
 documents.
 It turns out that none of the source documents fields can be used to
 create
 a uniqueKey alone. I need to combine two, using template transformer as
 follows:

 entity name=x
 dataSource=myfilereader
 processor=XPathEntityProcessor
 url=${jc.fileAbsolutePath}
 rootEntity=true
 stream=false
 forEach=/record | /record/mediaBlock
 transformer=DateFormatTransformer,TemplateTransformer,RegexTransformer


 field column=fileAbsolutePathtemplate=${jc.fileAbsolutePath} /
 field column=fileWebPath
 regex=${dataimporter.request.installdir}(.*) replaceWith=/ford$1
 sourceColName=fileAbsolutePath/
 field column=id
  template=${jc.fileAbsolutePath}${x.vurl} /
 field column=vurl
  xpath=/record/mediaBlock/mediaObject/@vurl /

 The trouble is that vurl is only defined as a child of
 /record/mediaBlock
 so my attempt to create id, the uniqueKey fails for the parent document
 /record

 I am hacking around with TemplateTransformer.java to sort this but was
 wondering if there was a good reason for this behavior.


 --

 ===
 Fergus McMenemie   Email:fer...@twig.me.uk
 Techmore Ltd   Phone:(UK) 07721 376021

 Unix/Mac/Intranets Analyst Programmer
 ===





-- 
--Noble Paul


Re: SolrJ API and XMLResponseParser

2009-02-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Feb 13, 2009 at 9:18 PM, Amit Nithian anith...@gmail.com wrote:
 Hi Noble,
 According to the wiki, the following should work:
 server.setParser(new XMLResponseParser());
I guess it may be a typo. Please refer to the javadocs for CommonsHttpSolrServer.

 However, I don't see that method. The only place I see that method even
 being declared is in the SolrRequest class but then wiring that up with the
 SolrServer and getting results wasn't overly obvious to me.

if a parser is set at the Request level ,that takes precedence.

 Thanks
 Amit

 On Fri, Feb 13, 2009 at 12:31 AM, Noble Paul നോബിള്‍ नोब्ळ् 
 noble.p...@gmail.com wrote:

 On Fri, Feb 13, 2009 at 1:16 PM, Amit Nithian anith...@gmail.com wrote:
  I am using SolrJ from trunk and according to
  http://wiki.apache.org/solr/Solrj you should be able to set the response
  parser in the SolrServer interface layer; however, I am unable to do so
 and
  I need the XML response support for querying and adding documents to a
 Solr
  1.2 instance. Also, I have changed the XML response for a particular
 query
  handler and hence may need to alter the response parser to accommodate
 these
  changes.
 setting a response parser should work , how do you know it does not work?
  I know that the API is experimental but has trunk's version of the SolrJ
 API
  changed with respect to SolrJ wiki?
 No. the trunk should still work w/ Solr 1.2
 
  Thanks
  Amit
 



 --
 --Noble Paul





-- 
--Noble Paul