Re: nutch 1.2, solr 3.3, tomcat6. java.io.IOException: Job failed! problem when building solrindex
you need to update the solrj libs to 3.x version. the java bin format has changed. I made the change a few months back, you can pull the changes from https://github.com/geek4377/nutch/tree/geek5377-1.2.1

hope that helps,

On Wed, Jul 13, 2011 at 8:58 AM, Leo Subscriptions llsub...@zudiewiener.com wrote:

  I'm running 64bit Ubuntu 11.04, nutch 1.2, solr 3.3 (downloaded, not built) and tomcat6, following this (and some other) links: http://wiki.apache.org/nutch/RunningNutchAndSolr

  I have added the nutch schema and can access/view this schema via the admin page. nutch also works, as I can perform successful searches. When I execute the following:

  ./bin/nutch solrindex http://localhost:8080/solr/core0 crawl/crawldb crawl/linkdb crawl/segments/*

  I (eventually) get an io error. The above command creates the following files in /var/lib/tomcat6/solr/core0/data/index/:

  544 -rw-r--r-- 1 tomcat6 tomcat6 557056 2011-07-13 11:09 _1.fdt
    0 -rw-r--r-- 1 tomcat6 tomcat6      0 2011-07-13 11:00 _1.fdx
    4 -rw-r--r-- 1 tomcat6 tomcat6     32 2011-07-13 10:59 segments_2
    4 -rw-r--r-- 1 tomcat6 tomcat6     20 2011-07-13 10:59 segments.gen
    0 -rw-r--r-- 1 tomcat6 tomcat6      0 2011-07-13 11:00 write.lock

  but the hadoop.log reports the following error:

  2011-07-13 11:09:47,665 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
  2011-07-13 11:09:47,666 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
  2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: content dest: content
  2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: site dest: site
  2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: title dest: title
  2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: host dest: host
  2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: segment dest: segment
  2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: boost dest: boost
  2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: digest dest: digest
  2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
  2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: url dest: id
  2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: url dest: url
  2011-07-13 11:09:49,272 WARN  mapred.LocalJobRunner - job_local_0001
  java.lang.RuntimeException: Invalid version or the data in not in 'javabin' format
          at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
          at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
          at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:466)
          at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
          at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
          at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
          at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:64)
          at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:54)
          at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
          at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:440)
          at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:159)
          at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
          at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
          at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
          at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
  2011-07-13 11:09:49,611 ERROR solr.SolrIndexer - java.io.IOException: Job failed!

  I'd appreciate any help with this.

  Thanks,
  Leo
Re: Can I still search documents once updated?
It indeed is not stored, but this is still unexpected behavior. It's a stored and indexed field, why has the index data been lost?

On Wed, Jul 13, 2011 at 12:44 AM, Erick Erickson erickerick...@gmail.com wrote:

  Unless you stored your content field, the value you put in there won't be fetched from the index. Verify that the doc you retrieve from the index has values for content; I bet it doesn't.

  Best
  Erick

  On Tue, Jul 12, 2011 at 9:38 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

    @Test
    public void testUpdateLoseTermsSimplified() throws Exception {
        IndexWriter writer = indexDoc();
        assertEquals(1, writer.numDocs());
        IndexSearcher searcher = getSearcher(writer);
        final TermQuery termQuery = new TermQuery(new Term("content", "essen"));
        TopDocs docs = searcher.search(termQuery, 1);
        assertEquals(1, docs.totalHits);
        Document doc = searcher.doc(0);
        writer.updateDocument(new Term("id", doc.get("id")), doc);
        searcher = getSearcher(writer);
        docs = searcher.search(termQuery, 1);
        assertEquals(1, docs.totalHits); // docs.totalHits == 0 !
    }

    testUpdateLosesTerms(com.mysimpatico.me.indexplugins.WcTest)  Time elapsed: 0.346 sec  FAILURE!
    java.lang.AssertionError: expected:<1> but was:<0>
            at org.junit.Assert.fail(Assert.java:91)
            at org.junit.Assert.failNotEquals(Assert.java:645)
            at org.junit.Assert.assertEquals(Assert.java:126)
            at org.junit.Assert.assertEquals(Assert.java:470)
            at org.junit.Assert.assertEquals(Assert.java:454)
            at com.mysimpatico.me.indexplugins.WcTest.testUpdateLosesTerms(WcTest.java:271)

    I have not changed anything (as you can see) during the update. I just retrieve a document and then update it. But then the termQuery that worked before doesn't work anymore (while the id field wasn't changed). Is this to be expected when the content field is not stored?

    -- 
    Regards,
    K. Gabriele
Re: 'invisible' words
Hi Denis,

The order of the filters at index time and at query time can differ, e.g. for the synonym filter. Do you have a custom synonyms text file which may be causing the issue? It usually works fine if you have the same filter order at index and query time. You can try that out.

Regards,
Jayendra

On Tue, Jul 12, 2011 at 11:19 PM, deniz denizdurmu...@gmail.com wrote:

  nothing was changed... the result is still the same... should i implement my own analyzer or tokenizer for the problem?

  -
  Zeki ama calismiyor... Calissa yapar...
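For illustration, a minimal schema.xml sketch of what Jayendra describes, keeping the filter order identical in the index and query chains; the field type name and the synonyms.txt file are made-up placeholders, not from this thread:

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- same order as the query chain below: synonyms first, then lowercasing -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>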
Re: nutch 1.2, solr 3.3, tomcat6. java.io.IOException: Job failed! problem when building solrindex
Works like a charm.

Thanks,
Leo

On Wed, 2011-07-13 at 11:31 +0530, Geek Gamer wrote:

  you need to update the solrj libs to 3.x version. the java bin format has changed. I made the change a few months back, you can pull the changes from https://github.com/geek4377/nutch/tree/geek5377-1.2.1

  hope that helps,

  On Wed, Jul 13, 2011 at 8:58 AM, Leo Subscriptions llsub...@zudiewiener.com wrote: [...]
omitNorms
Hi,

my field category (string) has omitNorms=true and omitTermFreqAndPositions=true. I have indexed all docs, but when I do a search like:

http://xxx:xxx/solr/select/?q=category:A&debugQuery=on

I see there's normalization and idf and tf. Why? I can't understand the reason.

8.676225 = (MATCH) fieldWeight(category:A in 826), product of:
  1.0 = tf(termFreq(category:A)=1)
  8.676225 = idf(docFreq=6978, maxDocs=15049953)
  1.0 = fieldNorm(field=category, doc=826)
</str>
<str name="9788805010158">
8.676225 = (MATCH) fieldWeight(category:A in 3433), product of:
  1.0 = tf(termFreq(category:A)=1)
  8.676225 = idf(docFreq=6978, maxDocs=15049953)
  1.0 = fieldNorm(field=category, doc=3433)
</str>
<str name="9788805010165">
8.676225 = (MATCH) fieldWeight(category:A in 3434), product of:
  1.0 = tf(termFreq(category:A)=1)
  8.676225 = idf(docFreq=6978, maxDocs=15049953)
  1.0 = fieldNorm(field=category, doc=3434)

The category field is stored and indexed. Is that the problem?

Thank you
Gastone
(Solr-UIMA) Indexing problems with UIMA fields.
Hi All,

I have a problem making the indexer work with the UIMA fields. Here is what I did (with the help of this community):

I compiled a Solr-UIMA snapshot, using "ant clean dist", adding my own annotators there. It compiled without any errors, and I obtained a jar file. Now, following the instructions in the README (https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3/solr/contrib/uima/README.txt), I modified my SolrConfig.xml and Schema.xml as suggested.

As long as I say required="false" on the UIMA-generated fields, the indexing works fine... without a UIMA annotation. However, once I say required="true", I get an error:

request: http://anafi:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
org.apache.solr.common.SolrException: Bad Request

Bad Request

request: http://localhost:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at indexerapp.ir4llSolrIndexer.indexAll(ir4llSolrIndexer.java:150)
        at indexerapp.ir4llSolrIndexer.main(ir4llSolrIndexer.java:57)

Is there something during indexing that I need to do apart from saying:

UpdateResponse response = server.add(docs);

(where docs is a collection of documents, without UIMA indexing)?

My understanding is that the UIMA annotation happens after calling server.add(docs). Is that right?

S.

-- 
Sowmya V.B.
Losing optimism is blasphemy!
http://vbsowmya.wordpress.com
Re: nutch 1.2, solr 3.3, tomcat6. java.io.IOException: Job failed! problem when building solrindex
If you're using Solr anyway, you'd better upgrade to Nutch 1.3 with Solr 3.x support.

  Works like a charm.

  Thanks,
  Leo

  On Wed, 2011-07-13 at 11:31 +0530, Geek Gamer wrote: [...]
Geo search with spatial-solr-plugin
Hello,

Spatial-solr-2.0-RC5.jar works successfully with Solr 1.4.1. With the release of Solr 3.1, is support for the spatial-solr-plugin going to continue or not?

Thanks!
Isha
Re: how to build lucene-solr (espeically if behind a firewall)?
If behind a proxy, then use:

ant dist ${build_files:autoproxy}

Thanx
Pravesh
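A hedged aside: the autoproxy reference above most likely maps to Ant's -autoproxy switch (available since Ant 1.7, it picks up the JVM's proxy autodetection); the proxy host and port below are invented placeholders, not values from this thread:

ant -autoproxy dist

or, passing an explicit proxy to the JVM that runs Ant:

ANT_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080" ant dist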
Re: POST for queries, length/complexity limit of fq?
  1. I assume that it's worthwhile to rely on POST method instead of GET when issuing a search. Right?

As I can see, this should work. We do restrict user searches by passing unique ids (sometimes in the thousands) in 'fq', and we use the POST method.

Thanx
Pravesh
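As a minimal SolrJ sketch of what Pravesh describes (the server URL and the fq value are invented placeholders): issuing the search as an HTTP POST keeps a very long fq out of the URL, avoiding the container's URL-length limit.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PostQueryExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
        SolrQuery query = new SolrQuery("*:*");
        // a very long fq (e.g. thousands of unique ids) would overflow a GET URL
        query.addFilterQuery("id:(1 OR 2 OR 3)");
        // sending the request as POST puts the parameters in the request body instead
        QueryResponse rsp = server.query(query, SolrRequest.METHOD.POST);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}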
Re: How do I specify a different analyzer at search-time?
You can configure one analyzer for index time and another for search time for each of your field types in schema.xml.

Thanx
Pravesh
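A minimal schema.xml sketch of what Pravesh describes (the field type name and the particular tokenizers are illustrative, not from this thread): the type="index" chain is applied when documents are indexed, and the type="query" chain when queries are parsed.

<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- a different chain is applied at search time -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>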
Re: Grouping / Collapse Query
Could you just return the score with the documents, group by type, and order them any way you wanted?

Best
Erick

On Tue, Jul 12, 2011 at 9:36 PM, entdeveloper cameron.develo...@gmail.com wrote:

  I'm messing around with the field collapsing in 4.x (http://wiki.apache.org/solr/FieldCollapsing). Is it currently possible to group by a field with a certain value only, and leave all the others ungrouped, using the group.query param? This currently doesn't seem to work the way I want it to.

  For example, I have documents all with a type field. Possible values are: picture, video, game, other. I want to only group the pictures, and leave all other documents ungrouped. If I query something like:

  q=dogs&group=true&group.query=type:picture

  I ONLY get pictures back. Seems like this behaves more like an 'fq'. What I want is a result set that looks like this:

  1. doc 1, type=video
  2. doc 2, type=game
  3. doc 3, type=picture, + 3 other pictures
  4. doc 4, type=video
  5. doc 5, type=video
  ...

  I've also tried:

  q=dogs&group=true&group.query=type:picture&group.query=-type:video -type:game

  But this doesn't work because the order of the groups doesn't put together the correct order of results that would be displayed.
Re: Can I still search documents once updated?
Wait, you directly contradicted yourself <G>. You say it's not stored, then you say it's stored and indexed, which is it?

When you fetch a document, only stored fields are returned, and the returned data is the verbatim copy of the original data. No attempt is made to return un-stored fields. This has been the behavior always. If you attempted to return indexed but not stored data, you'd get stemmed versions, stop words would be removed, synonyms would be in place, etc. Not to mention it would be very slow.

If the field is stored, then there's another problem; you might want to dump the document after reading it from the IR.

Best
Erick

On Wed, Jul 13, 2011 at 2:25 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

  It indeed is not stored, but this is still unexpected behavior. It's a stored and indexed field, why has the index data been lost?

  On Wed, Jul 13, 2011 at 12:44 AM, Erick Erickson erickerick...@gmail.com wrote: [...]
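To make the stored-vs-indexed distinction concrete, a minimal Lucene 3.x sketch (field names and values invented for illustration): the indexed-only field matches queries, but fetching the document returns null for it.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class StoredVsIndexed {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_33, new StandardAnalyzer(Version.LUCENE_33)));
        Document doc = new Document();
        // stored AND indexed: searchable, and returned verbatim on retrieval
        doc.add(new Field("title", "some title", Field.Store.YES, Field.Index.ANALYZED));
        // indexed only: searchable, but NOT part of the retrieved document
        doc.add(new Field("content", "essen", Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();

        IndexReader reader = IndexReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        Document fetched = searcher.doc(0);
        System.out.println(fetched.get("title"));   // "some title"
        System.out.println(fetched.get("content")); // null -- the indexed form is not recoverable
        searcher.close();
        reader.close();
    }
}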
Re: (Solr-UIMA) Indexing problems with UIMA fields.
If I'm reading this right, you're labeling certain fields as required. All docs MUST have those fields (I admit the error message could be more informative). So it sounds like things are behaving as I'd expect; your documents just don't contain the required fields.

Best
Erick

On Wed, Jul 13, 2011 at 4:54 AM, Sowmya V.B. vbsow...@gmail.com wrote: [...]
Re: Can I still search documents once updated?
On Wed, Jul 13, 2011 at 1:57 PM, Erick Erickson erickerick...@gmail.com wrote:

  Wait, you directly contradicted yourself <G>. You say it's not stored, then you say it's stored and indexed, which is it?

ja, i meant indexed and not stored.

  When you fetch a document, only stored fields are returned, and the returned data is the verbatim copy of the original data. No attempt is made to return un-stored fields. This has been the behavior always. If you attempted to return indexed but not stored data, you'd get stemmed versions, stop words would be removed, synonyms would be in place, etc. Not to mention it would be very slow.

this is what i was expecting. Otherwise, updating a field of a document that has an unstored but indexed field is impossible without losing the unstored-but-indexed field (I call this "updating a field of a document AND deleting/updating all its unstored but indexed fields").

  If the field is stored, then there's another problem; you might want to dump the document after reading it from the IR.

  Best
  Erick

  [...]

-- 
Regards,
K. Gabriele
Re: (Solr-UIMA) Indexing problems with UIMA fields.
Hi Erick,

  If I'm reading this right, you're labeling certain fields as required. All docs MUST have those fields (I admit the error message could be more informative). So it sounds like things are behaving as I'd expect; your documents just don't contain the required fields.

- But the UIMA pipeline is supposed to add the missing fields for the document. Since "ant clean dist" compiled without build errors, and it was essentially the same pipeline I had already used on a different indexer, I can say that there is no problem with the pipeline as such.

That again gets back to my other query: while indexing, should I mention something else, apart from just saying something like:

doc1.addField("A", valueA);
doc1.addField("B", valueB);
docs.add(doc1);
...
docN.addField("A", valueA);
docN.addField("B", valueB);
docs.add(docN);

UpdateResponse response = server.add(docs);

- My understanding was that the UIMAProcessor runs after I say server.add()... inside the update processor. Is it not so?

S

On Wed, Jul 13, 2011 at 2:00 PM, Erick Erickson erickerick...@gmail.com wrote: [...]

-- 
Sowmya V.B.
Losing optimism is blasphemy!
http://vbsowmya.wordpress.com
Re: (Solr-UIMA) Indexing problems with UIMA fields.
I'll have to punt here; I don't know the internals well enough to say. I suppose it's possible that the required-fields check happens *before* the UIMA stuff happens, but since I know so little about UIMA, that's a blind guess at best...

Anyone with real knowledge want to chime in here?

Erick

On Wed, Jul 13, 2011 at 8:08 AM, Sowmya V.B. vbsow...@gmail.com wrote: [...]
Solr versioning policy
Hi,

I've noticed that since the 3.1 release, new minor version releases have been happening about every two months. I have a couple of questions:

1. Is this the plan moving forward (to aim for a new minor release approximately every couple of months)?

2. Will minor version increases always be backwards compatible (so I could upgrade from 3.x to 3.y, where y > x, without having to update the schema/config or rebuild the indexes)?

It might be worth sticking something up on the wiki which gives an overview of the versioning policy, just to clarify things. (I had a look and couldn't find anything.)

Cheers,
Mike.
Re: Can I still search documents once updated?
Am 13.07.2011 14:05, schrieb Gabriele Kahlout:

  this is what i was expecting. Otherwise updating a field of a document that has an unstored but indexed field is impossible (without losing the unstored but indexed field. I call this updating a field of a document AND deleting/updating all its unstored but indexed fields).

Not necessarily. The usual use case is that you have some kind of existing data source from which you fill your Solr index. When you want to update a field of a document, you simply re-index that document from the source. There's no need to fetch data from Solr beforehand.

Otherwise, if you really don't have such an existing data source, because a horde of typewriting monkeys filled your Solr index, then you had better declare all your fields as stored. Otherwise you'll never have a chance to get that data back.

Greeting,
Kuli
Re: Can I still search documents once updated?
Well, I'm !sure how usual this scenario would be:

1. In general, those using solr with nutch don't store the content field, to avoid storing the whole web/intranet in their index twice (once in the form of stored data, and once in the form of indexed data). Now, every time they need to update a field unrelated to content (number of inbound links, for example), they would have to re-crawl the page again. This is at least !intuitive.

On Wed, Jul 13, 2011 at 2:40 PM, Michael Kuhlmann s...@kuli.org wrote:

  Not necessarily. The usual use case is that you have some kind of existing data source from where you fill your Solr index. When you want to update a field of a document, then you simply re-index from that source. There's no need to fetch data from Solr before. [...]

-- 
Regards,
K. Gabriele
Re: Can I still search documents once updated?
Am 13.07.2011 15:37, schrieb Gabriele Kahlout:

  Well, I'm !sure how usual this scenario would be:

  1. In general those using solr with nutch don't store the content field to avoid storing the whole web/intranet in their index, twice (1 in the form of stored data, and one in the form of indexed data).

Not exactly. The indexed form is quite different from the stored form; only the tokens are stored, each token only once, and some additional data like the document count and, maybe, shingle information etc. Hence, indexed data usually needs much less space on disk than the original data.

There's no practical alternative to storing the content in a stored field. What would you otherwise display as a search result? "The following web pages have your search term somewhere in their contents, don't know where, take a look on your own"?

Greetings,
Kuli
Re: Can I still search documents once updated?
On Wed, Jul 13, 2011 at 3:54 PM, Michael Kuhlmann s...@kuli.org wrote:

  Not exactly. The indexed form is quite different from the stored form; only the tokens are stored, each token only once, and some additional data like the document count and, maybe, shingle information etc. Hence, indexed data usually needs much less space on disk than the original data.

I realized that. Maybe I should have said 1.X (1 in the form of stored data and 0.X in the form of indexed data).

  There's no practical alternative to storing the content in a stored field. What would you otherwise display as a search result? "The following web pages have your search term somewhere in their contents, don't know where, take a look on your own"?

Display the title and url (and implicitly say "The following web pages have your search term somewhere in their contents, don't REMEMBER where, take a look on your own"). Solr is already configured by default not to store more than maxFieldLength anyway. Usually one stores content only to display snippets.

-- 
Regards,
K. Gabriele
How to add TrieIntField to a SolrInputDocument?
SolrInputDocument doc = new SolrInputDocument();
doc.setField("id", 0);
doc.setField("url", getURL(0));
doc.setField("content", "blah blah blah");
doc.setField("wc", 150); // wc is of solr.TrieIntField field type in schema.xml
assertU(adoc(doc));
assertU(commit());
assertNumFound(1);

The above test fails until I change the following in schema.xml:

- <fieldType name="int" class="solr.TrieIntField" omitNorms="true"/>
+ <fieldType name="int" class="solr.IntField" omitNorms="true"/>

On Sun, Jul 10, 2011 at 10:36 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

  This was my problem:

  <fieldType name="int" class="solr.TrieIntField" omitNorms="true"/>

  I had taken my cue from Nutch's schema:

  <fieldType name="long" class="solr.LongField" omitNorms="true"/>

  On Sat, Jul 9, 2011 at 4:55 PM, Yonik Seeley yo...@lucidimagination.com wrote:

    Something is wrong with your indexing. Is wc an indexed field? If not, change it so it is, then re-index your data. If so, I'd recommend starting with the example data and filtering for something like popularity:[6 TO 10] to convince yourself it works, then figuring out what you did differently in your schema/data.

    -Yonik
    http://www.lucidimagination.com

    On Sat, Jul 9, 2011 at 10:50 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

      http://localhost:8080/solr/select?indent=on&version=2.2&q=*%3A*&fq=wc%3A%5B255+TO+257%5D&start=0&rows=10&fl=*%2Cscore&qt=&wt=xml&explainOther=&hl.fl=

      The toString of the request: {explainOther=&fl=*,score&indent=on&start=0&q=*:*&hl.fl=&qt=&wt=xml&fq=wc:[255+TO+257]&rows=1&version=2.2}

      Even when the FilterQuery is constructed in Java, it doesn't work (I get results that ignore the filter query completely).

      On Sat, Jul 9, 2011 at 3:40 PM, Ahmet Arslan iori...@yahoo.com wrote:

          I don't get it to work! If I specify no fq, I get the first result with <int name="wc">256</int>. With wc:[255 TO 257] (fq=wc%3A%5B255+TO+257%5D) nothing comes out.

        If you give us the full URL you are using, it can be helpful. The correct syntax is fq=wc:[255 TO 257]. You can use more than one fq in a request.

-- 
Regards,
K. Gabriele
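A hedged note for comparison: the stock Solr 3.x example schema defines its trie-based int types with an explicit precisionStep, roughly as below; this is from memory, so verify against the schema.xml shipped with your distribution before copying.

<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>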
Re: Can I still search documents once updated?
Am 13.07.2011 16:09, schrieb Gabriele Kahlout:

  Solr is already configured by default not to store more than a maxFieldLength anyway. Usually one stores content only to display snippets.

Yes, but the snippets must come from somewhere. For instance, if you're using Solr's highlighting feature, all highlighted fields must be stored. See http://www.intellog.com/blog/?p=208 for an explanation from someone else. ;)

Greetings,
Kuli
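A small sketch of that point (the field name and query are invented for illustration): the field to be highlighted is declared stored in schema.xml, and highlighting is then requested with the hl parameters:

<field name="content" type="text" indexed="true" stored="true"/>

http://localhost:8080/solr/select?q=content:essen&hl=true&hl.fl=content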
about standard analyzer
hi,

I'm using solr 3.3, whose schema.xml contains this:

<fieldType name="text_standard" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</fieldType>

I use this sentence as an example: XYZ Corporation - x...@example.com. However, when I try it on /analysis.jsp, it shows a different result compared to using Lucene. Using solr I got the result below when using text_standard and text_general (are both the same?):

XYZ  Corporation  xyz  example.com
Using Solr for searching in a social network
Hi everyone,

I'm building a social network site, and I need to build a search module which is somewhat similar to Facebook Search. Say this module can search people by their names, based on the following priority levels:

- My friends: 1st priority (highest)
- Friends of my friends, or anyone who is in some way related to me: 2nd priority
- Everyone else on the network: 3rd priority (lowest)

The number of users will grow very big, so I cannot flatten the data of users along with their friends and import all of them into Solr.

Can Solr help me to solve this problem? If I have all the necessary services to get the friend list of a user (the 1st priority), or the list of friends of my friends (the 2nd priority), can Solr use these external data sources for searching?

Please help me. Thanks and regards,

Nguyen Trung Kien
about standardAnaylzer in solr
hi,

I'm using solr 3.3, whose schema.xml contains this:

<fieldType name="text_standard" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</fieldType>

I use this sentence as an example: XYZ Corporation - x...@example.com. However, when I try it on /analysis.jsp, it shows a different result compared to using Lucene. Using solr I got the result below when using text_standard and text_general (are both the same?):

XYZ  Corporation  xyz  example.com   (which all belong to ALPHANUM)

When using Lucene, I got this with StandardAnalyzer:

1: [xyz:0-4:<COMPANY>]
2: [corporation:5-16:<ALPHANUM>]
3: [x...@example.com:19-34:<EMAIL>]

so my question is, how do I make it analyze like in Lucene?

regards,
kiwi
Can we use crawled data by Nutch 0.9 in other versions of Nutch
Hello,

I have a question, and I apologize if it sounds stupid. I just want to know if we can use data crawled by Nutch 0.9 in Nutch 1.3, because search has been delegated to Solr in Nutch 1.3, and I want to get search results from the data crawled by Nutch 0.9 in Nutch 1.3.

Serenity
Re: (Solr-UIMA) Indexing problems with UIMA fields.
Hello,

I think the problem might be the following. If you defined the update request handlers like in the sample solrconfig:

<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      ...
    </lst>
  </processor>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uima</str>
  </lst>
</requestHandler>
...
<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler"/>

then the uima update chain will be executed only for HTTP POSTs on /update, and not for /update/javabin (which is used by SolrJ), so you may need to update the /update/javabin configuration as follows:

<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uima</str>
  </lst>
</requestHandler>

Hope this helps,
Tommaso

2011/7/13 Erick Erickson erickerick...@gmail.com: [...]
Re: about standard analyzer
You're probably seeing the effects of a number of the filters that Solr is applying (see the fieldType definition). In particular, this looks like the result of WordDelimiterFilterFactory.

If you click the "verbose" box on the analysis page, you should see the results of each step in the analysis chain.

Best
Erick

On Wed, Jul 13, 2011 at 10:36 AM, Kiwi de coder kiwio...@gmail.com wrote: [...]
Re: Can we use crawled data by Nutch 0.9 in other versions of Nutch
You're on the wrong side of the fence. Anyway, you need to get Nutch 1.1 first, as it has a CrawlDB converter. Convert your 0.9 CrawlDB first with Nutch 1.1, then upgrade to 1.3. On Wednesday 13 July 2011 16:48:24 serenity keningston wrote: Hello, I have a question and I apologize if it sounds stupid. I just want to know if we can use data crawled by Nutch 0.9 in Nutch 1.3: search has been delegated to Solr in Nutch 1.3, and I want to get search results from the Nutch 0.9 crawl in Nutch 1.3. Serenity -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: about standard analyzer
hi, sorry.. I accidentally sent out this incomplete mail.. I actually sent another one, please ignore this. thx :) kiwi On Wed, Jul 13, 2011 at 10:52 PM, Erick Erickson erickerick...@gmail.com wrote: [...]
Re: Index Solr Logs
I'm also interested in this. Has no one ever tried to index Solr logs with DIH? Or is there some way to store Solr logs in a MySQL DB? -- View this message in context: http://lucene.472066.n3.nabble.com/Index-Solr-Logs-tp3109956p3166302.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: about standardAnaylzer in solr
Try changing from StandardTokenizerFactory to ClassicTokenizerFactory, or create your own fieldType:

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      ...
  </fieldType>

Edo

On Wed, Jul 13, 2011 at 3:40 PM, Kiwi de coder kiwio...@gmail.com wrote: hi, I'm using solr 3.3, and schema.xml contains this:

  <fieldType name="text_standard" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
  </fieldType>

I use this sentence as an example: XYZ Corporation - x...@example.com. However, when I try it on /analysis.jsp, it shows a different result compared to using Lucene. Using solr I got the result below with both text_standard and text_general (are the two the same?): XYZ Corporation xyz example.com (which all belong to ALPHANUM). When using Lucene's StandardAnalyzer, I got this:

  1: [xyz:0-4:COMPANY]
  2: [corporation:5-16:ALPHANUM]
  3: [x...@example.com:19-34:EMAIL]

so my question is, how do I make it analyze like in Lucene? regards, kiwi -- Edoardo Tosca Sourcesense - making sense of Open Source: http://www.sourcesense.com
Re: about standardAnaylzer in solr
ok, works now! thx :) kiwi On Wed, Jul 13, 2011 at 11:06 PM, Edoardo Tosca e.to...@sourcesense.com wrote: [...]
Re: (Solr-UIMA) Indexing problems with UIMA fields.
Hello Tommaso, Thanks for the reply. I did add the uima chain to the /javabin handler as you suggested. Now I get an internal server error! Here is the stack trace:

request: http://localhost:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
org.apache.solr.common.SolrException: Internal Server Error
Internal Server Error
request: http://localhost:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at indexerapp.ir4llSolrIndexer.indexAll(ir4llSolrIndexer.java:150)

Now, I began tracing back from the instructional README.txt file. A few doubts:

1) "copy generated solr-uima jar and its libs (under contrib/uima/lib) inside a Solr libraries directory. or set lib/ tags in solrconfig.xml appropriately to point those jar files."

  <lib dir="../../contrib/uima/lib" />
  <lib dir="../../dist/" regex="apache-solr-uima-\d.*\.jar" />

- Which Solr libraries directory does this refer to? Does it refer to the lib directory inside the WEB-INF folder of the Solr webapp?

2)

  <lst name="analyzeFields">
    <bool name="merge">false</bool>
    <arr name="fields">
      <str>text</str>
    </arr>
  </lst>

- The only field I need to send through the pipeline is the text field. Is it enough to specify that inside solrconfig at this point, or should I do something more?

3) Where can I see a more detailed log of what is happening inside Solr? I am running Solr from Eclipse + Tomcat. Neither the console nor the Eclipse Tomcat log shows me a detailed error log.

S

On Wed, Jul 13, 2011 at 4:48 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: [...]
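For readers following along, a minimal, self-contained SolrJ sketch of the indexing loop discussed above (the field names and the URL are assumptions, not Sowmya's actual code). Nothing UIMA-specific happens client-side: with the update.chain attached to the /update/javabin handler, the enrichment runs inside Solr after the add() arrives.

  import java.util.ArrayList;
  import java.util.Collection;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.UpdateResponse;
  import org.apache.solr.common.SolrInputDocument;

  public class UimaIndexingSketch {
    public static void main(String[] args) throws Exception {
      SolrServer server =
          new CommonsHttpSolrServer("http://localhost:8080/apache-solr-3.3.0");
      Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc1");           // hypothetical unique key
      doc.addField("text", "some content"); // the field the UIMA chain analyzes
      docs.add(doc);

      // Documents are sent as-is; the server-side update.chain enriches them.
      UpdateResponse response = server.add(docs);
      server.commit();
      System.out.println("add took " + response.getElapsedTime() + " ms");
    }
  }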
Re: Index Solr Logs
On Wed, Jul 13, 2011 at 8:31 PM, O. Klein kl...@octoweb.nl wrote: I'm also interested in this. Has no one ever tried to index Solr logs with DIH? [...] Just upstream in this thread, Mike pointed out Logg.ly: http://www.loggly.com/ Regards, Gora
Multivalued field: get only the str that matches with the query
Hi all, I have a multivalued field. I need to search the multivalued field and get back only the values that match the query. Example:

  <doc>
    <field name="text">aaa bbb ccc</field>
    <field name="text">aaa</field>
    <field name="text">ccc</field>
  </doc>

So if I search text:aaa, all 3 values are returned, when only the first and second values are correct. I am using the WhitespaceTokenizer in both the index and query analyzers:

  <types>
    <fieldtype name="string" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldtype>
  </types>

How can I do that in Apache Solr? Thanks!
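A sketch of one way to approximate this, not verified against this exact schema: the Solr highlighter evaluates stored values individually, so the highlighting section of the response shows only the values that actually matched, e.g.

  /select?q=text:aaa&fl=id&hl=true&hl.fl=text&hl.snippets=10&hl.fragsize=0

hl.fragsize=0 asks for whole field values instead of fragments, and hl.snippets raises the per-field cap so several matching values can come back. The normal fl section still returns all stored values; it is the highlighting block you would read.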
Preserve XML hierarchy
Hi, is it possible to do that in Apache Solr? If I make a search, how do I know where the result comes from? Thanks! I have an XML like this:

  some Text/ another text/ text/ more text/ text/ / / / /
  some Text/ text/ / /
  some Text/ another text/ text/ more text/ text/ / text/ text/ / / / / / /
Wildcard
Hello, What wildcards can we use with Solr? Regards, Gaurav
Re: Wildcard
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html http://wiki.apache.org/solr/SolrQuerySyntax François On Jul 13, 2011, at 1:29 PM, GAURAV PAREEK wrote: Hello, What are wildcards we can use with the SOLR ? Regards, Gaurav
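For quick reference, the classic query parser behind those links supports two wildcards, and by default neither may be the first character of a term:

  te?t   (? matches exactly one character: test, text)
  test*  (* matches zero or more characters: test, tests, tester)

Leading wildcards are rejected unless the field is prepared for them, for example with ReversedWildcardFilterFactory in the index-time analyzer on Solr 1.4 or later.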
Re: Grouping / Collapse Query
I guess that's a possible solution, but the two concerns I would have are 1) putting the burden of sorting on the client instead of solr, where it belongs. And 2) needing to request more results than I'd want to display in order to guarantee I could populate the entire page of results to compensate for the grouping. -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-Collapse-Query-tp3164433p3166789.html Sent from the Solr - User mailing list archive at Nabble.com.
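If doing it server-side is an option, the result grouping that shipped with Solr 3.3 addresses both concerns; a sketch, with a hypothetical group field:

  /select?q=foo&group=true&group.field=productId&group.limit=1&rows=10

With group=true, rows counts groups rather than documents, so a page of results stays a predictable size and the client no longer has to over-fetch and sort.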
Re: Preserve XML hierarchy
On Wed, Jul 13, 2011 at 10:30 PM, Lucas Miguez lucas.mig...@gmail.com wrote: Hi, is it possible to do that in Apache Solr? If I make a search, how do I know where the result comes from? [...] Your question is not very clear, and I happen unfortunately to be out of crystal balls and Tarot cards. Is it possible to do what? Make a search on what, and what sort of results do you expect from said search? Peering into the misty depths of my non-existent crystal ball: if you are asking whether it is possible to index an XML file, search it, and figure out which node of the XML the search result comes from, yes, that is possible; though details, and better advice, would require more input from your side. Roughly speaking, each node can go into a separate Solr field, and full-text search on all relevant fields is also possible. Joking aside, please do provide more details. Regards, Gora
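To make the one-field-per-node suggestion concrete, a minimal SolrJ sketch (the id, xpath, and text field names and the Solr URL are assumptions): parse the XML into a DOM and index each text node as its own document, carrying the path to the node so a hit can be traced back to its place in the hierarchy.

  import java.io.File;
  import javax.xml.parsers.DocumentBuilderFactory;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;
  import org.w3c.dom.*;

  public class XmlPathIndexer {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
      Document dom = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().parse(new File(args[0]));
      Element root = dom.getDocumentElement();
      walk(root, "/" + root.getNodeName(), server);
      server.commit();
    }

    // Recurse through the DOM; every non-empty text node becomes one Solr doc.
    static void walk(Element el, String path, SolrServer server) throws Exception {
      NodeList children = el.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        Node child = children.item(i);
        if (child.getNodeType() == Node.ELEMENT_NODE) {
          walk((Element) child, path + "/" + child.getNodeName(), server);
        } else if (child.getNodeType() == Node.TEXT_NODE
            && child.getTextContent().trim().length() > 0) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", path + "#" + i);  // hypothetical unique key
          doc.addField("xpath", path);         // where in the hierarchy it came from
          doc.addField("text", child.getTextContent().trim());
          server.add(doc);
        }
      }
    }
  }

A search on text then returns the xpath field with each hit, which answers the "where did this result come from" part.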
Re: ContentStreamLoader Problem
On 07/12/2011 6:52 PM, Erick Erickson wrote: This is a shot in the dark, but this smells like a classpath issue, and since you have a 1.4.1 installation on the machine, I'm *guessing* that you're getting a mix of old and new Jars. What happens if you try this on a machine that doesn't have 1.4.1 on it? If that works, then it's likely a classpath issue Best Erick I'll give it a shot and report back. Thanks - Tod
SolrCloud Shardding
Reading the SolrCloud wiki I see that there are goals to support different sharding algorithms. What is currently implemented today? Is the sharding logic the responsibility of the application doing the indexing?
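If the routing does fall to the application, a minimal sketch of the common hash-based approach (the shard URLs are hypothetical, and "id" is assumed to be the unique key):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class SimpleShardRouter {
    private final SolrServer[] shards;

    public SimpleShardRouter(String... urls) throws Exception {
      shards = new SolrServer[urls.length];
      for (int i = 0; i < urls.length; i++) {
        shards[i] = new CommonsHttpSolrServer(urls[i]);
      }
    }

    // A stable hash of the unique key picks the shard. Masking with
    // Integer.MAX_VALUE keeps the value non-negative (Math.abs would
    // misbehave on Integer.MIN_VALUE).
    public void add(SolrInputDocument doc) throws Exception {
      String id = (String) doc.getFieldValue("id");
      int shard = (id.hashCode() & Integer.MAX_VALUE) % shards.length;
      shards[shard].add(doc);
    }
  }

The catch with any modulo scheme is that changing the shard count reassigns most keys, which is one reason pluggable algorithms are on the SolrCloud roadmap.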
omitTermFreq only?
Hello, I was wondering if there is a way we can omit only the Term Frequency in solr? omitTermFreqAndPositions =true wouldn't work for us since we need the positions for supporting phrase queries. Thanks, -Jibo
extending edismax?
I am using Solr 3.3 with the edismax query parser, and I am getting great results. To improve relevancy I want to add some semantic filters to the query. E.g. I want to pass the query "red shoes" as q=shoes&fq=color:red. I have a service that can tell me that in the phrase "red shoes" the word "red" is the color. My question is where I should invoke this external service: 1) should my search client call the service, form the request, and then call Solr, or 2) should I pass the query as-is to Solr and have Solr call the service internally? Option 1 is easier for me as I am familiar with the client code; option 2 would be harder. I wanted to know what the best practices are. I am happy with edismax, so I want to reuse all its functionality. Can I write a custom handler that calls my service and then hands the request over to edismax? Thanks for your time.
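For what it's worth, a minimal SolrJ sketch of option 1 (the tagger service and the URL are hypothetical): call the semantic service first, keep the untagged words as q, and turn the tagged ones into fq filters before handing the query to edismax.

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class SemanticSearchClient {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
      // Hypothetical: the service reports that in "red shoes", "red" is the color.
      // Map<String, String> tags = taggerService.tag("red shoes");
      SolrQuery q = new SolrQuery("shoes");  // what's left after tagging
      q.set("defType", "edismax");           // keep all edismax behavior
      q.addFilterQuery("color:red");         // filter derived from the service
      QueryResponse rsp = server.query(q);
      System.out.println(rsp.getResults().getNumFound());
    }
  }

One upside of the client-side route: fq filters are cached by Solr independently of q, so a repeated color:red filter comes cheap regardless of the free-text part.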
deletedPkQuery fails
Hi Folks, I am trying to use deletedPkQuery so that deltaImport removes the inactive products from Solr. I keep getting a syntax error saying the query syntax is not right. I have tried many alternatives to the following query; although all of them work at the mysql prompt directly, none works in the Solr handler. Can anyone give me a hint for debugging this type of problem? Is there anything special about deletedPkQuery I am not aware of?

deletedPkQuery="select p.pId as id from products p join products_large pl on p.pId=pl.pId where p.pId= ${dataimporter.delta.id} and pl.deleted='' having count(*)=0"

Jul 13, 2011 4:02:23 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport
SEVERE: Delta Import Failed org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select p.pId as id from products p join products_large pl on p.pId=pl.pId where p.pId= and pl.deleted='' having count(*)=0 Processing Document # 1
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextDeletedRowKey(SqlEntityProcessor.java:91)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextDeletedRowKey(EntityProcessorWrapper.java:258)
        at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:636)
        at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:258)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:172)
        at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:352)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'and pl.deleted='' having count(*)=0' at line 1
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:407)
        at com.mysql.jdbc.Util.getInstance(Util.java:382)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3603)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3535)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1989)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2150)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2620)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2570)
        at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:779)
        at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:622)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:246)

Elaine
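One detail the trace itself shows: the SQL that reached MySQL reads "where p.pId= and pl.deleted=''", i.e. ${dataimporter.delta.id} expanded to nothing. If I read the DIH contract correctly (worth verifying against the wiki), deletedPkQuery runs once, standalone, to discover the deleted primary keys, so per-row delta variables are not available to it; it has to return the ids by itself, along the lines of this hypothetical sketch for a schema like the above:

  deletedPkQuery="select p.pId as id from products p
                  left join products_large pl on p.pId = pl.pId
                  where pl.pId is null or pl.deleted != ''"

The exact predicate depends on how a deleted product actually looks in these tables; the point is only that the query must stand on its own.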
Re: solr/velocity: funtion for sorting asc/desc
Thanks Erik. So if I had a link "Sort Title" and the default is sort=title desc, how can I switch that to sort=title asc? Example: http://# Sort Title (default sort=title desc); the user clicks on the link and the sort should toggle (or switch) to sort=title asc. How can this be achieved? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-velocity-funtion-for-sorting-asc-desc-tp3163549p3167267.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: omitTermFreq only?
A dirty hack is to return 1.0f for each tf > 0. Just a couple of lines of code for a custom similarity class. Hello, I was wondering if there is a way we can omit only the Term Frequency in solr? omitTermFreqAndPositions=true wouldn't work for us since we need the positions for supporting phrase queries. Thanks, -Jibo
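A sketch of that hack for Lucene/Solr 3.x (the package name in the schema line below is hypothetical; DefaultSimilarity is the stock class Solr uses):

  import org.apache.lucene.search.DefaultSimilarity;

  public class FlatTfSimilarity extends DefaultSimilarity {
    // Flatten term frequency: any term that occurs at all scores as if
    // it occurred exactly once. Positions are untouched, so phrase
    // queries keep working.
    @Override
    public float tf(float freq) {
      return freq > 0 ? 1.0f : 0.0f;
    }
  }

Wiring it in is a one-line <similarity class="com.example.FlatTfSimilarity"/> in schema.xml. Note this changes only scoring; the .frq data is still written, which matters for the index-size goal mentioned later in this thread.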
Re: solr/velocity: funtion for sorting asc/desc
You'll have to add some logic in your Velocity templates to string-process the sort parameter and determine whether to set the link to be ascending or descending. It'll require learning some Velocity techniques to do this with #if, and how to navigate the objects Solr puts into the Velocity context. You'll find pointers to more information here: http://wiki.apache.org/solr/VelocityResponseWriter I know this doesn't quite get you there without you doing the homework, but maybe there'll be enough pointers on that wiki page to get you there. I've done this in various one-off cases, but have nothing handy to share at the moment (I'm traveling); maybe in a week or so I'll be able to dig something out or re-craft the voodoo. Erik On Jul 13, 2011, at 14:23, okayndc wrote: [...]
Test document generator (in PHP)
Hi! I needed to fill an index with content and created myself a test document generator. It generates random text documents based on english terms. The content doesn't make sense, but allows for testing of some search features. Each document is unique. Document length is configured to vary within a range. Right now it's pretty slow (100 documents take 4 minutes on my machine), but for me it works. If you want to use it yourself, get it here: https://github.com/marians/php-solr-testdoc-generator Cheers, Marian
Re: omitTermFreq only?
Sorry, I should have made the objectives clear. The goal is to reduce the index size by avoiding the term frequency data stored in the index (in the .frq segment files). After exploring a bit more, I realized that LUCENE-2048 now allows omitPositions. Similarly, I'm looking for an omitFrequency option. Thanks, -Jibo On Jul 13, 2011, at 1:34 PM, Markus Jelsma wrote: [...]
Upgrading solr from 1.4 to latest version
Hi, I am new to Solr. In a short time, I have been very impressed with its search performance. I installed Solr on Ubuntu using the command apt-get install solr-tomcat curl -y. From the admin page, I can see that the Solr version is 1.4.1, but I see there is a 3.x version already available. Just wondering if there is any easy way to upgrade to the latest version. I tried specifying the version number in apt-get, but that does not work. Appreciate your help. Thanks Ravi -- View this message in context: http://lucene.472066.n3.nabble.com/Upgrading-solr-from-1-4-to-latest-version-tp3164312p3164312.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query Rewrite
Ok, so I think I have something working relatively well. I have a few issues which I'm not sure how to address, though. Currently, when I create my ParserPlugin in solrconfig I do the following:

  <queryParser name="alias" class="org.apache.solr.search.CustomQueryParserPlugin">
    <str name="person_name">person_name^1.0 person_name_first^0.5 person_name_last^0.5</str>
  </queryParser>

Then in the CustomQueryParser I iterate over all the arguments, adding each key/value to a Map. I then pass this to the constructor of a basically copied ExtendedDismaxQParser (the only differences are the added aliases and the logic to add those to the ExtendedSolrQParser). Now, the thing I hate about this is that I had to pull pieces into my own class, since some of the methods being called (in QueryUtils, for instance) are not publicly available. Also, I didn't expose parameters to allow this to be done per query, but I'd most certainly take suggestions on how to do this. My use case doesn't require it, but I'd be happy to make the modifications and try it out.

On Tue, Jul 12, 2011 at 5:08 PM, Jamie Johnson jej2...@gmail.com wrote: I'm not following where the aliasing feature I'm looking for is. Looking at the patch I didn't see it either. Essentially what I'm looking for is that when a user searches for person_name, the query turns into person_name:john OR person_name_first:john OR person_name_last:john. I don't see anything like that here, am I just missing it?

On Tue, Jul 12, 2011 at 3:06 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
: Thanks Hoss. I'm not really sure where to begin looking with this, I
: quickly read the JIRA but don't see mention of exposing the multiple
: aliases. Can you provide any more details?

i refered to it as uf or user fields ... note the specific comment i linked to in the first url, and the subsequent patch; the colon bug in edismax is what hung me up at the time.

: On Tue, Jul 12, 2011 at 1:19 PM, Chris Hostetter
: hossman_luc...@fucit.org wrote:
: : Taking a closer look at this it seems as if the
: : DisjunctionMaxQueryParser supports doing multiple aliases and
: : generating multiple queries, I didn't see this same capability in the
: : ExtendedDismaxQParser, am I just missing it? If this capability were

it's never been exposed at a user level ... i started looking at adding it to edismax but ran into a bug i couldn't uncover in the time i had to work on it (which i think has since been fixed)...

https://issues.apache.org/jira/browse/SOLR-1553?focusedCommentId=12839892&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12839892
https://issues.apache.org/jira/browse/SOLR-2409
https://issues.apache.org/jira/browse/SOLR-2368

-Hoss
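A compressed sketch of the same idea against the stock Solr 3.x plugin API (the class and config names are hypothetical, boost syntax is left out, and the string rewrite is deliberately naive): expand each "alias:term" into a disjunction over the target fields, then delegate to the unmodified edismax plugin instead of copying it.

  import java.util.HashMap;
  import java.util.Map;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.ExtendedDismaxQParserPlugin;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;

  public class AliasQParserPlugin extends QParserPlugin {
    private final Map<String, String[]> aliases = new HashMap<String, String[]>();
    private final ExtendedDismaxQParserPlugin edismax = new ExtendedDismaxQParserPlugin();

    @Override
    @SuppressWarnings("rawtypes")
    public void init(NamedList args) {
      // e.g. <str name="person_name">person_name person_name_first person_name_last</str>
      for (int i = 0; i < args.size(); i++) {
        aliases.put(args.getName(i), ((String) args.getVal(i)).split("\\s+"));
      }
    }

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      String expanded = qstr;
      for (Map.Entry<String, String[]> e : aliases.entrySet()) {
        // person_name:john -> (person_name:john OR person_name_first:john OR ...)
        expanded = expanded.replaceAll(
            e.getKey() + ":(\\S+)",
            "(" + join(e.getValue(), ":$1 OR ") + ":$1)");
      }
      return edismax.createParser(expanded, localParams, params, req);
    }

    private static String join(String[] fields, String sep) {
      StringBuilder sb = new StringBuilder(fields[0]);
      for (int i = 1; i < fields.length; i++) sb.append(sep).append(fields[i]);
      return sb.toString();
    }
  }

Delegating rather than copying sidesteps the non-public QueryUtils methods, at the cost of doing the expansion on the raw query string, which breaks down for quoted phrases and parenthesized clauses.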
Re: Exception using Analyze from the Solr Admin app
I had this same problem. In my case, I was able to correct it by removing (perhaps safer to move instead of remove) my tomcat-dir/work/Catalina/hostname directory. Restarting tomcat caused that directory to be recreated, and after that it worked for me. Hope this helps anyone else who runs into this in the future. -- View this message in context: http://lucene.472066.n3.nabble.com/Exception-using-Analyze-from-the-Solr-Admin-app-tp3060464p3167220.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: use query to set filter on fields
Use the following query:

  /select?q=*:*&rows=0&wt=json&indent=true&fl=NOT_IMPORTANT&facet=true&facet.zeros=false
    &fq={!tag=mk q.op=OR}make:(Ford red 2010)
    &fq={!tag=cl q.op=OR}color:(Ford red 2010)
    &fq={!tag=yr q.op=OR}year:(Ford red 2010)
    &facet.field={!ex=cl,yr key=mk}make
    &facet.field={!ex=mk,yr key=cl}color
    &facet.field={!ex=mk,cl key=yr}year

The mk facet will tell you which makes are found by your free-text search, cl will tell you the colors, and yr the years. This approach has the advantage of using all the synonyms defined for your text field. -- View this message in context: http://lucene.472066.n3.nabble.com/use-query-to-set-filter-on-fields-tp2190595p3167444.html Sent from the Solr - User mailing list archive at Nabble.com.
podcasts
Hello, Are there any good podcasts for beginners in Solr? Thanks Moumou
Edismax parser and complex boolean queries
I was wondering if anyone had details about what the edismax parser does to complex boolean queries?
Re: podcasts
On Jul 13, 2011, at 15:34 , Mourad K wrote: Are there any good podcasts for beginners in SOLR There's a bunch of stuff we've created and posted to our site here: http://www.lucidimagination.com/devzone/videos-podcasts Erik
Re: Edismax parser and complex boolean queries
this may have been a silly question, looking at https://issues.apache.org/jira/browse/SOLR-1553, it says that in the absence of syntax errors the full query syntax is supported. Still would be nice to confirm though. On Wed, Jul 13, 2011 at 6:18 PM, Jamie Johnson jej2...@gmail.com wrote: I was wondering if anyone had details about what the edismax parser does to complex boolean queries?
XInclude Multiple Elements
I've spent some time looking at various conversations on this problem, but I can't find a solution that works. XInclude has to point at a valid XML document with a single root element. It should be possible to use xpointer to specify child elements to include but, as far as I can tell, the xpointer support doesn't include any scheme expressive enough to select all the child elements of a given element, which is what I would like. So, here's what I have (it's more complicated than necessary for this example because I also want it to support includes at both the root level and a sub-level when doing the schema, as I want to include particular fields, but not all of them, in the include file):

includes.dtd:

  <!ELEMENT includes (include+)>
  <!ELEMENT include ANY>
  <!ATTLIST include id ID #REQUIRED>

solrconfigIncludes.xml:

  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE includes SYSTEM "includes.dtd">
  <includes>
    <include id="root">
      ...several elements to be included at the root of a solrconfig.xml file...
    </include>
  </includes>

solrconfig.xml:

  <?xml version="1.0" encoding="UTF-8" ?>
  <config>
    <xi:include href="../../conf/solrconfigIncludes.xml" xpointer="root"
                xmlns:xi="http://www.w3.org/2001/XInclude" />
    <requestHandler name="dismax" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <str name="echoParams">explicit</str>
        <float name="tie">0.01</float>
        <str name="qf">all_text</str>
        <str name="pf">all_text</str>
        <str name="bf">recip(ms(NOW,dateoccurredboost),3.16e-11,1,1)</str>
        <str name="fl"></str>
        <str name="mm">1&lt;-1 4&lt;-2</str>
        <int name="ps">100</int>
        <str name="q.alt">*:*</str>
        <str name="hl.fl">highlight</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>
  </config>

So that xpointer="root", unfortunately, grabs the <include id="root"> element, but what I need, of course, is the contents of that element (the children). I see this post: http://lucene.472066.n3.nabble.com/including-external-files-in-config-by-corename-td698324.html which implies you can use #xpointer(/*/node()) to get all children of the root node (as if I changed my example to only have one include and just used multiple files, which is fine if it works); however, my testing gave this error:

  ERROR org.apache.solr.core.CoreContainer - org.xml.sax.SAXParseException: Fragment identifiers must not be used. The 'href' attribute value '../../conf/solrconfigIncludes.xml#xpointer(root/node())' is not permitted.

I tried several other variations of pointers using node() or *, none of which worked. And I also see this post: http://lucene.472066.n3.nabble.com/solrconfig-xml-and-xinclude-td984058.html which shows a cumbersome way of listing out each child element by index number using the element scheme, which I assume works, but is way too cumbersome to use. Does anyone have success using XInclude to include more than one element? I'm open to any and all approaches to having partially-common configuration between cores. Thanks, Stephen -- Stephen Duncan Jr www.stephenduncanjr.com
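For completeness, a hedged illustration of the element() scheme from that second link (the indices are hypothetical and depend on the actual file): element(/1/1/2) addresses by position only, document root, then first child, then second child, so including N children takes N xi:include lines, which is exactly the cumbersome part.

  <xi:include href="../../conf/solrconfigIncludes.xml"
              xpointer="element(/1/1/1)"
              xmlns:xi="http://www.w3.org/2001/XInclude" />
  <xi:include href="../../conf/solrconfigIncludes.xml"
              xpointer="element(/1/1/2)"
              xmlns:xi="http://www.w3.org/2001/XInclude" />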
Re: solr/velocity: funtion for sorting asc/desc
Awesome, thanks Erik! -- View this message in context: http://lucene.472066.n3.nabble.com/solr-velocity-funtion-for-sorting-asc-desc-tp3163549p3167662.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ' invisible ' words
Hi Jayendra, I have changed the order and also removed the line related to synonyms... but the result is still the same... somehow some words are just invisible during my searches... - Zeki, but it isn't working... if it ran, it would do the job... -- View this message in context: http://lucene.472066.n3.nabble.com/invisible-words-tp3158060p3168039.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: POST for queries, length/complexity limit of fq?
I have used long fq with POST for permissions, of the form fq=id:(1...n), and they are limited by the maxBooleanClauses setting in solrconfig.xml. Regards Sujatha On Wed, Jul 13, 2011 at 4:40 PM, pravesh suyalprav...@yahoo.com wrote: 1. I assume that it's worthwhile to rely on the POST method instead of GET when issuing a search. Right? As I can see, this should work. We do restrict users' searches by passing unique ids (sometimes in the thousands) in 'fq', and we use the POST method. Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/POST-for-queries-length-complexity-limit-of-fq-tp3162405p3165586.html Sent from the Solr - User mailing list archive at Nabble.com.
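For reference, that limit lives in solrconfig.xml and defaults to 1024; if a long fq id list trips BooleanQuery's TooManyClauses exception, raising it is a one-line change (the value here is arbitrary):

  <maxBooleanClauses>4096</maxBooleanClauses>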
Readercycle script issue
Hi, my readercycle script is not reloading the searcher. Where might I be going wrong? Please help. Thanks Pawan