Re: 'invisible' words

2011-07-13 Thread Jayendra Patil
Hi Denis,

The order of the filters at index time and query time can differ,
e.g. for the synonyms filter.
Do you have a custom synonyms text file which may be causing the issue?

It usually works fine if you have the same filter order at index
and query time. You can try that out.
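
For reference, a minimal sketch of a symmetric fieldType (the name, chain,
and synonyms file below are illustrative, not from this thread):

<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>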

Regards,
Jayendra

On Tue, Jul 12, 2011 at 11:19 PM, deniz  wrote:
> nothing was changed... the result is still the same... should I implement my
> own analyzer or tokenizer for the problem?
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/invisible-words-tp3158060p3164670.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: nutch 1.2, solr 3.3, tomcat6. java.io.IOException: Job failed! problem when building solrindex

2011-07-13 Thread Leo Subscriptions
Works like a charm.

Thanks,

Leo

On Wed, 2011-07-13 at 11:31 +0530, Geek Gamer wrote:

> you need to update the solrj libs to the 3.x version; the javabin format
> has changed.
> I made the change a few months back, you can pull the changes from
> https://github.com/geek4377/nutch/tree/geek5377-1.2.1
> 
> hope that helps,
> 
> 
> On Wed, Jul 13, 2011 at 8:58 AM, Leo Subscriptions
>  wrote:
> > I'm running 64bit Ubuntu 11.04, nutch 1.2, solr 3.3 (downloaded, not
> > built) and tomcat6 following this (and some other) links
> > http://wiki.apache.org/nutch/RunningNutchAndSolr
> >
> > I have added the nutch schema and can access/view this schema via the
> > admin page. nutch also works, as I can perform successful searches.
> >
> > When I execute the following:
> >
> >>> ./bin/nutch solrindex http://localhost:8080/solr/core0 crawl/crawldb
> > crawl/linkdb crawl/segments/*
> >
> > I (eventually) get an io error.
> >
> > The above command creates the following
> > files in /var/lib/tomcat6/solr/core0/data/index/
> >
> > ---
> > 544 -rw-r--r-- 1 tomcat6 tomcat6 557056 2011-07-13 11:09 _1.fdt
> >  0 -rw-r--r-- 1 tomcat6 tomcat6  0 2011-07-13 11:00 _1.fdx
> >  4 -rw-r--r-- 1 tomcat6 tomcat6 32 2011-07-13 10:59 segments_2
> >  4 -rw-r--r-- 1 tomcat6 tomcat6 20 2011-07-13 10:59 segments.gen
> >  0 -rw-r--r-- 1 tomcat6 tomcat6  0 2011-07-13 11:00 write.lock
> > ---
> >
> > but the hadoop.log reports the following error
> >
> > ---
> > 2011-07-13 11:09:47,665 INFO  indexer.IndexingFilters - Adding
> > org.apache.nutch.indexer.basic.BasicIndexingFilter
> > 2011-07-13 11:09:47,666 INFO  indexer.IndexingFilters - Adding
> > org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: content
> > dest: content
> > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: site
> > dest: site
> > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: title
> > dest: title
> > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: host
> > dest: host
> > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: segment
> > dest: segment
> > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: boost
> > dest: boost
> > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: digest
> > dest: digest
> > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: tstamp
> > dest: tstamp
> > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: url dest:
> > id
> > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: url dest:
> > url
> > 2011-07-13 11:09:49,272 WARN  mapred.LocalJobRunner - job_local_0001
> > java.lang.RuntimeException: Invalid version or the data in not in
> > 'javabin' format
> >at
> > org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
> >at
> > org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
> >at
> > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:466)
> >at
> > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
> >at
> > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> >at
> > org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> >at
> > org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:64)
> >at org.apache.nutch.indexer.IndexerOutputFormat
> > $1.write(IndexerOutputFormat.java:54)
> >at org.apache.nutch.indexer.IndexerOutputFormat
> > $1.write(IndexerOutputFormat.java:44)
> >at org.apache.hadoop.mapred.ReduceTask
> > $3.collect(ReduceTask.java:440)
> >at
> > org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:159)
> >at
> > org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
> >at
> > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
> >at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> >at org.apache.hadoop.mapred.LocalJobRunner
> > $Job.run(LocalJobRunner.java:216)
> > 2011-07-13 11:09:49,611 ERROR solr.SolrIndexer - java.io.IOException:
> > Job failed!
> > ---
> >
> > I'd appreciate any help with this.
> >
> > Thanks,
> >
> > Leo
> >
> >
> >
> >




omitNorms

2011-07-13 Thread Gastone Penzo
Hi,
my field category (string) has omitNorms=true and
omitTermFreqAndPositions=true.
I have indexed all docs, but when I do a search like:
http://xxx:xxx/solr/select/?q=category:A&debugQuery=on
I see there's normalization and idf and tf. Why? I can't understand the reason.


8.676225 = (MATCH) fieldWeight(category:A in 826), product of:
  1.0 = tf(termFreq(category:A)=1)
  8.676225 = idf(docFreq=6978, maxDocs=15049953)
  1.0 = fieldNorm(field=category, doc=826)

  
8.676225 = (MATCH) fieldWeight(category:A in 3433), product of:
  1.0 = tf(termFreq(category:A)=1)
  8.676225 = idf(docFreq=6978, maxDocs=15049953)
  1.0 = fieldNorm(field=category, doc=3433)

  
8.676225 = (MATCH) fieldWeight(category:A in 3434), product of:
  1.0 = tf(termFreq(category:A)=1)
  8.676225 = idf(docFreq=6978, maxDocs=15049953)
  1.0 = fieldNorm(field=category, doc=3434)
The category field is stored and indexed. Is that the problem?

Thank you

Gastone

(Solr-UIMA) Indexing problems with UIMA fields.

2011-07-13 Thread Sowmya V.B.
Hi All

I have a problem making the indexer work with the UIMA fields.

Here is what I did (with the help of this community): I compiled a
Solr-UIMA snapshot, using "ant clean dist", by adding my own annotators
there.
It compiled without any errors, and I obtained a jar file.

Now, following the instructions on the readme (
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3/solr/contrib/uima/README.txt
)

 I modified my SolrConfig.xml and Schema.xml as suggested in the README.

As long as I say "required=false" on the UIMA-generated fields, the indexing
works fine... without a UIMA annotation.

However, once I say "required=true", I get an error:

request:
http://anafi:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
org.apache.solr.common.SolrException: Bad Request

Bad Request

request:
http://localhost:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
at indexerapp.ir4llSolrIndexer.indexAll(ir4llSolrIndexer.java:150)
at indexerapp.ir4llSolrIndexer.main(ir4llSolrIndexer.java:57)

Is there something during indexing that I need to do apart from saying:

UpdateResponse response = server.add(docs);
(where docs is a collection of documents, without UIMA indexing).

My understanding is that the UIMA annotation happens after calling the
server.add(docs). Is that right?
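
(For context, a minimal SolrJ 3.x sketch of the call in question; the URL
and field names are illustrative assumptions, not taken from this thread:)

// inside a method declared 'throws Exception'
CommonsHttpSolrServer server =
    new CommonsHttpSolrServer("http://localhost:8080/apache-solr-3.3.0");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
doc.addField("text", "content for the UIMA chain to annotate");
// the update chain (UIMA included, if configured on this handler)
// runs server-side once the add request arrives
UpdateResponse response = server.add(java.util.Collections.singletonList(doc));
server.commit();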

S.
-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: nutch 1.2, solr 3.3, tomcat6. java.io.IOException: Job failed! problem when building solrindex

2011-07-13 Thread Markus Jelsma
If you're using Solr anyway, you'd better upgrade to Nutch 1.3 with Solr 3.x 
support.

> Works like a charm.
> 
> Thanks,
> 
> Leo
> 
> On Wed, 2011-07-13 at 11:31 +0530, Geek Gamer wrote:
> > you need to update the solrj libs to the 3.x version; the javabin format
> > has changed.
> > I made the change a few months back, you can pull the changes from
> > https://github.com/geek4377/nutch/tree/geek5377-1.2.1
> > 
> > hope that helps,
> > 
> > 
> > On Wed, Jul 13, 2011 at 8:58 AM, Leo Subscriptions
> > 
> >  wrote:
> > > I'm running 64bit Ubuntu 11.04, nutch 1.2, solr 3.3 (downloaded, not
> > > built) and tomcat6 following this (and some other) links
> > > http://wiki.apache.org/nutch/RunningNutchAndSolr
> > > 
> > > I have added the nutch schema and can access/view this schema via the
> > > admin page. nutch also works, as I can perform successful searches.
> > > 
> > > When I execute the following:
> > >>> ./bin/nutch solrindex http://localhost:8080/solr/core0 crawl/crawldb
> > > 
> > > crawl/linkdb crawl/segments/*
> > > 
> > > I (eventually) get an io error.
> > > 
> > > The above command creates the following
> > > files in /var/lib/tomcat6/solr/core0/data/index/
> > > 
> > > ---
> > > 544 -rw-r--r-- 1 tomcat6 tomcat6 557056 2011-07-13 11:09 _1.fdt
> > > 
> > >  0 -rw-r--r-- 1 tomcat6 tomcat6  0 2011-07-13 11:00 _1.fdx
> > >  4 -rw-r--r-- 1 tomcat6 tomcat6 32 2011-07-13 10:59 segments_2
> > >  4 -rw-r--r-- 1 tomcat6 tomcat6 20 2011-07-13 10:59 segments.gen
> > >  0 -rw-r--r-- 1 tomcat6 tomcat6  0 2011-07-13 11:00 write.lock
> > > 
> > > ---
> > > 
> > > but the hadoop.log reports the following error
> > > 
> > > ---
> > > 2011-07-13 11:09:47,665 INFO  indexer.IndexingFilters - Adding
> > > org.apache.nutch.indexer.basic.BasicIndexingFilter
> > > 2011-07-13 11:09:47,666 INFO  indexer.IndexingFilters - Adding
> > > org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> > > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: content
> > > dest: content
> > > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: site
> > > dest: site
> > > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: title
> > > dest: title
> > > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: host
> > > dest: host
> > > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: segment
> > > dest: segment
> > > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: boost
> > > dest: boost
> > > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: digest
> > > dest: digest
> > > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: tstamp
> > > dest: tstamp
> > > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: url
> > > dest: id
> > > 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: url
> > > dest: url
> > > 2011-07-13 11:09:49,272 WARN  mapred.LocalJobRunner - job_local_0001
> > > java.lang.RuntimeException: Invalid version or the data in not in
> > > 'javabin' format
> > > 
> > >at
> > > 
> > > org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99
> > > )
> > > 
> > >at
> > > 
> > > org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(
> > > BinaryResponseParser.java:39)
> > > 
> > >at
> > > 
> > > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(Commons
> > > HttpSolrServer.java:466)
> > > 
> > >at
> > > 
> > > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(Commons
> > > HttpSolrServer.java:243)
> > > 
> > >at
> > > 
> > > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(Abst
> > > ractUpdateRequest.java:105)
> > > 
> > >at
> > > 
> > > org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> > > 
> > >at
> > > 
> > > org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:64)
> > > 
> > >at org.apache.nutch.indexer.IndexerOutputFormat
> > > 
> > > $1.write(IndexerOutputFormat.java:54)
> > > 
> > >at org.apache.nutch.indexer.IndexerOutputFormat
> > > 
> > > $1.write(IndexerOutputFormat.java:44)
> > > 
> > >at org.apache.hadoop.mapred.ReduceTask
> > > 
> > > $3.collect(ReduceTask.java:440)
> > > 
> > >at
> > > 
> > > org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:
> > > 159)
> > > 
> > >at
> > > 
> > > org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:
> > > 50)
> > > 
> > >at
> > > 
> > > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
> > > 
> > >at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> > >at org.apache.hadoop.mapred.LocalJobRunner
> > > 
> > > $Job.run(LocalJobRunner.java:216)
> > > 2011-07-13 11:09:49,611 ERROR solr.SolrIndexer - java.io.IOException:
> > > Job failed!
> > > ---

Geo search with spatial-solr-plugin

2011-07-13 Thread Isha Garg

Hello,

  Spatial-solr-2.0-RC5.jar works successfully with Solr 1.4.1.
  With the release of Solr 3.1, is support for the spatial-solr-plugin
going to continue or not?


Thanks!
Isha


Re: how to build lucene-solr (especially if behind a firewall)?

2011-07-13 Thread pravesh
If behind a proxy, then use:

ant dist ${build_files:autoproxy}
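
If that property isn't defined in your build, a common alternative is to
hand Ant the standard JVM proxy settings (the host and port below are
placeholders for your own proxy):

ANT_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 \
-Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=8080" ant dist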

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-build-lucene-solr-espeically-if-behind-a-firewall-tp3163038p3165568.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: POST for queries, length/complexity limit of fq?

2011-07-13 Thread pravesh
>1. I assume that it's worthwhile to rely on POST method instead of GET
when issuing a search. Right? As I can see, this should work. 

We do restrict users' searches by passing unique IDs (sometimes in the
thousands) in 'fq', and we use the POST method.
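
A sketch of such a request with curl (the URL and IDs are illustrative);
Solr's /select accepts the same parameters form-encoded in a POST body as
it does on the query string:

curl http://localhost:8983/solr/select \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'fq=id:(101 OR 102 OR 103)' \
  --data-urlencode 'rows=10'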

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/POST-for-queries-length-complexity-limit-of-fq-tp3162405p3165586.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How do I specify a different analyzer at search-time?

2011-07-13 Thread pravesh
You can configure an analyzer for index time and for search time for each
of your field types in schema.xml.
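
A minimal sketch (the fieldType name and chain contents are illustrative;
the type="index"/type="query" attributes are the standard schema.xml
mechanism):

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>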

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-I-specify-a-different-analyzer-at-search-time-tp3159463p3165593.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Grouping / Collapse Query

2011-07-13 Thread Erick Erickson
Could you just return the score with the documents, group by type, and
order them any way you want?
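
For example (a parameter sketch only, using the 4.x grouping syntax from
the wiki page cited below):

q=dogs&group=true&group.field=type&fl=*,score

and then interleave the groups client-side in whatever order you need.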

Best
Erick

On Tue, Jul 12, 2011 at 9:36 PM, entdeveloper
 wrote:
> I'm messing around with the field collapsing in 4.x
> http://wiki.apache.org/solr/FieldCollapsing . Is it currently possible to
> group by a field with a certain value only and leave all the others
> ungrouped using the group.query param? This currently doesn't seem to work
> the way I want it to.
>
> For example, I have documents all with a "type" field. Possible values are:
> picture, video, game, other. I want to only group the pictures, and leave
> all other documents ungrouped.
>
> If I query something like:
> q=dogs&group=true&group.query=type:picture
>
> I ONLY get pictures back. Seems like this behaves more like an 'fq'
>
> What I want is a result set that looks like this:
>
> 1. doc 1, type=video
> 2. doc 2, type=game
> 3. doc 3, type=picture, + 3 other pictures
> 4. doc 4, type=video
> 5. doc 5, type=video
> ...
>
> I've also tried:
> q=dogs&group=true&group.query=type:picture&group.query=-type:video
> -type:game
>
> But this doesn't work because the order of the groups don't put together the
> correct order of results that would be displayed.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Grouping-Collapse-Query-tp3164433p3164433.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Can I still search documents once updated?

2011-07-13 Thread Erick Erickson
Wait, you directly contradicted yourself... You say it's
not stored, then you say it's stored and indexed; which is it?

When you fetch a document, only stored fields are returned,
and the returned data is a verbatim copy of the original
data. No attempt is made to return un-stored fields. This
has always been the behavior. If you attempted to return
indexed but not stored data, you'd get stemmed versions,
stop words would be removed, synonyms would be in place,
etc. Not to mention it would be very slow.
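
A minimal sketch of the effect (Lucene 3.x API, reusing the writer and
searcher from the test quoted below; field names illustrative):

// an indexed-but-unstored field vanishes from any fetched Document
Document doc = new Document();
doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("content", "essen", Field.Store.NO, Field.Index.ANALYZED));
writer.addDocument(doc);
// ...later, after reopening the searcher:
Document fetched = searcher.doc(0);
// fetched.get("content") == null here, so re-adding 'fetched' via
// writer.updateDocument(...) silently drops the indexed-only terms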

If the field is stored, then there's another problem: you might
want to dump the document after reading it from the IndexReader.

Best
Erick

On Wed, Jul 13, 2011 at 2:25 AM, Gabriele Kahlout
 wrote:
> It indeed is not stored, but this is still unexpected behavior. It's a
> stored and indexed field, why has the index data been lost?
>
>
> On Wed, Jul 13, 2011 at 12:44 AM, Erick Erickson 
> wrote:
>
>> Unless you stored your "content" field, the value you put in there won't
>> be fetched from the index. Verify that the doc you retrieve from the index
>> has values for "content", I bet it doesn't
>>
>> Best
>> Erick
>>
>> On Tue, Jul 12, 2011 at 9:38 AM, Gabriele Kahlout
>>  wrote:
>> >  @Test
>> >    public void testUpdateLoseTermsSimplified() throws Exception {
>> > *        IndexWriter writer = indexDoc();*
>> >        assertEquals(1, writer.numDocs());
>> >        IndexSearcher searcher = getSearcher(writer);
>> >        final TermQuery termQuery = new TermQuery(new Term(content,
>> > "essen"));
>> >
>> >        TopDocs docs = searcher.search(termQuery, 1);
>> >        assertEquals(1, docs.totalHits);
>> >        Document doc = searcher.doc(0);
>> >
>> > *        writer.updateDocument(new Term(id,doc.get(id)),doc);*
>> >
>> >        searcher = getSearcher(writer);
>> > *        docs = searcher.search(termQuery, 1);*
>> > *        assertEquals(1, docs.totalHits);*//docs.totalHits == 0 !
>> >    }
>> >
>> > testUpdateLosesTerms(com.mysimpatico.me.indexplugins.WcTest)  Time
>> elapsed:
>> > 0.346 sec  <<< FAILURE!
>> > java.lang.AssertionError: expected:<1> but was:<0>
>> >    at org.junit.Assert.fail(Assert.java:91)
>> >    at org.junit.Assert.failNotEquals(Assert.java:645)
>> >    at org.junit.Assert.assertEquals(Assert.java:126)
>> >    at org.junit.Assert.assertEquals(Assert.java:470)
>> >    at org.junit.Assert.assertEquals(Assert.java:454)
>> >    at
>> >
>> com.mysimpatico.me.indexplugins.WcTest.testUpdateLosesTerms(WcTest.java:271)
>> >
>> > I have not changed anything (as you can see) during the update. I just
>> > retrieve a document and the update it. But then the termQuery that worked
>> > before doesn't work anymore (while the "id" field wasn't changed). Is
>> this
>> > to be expected when content field is not stored?
>> >
>> > --
>> > Regards,
>> > K. Gabriele
>> >
>> > --- unchanged since 20/9/10 ---
>> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
>> > receipt within 48 hours then I don't resend the email.
>> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
>> time(x)
>> > < Now + 48h) ⇒ ¬resend(I, this).
>> >
>> > If an email is sent by a sender that is not a trusted contact or the
>> email
>> > does not contain a valid code then the email is not received. A valid
>> code
>> > starts with a hyphen and ends with "X".
>> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
>> > L(-[a-z]+[0-9]X)).
>> >
>>
>
>
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
> < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>


Re: (Solr-UIMA) Indexing problems with UIMA fields.

2011-07-13 Thread Erick Erickson
If I'm reading this right, you're labeling certain fields as required. All docs
MUST have those fields (I admit the error message could be more
informative). So it sounds like things are behaving as I'd expect; your
documents just don't contain the required fields.

Best
Erick

On Wed, Jul 13, 2011 at 4:54 AM, Sowmya V.B.  wrote:
> Hi All
>
> I have a problem making the indexer work with the UIMA fields.
>
> Here is what I did (With the help of this community): I compiled a
> Solr-UIMA-snapshot, using "ant clean dist", by adding my own annotators
> there.
> It compiled without any errors, and I obtained a jar file.
>
> Now, following the instructions on the readme (
> https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3/solr/contrib/uima/README.txt
> )
>
>  I modified my SolrConfig.xml and Schema.xml as suggested in the README.
>
> As long as i say "required=false" on the UIMA generated fields, the indexing
> works fine...without a UIMA annotation.
>
> However, once I say "required=true", I get an error:
>
> request:
> http://anafi:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
> org.apache.solr.common.SolrException: Bad Request
>
> Bad Request
>
> request:
> http://localhost:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
>    at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
>    at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>    at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>    at indexerapp.ir4llSolrIndexer.indexAll(ir4llSolrIndexer.java:150)
>    at indexerapp.ir4llSolrIndexer.main(ir4llSolrIndexer.java:57)
>
> Is there something during indexing that I need to do apart from saying:
>
> UpdateResponse response = server.add(docs);
> (where docs is a collection of documents, without UIMA indexing.)
>
> My understanding is that the UIMA annotation happens after calling the
> server.add(docs). Is that right?
>
> S.
> --
> Sowmya V.B.
> 
> Losing optimism is blasphemy!
> http://vbsowmya.wordpress.com
> 
>


Re: Can I still search documents once updated?

2011-07-13 Thread Gabriele Kahlout
On Wed, Jul 13, 2011 at 1:57 PM, Erick Erickson wrote:

> Wait, you directly contradicted yourself  You say it's
> not stored, then you say it's stored and indexed, which is it?
>

Yes, I meant indexed and not stored.


>
> When you fetch a document, only stored fields are returned
> and the returned data is the verbatim copy of the original
> data. No attempt is made to return un-stored fields. This
> has been the behavior always. If you attempted to return
> indexed but not stored data, you'd get stemmed versions,
> stop words would be removed, synonyms would be in place
> etc. Not to mention it would be very slow.
>

This is what I was expecting. Otherwise, updating a field of a document that
has an unstored but indexed field is impossible without losing the unstored
but indexed field (I call this updating a field of a document AND
deleting/updating all its unstored but indexed fields).

>
> If the field is stored, then there's another problem, you might
> want to dump the document after reading it from the IR.
>
> Best
> Erick
>
> On Wed, Jul 13, 2011 at 2:25 AM, Gabriele Kahlout
>  wrote:
> > It indeed is not stored, but this is still unexpected behavior. It's a
> > stored and indexed field, why has the index data been lost?
> >
> >
> > On Wed, Jul 13, 2011 at 12:44 AM, Erick Erickson <
> erickerick...@gmail.com>wrote:
> >
> >> Unless you stored your "content" field, the value you put in there won't
> >> be fetched from the index. Verify that the doc you retrieve from the
> index
> >> has values for "content", I bet it doesn't
> >>
> >> Best
> >> Erick
> >>
> >> On Tue, Jul 12, 2011 at 9:38 AM, Gabriele Kahlout
> >>  wrote:
> >> >  @Test
> >> >public void testUpdateLoseTermsSimplified() throws Exception {
> >> > *IndexWriter writer = indexDoc();*
> >> >assertEquals(1, writer.numDocs());
> >> >IndexSearcher searcher = getSearcher(writer);
> >> >final TermQuery termQuery = new TermQuery(new Term(content,
> >> > "essen"));
> >> >
> >> >TopDocs docs = searcher.search(termQuery, 1);
> >> >assertEquals(1, docs.totalHits);
> >> >Document doc = searcher.doc(0);
> >> >
> >> > *writer.updateDocument(new Term(id,doc.get(id)),doc);*
> >> >
> >> >searcher = getSearcher(writer);
> >> > *docs = searcher.search(termQuery, 1);*
> >> > *assertEquals(1, docs.totalHits);*//docs.totalHits == 0 !
> >> >}
> >> >
> >> > testUpdateLosesTerms(com.mysimpatico.me.indexplugins.WcTest)  Time
> >> elapsed:
> >> > 0.346 sec  <<< FAILURE!
> >> > java.lang.AssertionError: expected:<1> but was:<0>
> >> >at org.junit.Assert.fail(Assert.java:91)
> >> >at org.junit.Assert.failNotEquals(Assert.java:645)
> >> >at org.junit.Assert.assertEquals(Assert.java:126)
> >> >at org.junit.Assert.assertEquals(Assert.java:470)
> >> >at org.junit.Assert.assertEquals(Assert.java:454)
> >> >at
> >> >
> >>
> com.mysimpatico.me.indexplugins.WcTest.testUpdateLosesTerms(WcTest.java:271)
> >> >
> >> > I have not changed anything (as you can see) during the update. I just
> >> > retrieve a document and the update it. But then the termQuery that
> worked
> >> > before doesn't work anymore (while the "id" field wasn't changed). Is
> >> this
> >> > to be expected when content field is not stored?
> >> >
> >> > --
> >> > Regards,
> >> > K. Gabriele
> >> >
> >> > --- unchanged since 20/9/10 ---
> >> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> >> > receipt within 48 hours then I don't resend the email.
> >> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> >> time(x)
> >> > < Now + 48h) ⇒ ¬resend(I, this).
> >> >
> >> > If an email is sent by a sender that is not a trusted contact or the
> >> email
> >> > does not contain a valid code then the email is not received. A valid
> >> code
> >> > starts with a hyphen and ends with "X".
> >> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y
> ∈
> >> > L(-[a-z]+[0-9]X)).
> >> >
> >>
> >
> >
> >
> > --
> > Regards,
> > K. Gabriele
> >
> > --- unchanged since 20/9/10 ---
> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> > receipt within 48 hours then I don't resend the email.
> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> time(x)
> > < Now + 48h) ⇒ ¬resend(I, this).
> >
> > If an email is sent by a sender that is not a trusted contact or the
> email
> > does not contain a valid code then the email is not received. A valid
> code
> > starts with a hyphen and ends with "X".
> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> > L(-[a-z]+[0-9]X)).
> >
>



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If

Re: (Solr-UIMA) Indexing problems with UIMA fields.

2011-07-13 Thread Sowmya V.B.
Hi Erick,

>>If I'm reading this right, you're labeling certain fields as required. All
docs MUST have those fields (I admit the error message could be more
informative). So it sounds like things are behaving as I'd expect, your
documents just don't contain the required fields.

- But, the UIMA pipeline is supposed to add the missing fields for the
document.

Since "ant clean dist" compiled without build errors, and it was essentially
the same pipeline I already used before on a different indexer, I can say
that there is no problem with the Pipeline as such.

That again gets back to my other query: While indexing, should I mention
something else, apart from just saying:

Something like:
doc1.addField(A)
doc1.addField(B)
docs.add(doc1)


docN.addField(A)
docN.addField(B)
docs.add(docN)

UpdateResponse response = server.add(docs)

- My understanding was that the UIMAProcessor runs after I say
server.add()... inside the update processor. Is that not so?

S

On Wed, Jul 13, 2011 at 2:00 PM, Erick Erickson wrote:

> If I'm reading this right, you're labeling certain fields as required. All
> docs
> MUST have those fields (I admit the error message could be more
> informative). So it sounds like things are behaving as I'd expect, your
> documents just don't contain the required fields.
>
> Best
> Erick
>
> On Wed, Jul 13, 2011 at 4:54 AM, Sowmya V.B.  wrote:
> > Hi All
> >
> > I have a problem making the indexer work with the UIMA fields.
> >
> > Here is what I did (With the help of this community): I compiled a
> > Solr-UIMA-snapshot, using "ant clean dist", by adding my own annotators
> > there.
> > It compiled without any errors, and I obtained a jar file.
> >
> > Now, following the instructions on the readme (
> >
> https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3/solr/contrib/uima/README.txt
> > )
> >
> >  I modified my SolrConfig.xml and Schema.xml as suggested in the README.
> >
> > As long as i say "required=false" on the UIMA generated fields, the
> indexing
> > works fine...without a UIMA annotation.
> >
> > However, once I say "required=true", I get an error:
> >
> > request:
> > http://anafi:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
> > org.apache.solr.common.SolrException: Bad Request
> >
> > Bad Request
> >
> > request:
> >
> http://localhost:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
> >at
> >
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
> >at
> >
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> >at
> >
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> >at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> >at indexerapp.ir4llSolrIndexer.indexAll(ir4llSolrIndexer.java:150)
> >at indexerapp.ir4llSolrIndexer.main(ir4llSolrIndexer.java:57)
> >
> > Is there something during indexing that I need to do apart from saying:
> >
> > UpdateResponse response = server.add(docs);
> > (where docs is a collection of documents, without UIMA indexing.)
> >
> > My understanding is that the UIMA annotation happens after calling the
> > server.add(docs). Is that right?
> >
> > S.
> > --
> > Sowmya V.B.
> > 
> > Losing optimism is blasphemy!
> > http://vbsowmya.wordpress.com
> > 
> >
>



-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: (Solr-UIMA) Indexing problems with UIMA fields.

2011-07-13 Thread Erick Erickson
I'll have to punt here. I don't know the internals well enough to say. I suppose
it's possible that the "required fields" check happens *before* the UIMA
stuff happens, but since I know so little about UIMA that's a blind guess
at best...

Anyone with real knowledge want to chime in here?

Erick

On Wed, Jul 13, 2011 at 8:08 AM, Sowmya V.B.  wrote:
> Hi Erick,
>
>>>If I'm reading this right, you're labeling certain fields as required. All
> docs MUST have those fields (I admit the error message could be more
> informative). So it sounds like things are behaving as I'd expect, your
> documents just don't contain the required fields.
> - But, the UIMA pipeline is supposed to add the missing fields for the
> document.
>
> Since "ant clean dist" compiled without build errors, and it was essentially
> the same pipeline I already used before on a different indexer, I can say
> that there is no problem with the Pipeline as such.
>
> That again gets back my other query: While indexing, should I mention
> something else, apart from just saying:
>
> Something like:
> doc1.addfield(A)
> doc1.addfield(B)
> docs.add(doc1)
> 
>
> docN.addfield(A)
> docN.addfield(B)
> docs.add(docN)
>
> UpdateResponse response = server.add(docs)
>
> - My understanding was that: the UIMAProcessor runs after I say
> server.add()... inside the updateprocessor. Is it not so?
>
> S
>
> On Wed, Jul 13, 2011 at 2:00 PM, Erick Erickson 
> wrote:
>
>> If I'm reading this right, you're labeling certain fields as required. All
>> docs
>> MUST have those fields (I admit the error message could be more
>> informative). So it sounds like things are behaving as I'd expect, your
>> documents just don't contain the required fields.
>>
>> Best
>> Erick
>>
>> On Wed, Jul 13, 2011 at 4:54 AM, Sowmya V.B.  wrote:
>> > Hi All
>> >
>> > I have a problem making the indexer work with the UIMA fields.
>> >
>> > Here is what I did (With the help of this community): I compiled a
>> > Solr-UIMA-snapshot, using "ant clean dist", by adding my own annotators
>> > there.
>> > It compiled without any errors, and I obtained a jar file.
>> >
>> > Now, following the instructions on the readme (
>> >
>> https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3/solr/contrib/uima/README.txt
>> > )
>> >
>> >  I modified my SolrConfig.xml and Schema.xml as suggested in the README.
>> >
>> > As long as i say "required=false" on the UIMA generated fields, the
>> indexing
>> > works fine...without a UIMA annotation.
>> >
>> > However, once I say "required=true", I get an error:
>> >
>> > request:
>> > http://anafi:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
>> > org.apache.solr.common.SolrException: Bad Request
>> >
>> > Bad Request
>> >
>> > request:
>> >
>> http://localhost:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
>> >    at
>> >
>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
>> >    at
>> >
>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>> >    at
>> >
>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>> >    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>> >    at indexerapp.ir4llSolrIndexer.indexAll(ir4llSolrIndexer.java:150)
>> >    at indexerapp.ir4llSolrIndexer.main(ir4llSolrIndexer.java:57)
>> >
>> > Is there something during indexing that I need to do apart from saying:
>> >
>> > UpdateResponse response = server.add(docs);
>> > (where docs is a collection of documents, without UIMA indexing.)
>> >
>> > My understanding is that the UIMA annotation happens after calling the
>> > server.add(docs). Is that right?
>> >
>> > S.
>> > --
>> > Sowmya V.B.
>> > 
>> > Losing optimism is blasphemy!
>> > http://vbsowmya.wordpress.com
>> > 
>> >
>>
>
>
>
> --
> Sowmya V.B.
> 
> Losing optimism is blasphemy!
> http://vbsowmya.wordpress.com
> 
>


Solr versioning policy

2011-07-13 Thread Mike Squire
Hi,

I've noticed that since the 3.1 release new minor version releases have been
happening about every two months. I have a couple of questions:

1. Is this the plan moving forward (to aim for a new minor release
approximately every couple of months)?
2. Will minor version increases always be backwards compatible (so I could
upgrade from 3.x to 3.y where y > x without having to update the
schema/config or rebuild the indexes)?

It might be worth sticking something up on the wiki which gives an overview
of the versioning policy just to clarify things. (I had a look and couldn't
find anything.)

Cheers,
Mike.


Re: Can I still search documents once updated?

2011-07-13 Thread Michael Kuhlmann
Am 13.07.2011 14:05, schrieb Gabriele Kahlout:
> this is what i was expecting. Otherwise updating a field of a document that
> has an unstored but indexed field is impossible (without losing the unstored
> but indexed field. I call this updating a field of a document AND
> deleting/updating all its unstored but indexed fields).

Not necessarily. The usual use case is that you have some kind of
existing data source from where you fill your Solr index. When you want
to update field of a document, then you simply re-index from that
source. There's no need to fetch data from Solr before.

Otherwise, if you really don't have such an existing data source because
a horde of typewriting monkeys filled your Solr index, then you should
better declare all your fields as stored. Otherwise you'll never have a
chance to get that data back.

Greetings,
Kuli


Re: Can I still search documents once updated?

2011-07-13 Thread Gabriele Kahlout
Well, I'm not sure how usual this scenario would be:
1. In general, those using Solr with Nutch don't store the content field, to
avoid storing the whole web/intranet in their index twice (once in the form
of stored data, and once in the form of indexed data).

Now every time they need to update a field unrelated to content (number of
inbound links, for example) they would have to re-crawl the page again.
This is, at the least, not intuitive.


On Wed, Jul 13, 2011 at 2:40 PM, Michael Kuhlmann  wrote:

> Am 13.07.2011 14:05, schrieb Gabriele Kahlout:
> > this is what i was expecting. Otherwise updating a field of a document
> that
> > has an unstored but indexed field is impossible (without losing the
> unstored
> > but indexed field. I call this updating a field of a document AND
> > deleting/updating all its unstored but indexed fields).
>
> Not necessarily. The usual use case is that you have some kind of
> existing data source from where you fill your Solr index. When you want
> to update field of a document, then you simply re-index from that
> source. There's no need to fetch data from Solr before.
>
> Otherwise, if you really don't have such an existing data source because
> a horde of typewriting monkeys filled your Solr index, then you should
> better declare all your fields as stored. Otherwise you'll never have a
> chance to get that data back.
>
> Greeting,
> Kuli
>



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Can I still search documents once updated?

2011-07-13 Thread Michael Kuhlmann
Am 13.07.2011 15:37, schrieb Gabriele Kahlout:
> Well, I'm !sure how usual this scenario would be:
> 1. In general those using solr with nutch don't store the content field to
> avoid storing the whole web/intranet in their index, twice (1 in the form of
> stored data, and one in the form of indexed data).
> 

Not exactly. The indexed form is quite different from the stored form;
only the tokens are stored, each token only once, and some additional
data like the document count and, maybe, shingle information, etc.

Hence, indexed data usually needs much less space on disk than the
original data.

There's no practical alternative to storing the content in a stored
field. What would you otherwise display as a search result? "The
following web pages have your search term somewhere in their contents,
don't know where, take a look on your own"?

Greetings,
Kuli


Re: Can I still search documents once updated?

2011-07-13 Thread Gabriele Kahlout
On Wed, Jul 13, 2011 at 3:54 PM, Michael Kuhlmann  wrote:

> Am 13.07.2011 15:37, schrieb Gabriele Kahlout:
> > Well, I'm !sure how usual this scenario would be:
> > 1. In general those using solr with nutch don't store the content field
> to
> > avoid storing the whole web/intranet in their index, twice (1 in the form
> of
> > stored data, and one in the form of indexed data).
> >
>
> Not exactly. The indexed form is quite different from the stored form;
> only the tokens are stored, each token only once, and some additional
> data like the document count and, maybe, shingle information etc..
>
> Hence, indexed data usually needs much less space on disk than the
> original data.
>

I realized that. Maybe I should have said "1.X" (1 in the form of stored
data and 0.X in the form of indexed data).

>
> There's no practical alternative to storing the content in a stored
> field. What would you otherwise display as a search result? "The
> following web pages have your search term somewhere in their contents,
> don't know where, take a look on your own"?
>
Display the title and URL (and implicitly say "The
following web pages have your search term somewhere in their contents, don't
REMEMBER where, take a look on your own"?).

Solr is already configured by default not to store more than a
<maxFieldLength> anyway. Usually one stores content only to display
snippets.



> Greetings,
> Kuli
>



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


How to add TrieIntField to a SolrInputDocument?

2011-07-13 Thread Gabriele Kahlout
SolrInputDocument doc = new SolrInputDocument();
doc.setField(id, "0");
doc.setField("url", getURL("0"));
doc.setField(content, "blah blah blah");
*doc.setField(wc, 150); //wc is of solr.TrieIntField field type in
schema.xml*
assertU(adoc(doc));
assertU(commit());
assertNumFound(1);

The above test fails until I change the following in schema.xml:
 - 
 + 
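
(Presumably, given Yonik's advice below that "wc" must be indexed, the
change was along these lines; the field and type attributes here are
hypothetical:)

 - <field name="wc" type="tint" stored="true" indexed="false"/>
 + <field name="wc" type="tint" stored="true" indexed="true"/>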


On Sun, Jul 10, 2011 at 10:36 PM, Gabriele Kahlout  wrote:

>
> This was my problem:
> 
>
> I had taken my cue from Nutch's schema:
> 
>
>
>
> On Sat, Jul 9, 2011 at 4:55 PM, Yonik Seeley 
> wrote:
>
>> Something is wrong with your indexing.
>> Is "wc" an indexed field?  If not, change it so it is, then re-index your
>> data.
>>
>> If so, I'd recommend starting with the example data and filter for
>> something like popularity:[6 TO 10] to convince yourself it works,
>> then figuring out what you did differently in your schema/data.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>> On Sat, Jul 9, 2011 at 10:50 AM, Gabriele Kahlout
>>  wrote:
>> > http://localhost:8080/solr/select?indent=on&version=2.2&q=*%3A**
>> > &fq=wc%3A%5B255+TO+257%5D*
>> > &start=0&rows=10&fl=*%2Cscore&qt=&wt=xml&explainOther=&hl.fl=
>> >
>> > The toString of the request:
>> >
>> {explainOther=&fl=*,score&indent=on&start=0&q=*:*&hl.fl=&qt=&wt=xml&fq=wc:[255+TO+257]&rows=1&version=2.2}
>> >
>> > Even when the FilterQuery is constructed in Java it doesn't work (i get
>> > results that ignore the filter query completely).
>> >
>> >
>> > On Sat, Jul 9, 2011 at 3:40 PM, Ahmet Arslan  wrote:
>> >
>> >> > I don't get it to work!
>> >> >
>> >> > If I specify no fq I get the first result with > >> > name="wc">256
>> >> >
>> >> > With wc:[255 TO 257] (fq=wc%3A%5B255+TO+257%5D) nothing
>> >> > comes out.
>> >>
>> >> If you give us the Full URL you are using, it can be helpful.
>> >>
>> >> Correct syntax is &fq=wc:[255 TO 257]
>> >>
>> >> You can use more that fq in a request.
>> >>
>> >>
>> >
>> >
>> > --
>> > Regards,
>> > K. Gabriele
>> >
>> > --- unchanged since 20/9/10 ---
>> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
>> > receipt within 48 hours then I don't resend the email.
>> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
>> time(x)
>> > < Now + 48h) ⇒ ¬resend(I, this).
>> >
>> > If an email is sent by a sender that is not a trusted contact or the
>> email
>> > does not contain a valid code then the email is not received. A valid
>> code
>> > starts with a hyphen and ends with "X".
>> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
>> > L(-[a-z]+[0-9]X)).
>> >
>>
>
>
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> time(x) < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>
>


-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Can I still search documents once updated?

2011-07-13 Thread Michael Kuhlmann
Am 13.07.2011 16:09, schrieb Gabriele Kahlout:
> Solr is already configured by default not to store more than a
> <maxFieldLength> anyway. Usually one stores content only to display
> snippets.

Yes, but the snippets must come from somewhere.

For instance, if you're using Solr's highlighting feature, all
highlighted fields must be stored.

See http://www.intellog.com/blog/?p=208 for explanation from someone
else. ;)
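
For instance, a sketch (field name illustrative): the field you highlight
from must be stored,

<field name="content" type="text" indexed="true" stored="true"/>

and the request then asks for snippets with &hl=true&hl.fl=content.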

Greetings,
Kuli


about standard analyzer

2011-07-13 Thread Kiwi de coder
hi,

I am using Solr 3.3, which in schema.xml contains this:

<fieldType name="text_standard" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</fieldType>

I use this sentence as an example: "XY&Z Corporation - x...@example.com"

however, when I try it on /analysis.jsp, it shows a different result compared
to using Lucene.

using Solr I got the result below when using "text_standard" and "text_general"
(are both the same?)

XYZCorporationxyzexample.com


Using Solr for searching in a social network

2011-07-13 Thread Kien Nguyen
Hi everyone,

I'm building a social network site and I need to build a search module,
which is someway similar to Facebook Search.

Say this module can search people by their names, based on the following
priority levels:

- My friends: has the 1st priority (highest)
- Friends of my friends, or anyone who is someway related to me: has the 2nd
priority
- Everyone else on the network: has the 3rd priority (lowest)

The number of users will grow very large, so I cannot flatten the data of
users along with their friends and import all of them into Solr.

Can Solr help me solve this problem?

If I have all the necessary services to get the friend list of a user (the
1st priority), or the list of friends of my friends (the 2nd priority), can
Solr use these external data sources for searching?
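
One approach worth testing (a sketch under the assumption that those
services can return the ID lists at query time; the IDs and boosts below
are illustrative) is to express the tiers as boost queries instead of
flattening them into the index:

q=john&defType=dismax&qf=name&bq=id:(12 34 56)^10&bq=id:(78 90)^5

where the first bq lists direct friends, the second friends-of-friends,
and everyone else simply matches with no boost (spaces URL-encoded in
practice).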

Please help me.

Thanks and regards,
Nguyen Trung Kien


about standardAnalyzer in solr

2011-07-13 Thread Kiwi de coder
hi,

I am using Solr 3.3, which in schema.xml contains this:

<fieldType name="text_standard" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</fieldType>

I use this sentence as an example: "XY&Z Corporation - x...@example.com"

however, when I try it on /analysis.jsp, it shows a different result compared
to using Lucene.

using Solr I got the result below when using "text_standard" and "text_general"
(are both the same?)

XYZCorporationxyzexample.com (which all belong to 
)

when using Lucene, I got this:

  StandardAnalyzer:

1: [xy&z:0->4:]
2: [corporation:5->16:]
3: [x...@example.com:19->34:]


so my question is: how do I make it analyze like in Lucene?

regards,
kiwi


Can we use crawled data by Nutch 0.9 in other versions of Nutch

2011-07-13 Thread serenity keningston
Hello,

I have a question, and I apologize if it sounds stupid. I just want to know
if we can use the data crawled by Nutch 0.9 in Nutch 1.3, because search has
been delegated to Solr in Nutch 1.3, and I want to get search results from
the Nutch 0.9 crawl data in Nutch 1.3.

Serenity


Re: (Solr-UIMA) Indexing problems with UIMA fields.

2011-07-13 Thread Tommaso Teofili
Hello,

I think the problem might be the following, if you defined the update
request handlers like in the sample solrconfig:

<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    ...
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uima</str>
  </lst>
</requestHandler>
...
<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />

then the uima update chain will be executed only for HTTP POSTs on /update
and not for /update/javabin (that is used by SolrJ), so you may need to
update the /update/javabin configuration as follows:

<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uima</str>
  </lst>
</requestHandler>


Hope this helps,
Tommaso


2011/7/13 Erick Erickson 

> I'll have to punt here. I don't know the internals well enough to say. I
> suppose
> it's possible that the "required fields" check happens *before* the UIMA
> stuff happens, but since I know so little about UIMA that's a blind guess
> at best...
>
> Anyone with real knowledge want to chime in here?
>
> Erick
>
> On Wed, Jul 13, 2011 at 8:08 AM, Sowmya V.B.  wrote:
> > Hi Erick,
> >
> >>>If I'm reading this right, you're labeling certain fields as required.
> All
> > docs MUST have those fields (I admit the error message could be more
> > informative). So it sounds like things are behaving as I'd expect, your
> > documents just don't contain the required fields.
> > - But, the UIMA pipeline is supposed to add the missing fields for the
> > document.
> >
> > Since "ant clean dist" compiled without build errors, and it was
> essentially
> > the same pipeline I already used before on a different indexer, I can say
> > that there is no problem with the Pipeline as such.
> >
> > That again gets back my other query: While indexing, should I mention
> > something else, apart from just saying:
> >
> > Something like:
> > doc1.addfield(A)
> > doc1.addfield(B)
> > docs.add(doc1)
> > 
> >
> > docN.addfield(A)
> > docN.addfield(B)
> > docs.add(docN)
> >
> > UpdateResponse response = server.add(docs)
> >
> > - My understanding was that: the UIMAProcessor runs after I say
> > server.add()... inside the updateprocessor. Is it not so?
> >
> > S
> >
> > On Wed, Jul 13, 2011 at 2:00 PM, Erick Erickson  >wrote:
> >
> >> If I'm reading this right, you're labeling certain fields as required.
> All
> >> docs
> >> MUST have those fields (I admit the error message could be more
> >> informative). So it sounds like things are behaving as I'd expect, your
> >> documents just don't contain the required fields.
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Jul 13, 2011 at 4:54 AM, Sowmya V.B. 
> wrote:
> >> > Hi All
> >> >
> >> > I have a problem making the indexer work with the UIMA fields.
> >> >
> >> > Here is what I did (With the help of this community): I compiled a
> >> > Solr-UIMA-snapshot, using "ant clean dist", by adding my own
> annotators
> >> > there.
> >> > It compiled without any errors, and I obtained a jar file.
> >> >
> >> > Now, following the instructions on the readme (
> >> >
> >>
> https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3/solr/contrib/uima/README.txt
> >> > )
> >> >
> >> >  I modified my SolrConfig.xml and Schema.xml as suggested in the
> README.
> >> >
> >> > As long as i say "required=false" on the UIMA generated fields, the
> >> indexing
> >> > works fine...without a UIMA annotation.
> >> >
> >> > However, once I say "required=true", I get an error:
> >> >
> >> > request:
> >> >
> http://anafi:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
> >> > org.apache.solr.common.SolrException: Bad Request
> >> >
> >> > Bad Request
> >> >
> >> > request:
> >> >
> >>
> http://localhost:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
> >> >at
> >> >
> >>
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
> >> >at
> >> >
> >>
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> >> >at
> >> >
> >>
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> >> >at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> >> >at indexerapp.ir4llSolrIndexer.indexAll(ir4llSolrIndexer.java:150)
> >> >at indexerapp.ir4llSolrIndexer.main(ir4llSolrIndexer.java:57)
> >> >
> >> > Is there something during indexing that I need to do apart from saying:
> >> >
> >> > UpdateResponse response = server.add(docs);
> >> > (where docs is a collection of documents, without UIMA indexing.)
> >> >
> >> > My understanding is that the UIMA annotation happens after calling the
> >> > server.add(docs). Is that right?
> >> >
> >> > S.
> >> > --
> >> > Sowmya V.B.
> >> > 
> >> > Losing optimism is blasphemy!
> >> > http://vbsowmya.wordpress.com
> >> > 
> >> >
> >>
> >
> >
> >
> > --
> > Sowmya V.B.
> > 
> > Losing optimism is blasphemy!
> > http://vbsowmya.wordpress.com
> > 
> >
>


Re: about standard analyzer

2011-07-13 Thread Erick Erickson
You're probably seeing the effects of a number of the filters that
Solr is applying (see the fieldType definition). In particular, this
looks like the result of WordDelimiterFilterFactory. If you click
the "verbose" box on the analysis page, you should see the
results of each step in the analysis chain.

Best
Erick

On Wed, Jul 13, 2011 at 10:36 AM, Kiwi de coder  wrote:
> hi,
>
> I using solr 3.3 which in schema.xml contain this :
>
>    
>       class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
>    
>
> i use the sentences as example "XY&Z Corporation - x...@example.com"
>
> however, when I try on /analysis.jsp, it show difference result compare to
> using Lucene.
>
> using solr I got result below when using "text_standard" and "text_general"
> (is both the same ?)
>
> XYZCorporationxyzexample.com
>


Re: Can we use crawled data by Nutch 0.9 in other versions of Nutch

2011-07-13 Thread Markus Jelsma
You're on the wrong side of the fence. Anyway, you need to get Nutch 1.1 first 
as it has a CrawlDB converter. Convert your 0.9 CrawlDB first with Nutch 1.1, 
then upgrade to 1.3.

On Wednesday 13 July 2011 16:48:24 serenity keningston wrote:
> Hello,
> 
> I have a question and I apologize if it sounds stupid. I just want to know,
> if we can use the crawled data by Nutch 0.9 in Nutch 1.3 because search has
> been delegated to Solr in Nutch 1.3 and I want to get the search results
> from the crawled data by Nutch 0.9 in Nutch 1.3
> 
> Serenity

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: about standard analyzer

2011-07-13 Thread Kiwi de coder
Hi, sorry... I accidentally sent out this incomplete mail. I had actually
sent another one; please ignore this one. Thanks :)

kiwi

On Wed, Jul 13, 2011 at 10:52 PM, Erick Erickson wrote:

> You're probably seeing the effects of a number of the filters that
> Solr is applying (see the fieldType definition). In particular, this
> looks like the result of WordDelimiterFilterFactory. If you click
> the "verbose" box on the analysis page, you should see the
> results of each step in the analysis chain.
>
> Best
> Erick
>
> On Wed, Jul 13, 2011 at 10:36 AM, Kiwi de coder 
> wrote:
> > hi,
> >
> > I using solr 3.3 which in schema.xml contain this :
> >
> >
> >   > class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
> >
> >
> > i use the sentences as example "XY&Z Corporation - x...@example.com"
> >
> > however, when I try on /analysis.jsp, it show difference result compare
> to
> > using Lucene.
> >
> > using solr I got result below when using "text_standard" and
> "text_general"
> > (is both the same ?)
> >
> > XYZCorporationxyzexample.com
> >
>


Re: Index Solr Logs

2011-07-13 Thread O. Klein
I'm also interested in this.

Has no one ever tried to index Solr logs with DIH?

Or is there some way to store Solr logs in a MySQL DB?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-Solr-Logs-tp3109956p3166302.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: about standardAnalyzer in solr

2011-07-13 Thread Edoardo Tosca
Try changing from StandardTokenizerFactory to ClassicTokenizerFactory, or
create your own fieldType:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    ...
  </analyzer>
</fieldType>

Edo

On Wed, Jul 13, 2011 at 3:40 PM, Kiwi de coder  wrote:

> hi,
>
> I using solr 3.3 which in schema.xml contain this :
>
>
>  
>
>
> i use the sentences as example "XY&Z Corporation - x...@example.com"
>
> however, when I try on /analysis.jsp, it show difference result compare to
> using Lucene.
>
> using solr I got result below when using "text_standard" and "text_general"
> (is both the same ?)
>
> XYZCorporationxyzexample.com (which all belong to
> 
> )
>
> when using Lucene, i got this
>
>  StandardAnalyzer:
>
> 1: [xy&z:0->4:]
> 2: [corporation:5->16:]
> 3: [x...@example.com:19->34:]
>
>
> so my question is, how to make it analysis like in Lucene ?
>
> regards,
> kiwi
>



-- 
Edoardo Tosca
Sourcesense - making sense of Open Source: http://www.sourcesense.com


Re: about standardAnalyzer in solr

2011-07-13 Thread Kiwi de coder
ok, works now! thx :)

kiwi

On Wed, Jul 13, 2011 at 11:06 PM, Edoardo Tosca wrote:

> Try to change from StandardTokenizerFactory to ClassicTokenizerFactory, or
> create your own fieldType:
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.ClassicTokenizerFactory"/>
>     ...
>   </analyzer>
> </fieldType>
>
> Edo
>
> On Wed, Jul 13, 2011 at 3:40 PM, Kiwi de coder  wrote:
>
> > hi,
> >
> > I using solr 3.3 which in schema.xml contain this :
> >
> > <fieldType name="text_standard" class="solr.TextField">
> >   <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
> > </fieldType>
> >
> >
> > i use the sentences as example "XY&Z Corporation - x...@example.com"
> >
> > however, when I try on /analysis.jsp, it show difference result compare
> to
> > using Lucene.
> >
> > using solr I got result below when using "text_standard" and
> "text_general"
> > (is both the same ?)
> >
> > XYZCorporationxyzexample.com (which all belong to
> > 
> > )
> >
> > when using Lucene, i got this
> >
> >  StandardAnalyzer:
> >
> > 1: [xy&z:0->4:]
> > 2: [corporation:5->16:]
> > 3: [x...@example.com:19->34:]
> >
> >
> > so my question is, how to make it analysis like in Lucene ?
> >
> > regards,
> > kiwi
> >
>
>
>
> --
> Edoardo Tosca
> Sourcesense - making sense of Open Source: http://www.sourcesense.com
>


Re: (Solr-UIMA) Indexing problems with UIMA fields.

2011-07-13 Thread Sowmya V.B.
Hello Tommaso

Thanks for the reply.

I did add the uima chain to the /javabin handler as you suggested. Now I get
an internal server error!

Here is the stacktrace.

request:
http://localhost:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request:
http://localhost:8080/apache-solr-3.3.0/update/javabin?wt=javabin&version=2
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
at indexerapp.ir4llSolrIndexer.indexAll(ir4llSolrIndexer.java:150)


Now, I began tracing back from the instructional README.txt file. A few
doubts:

1) copy generated solr-uima jar and its libs (under contrib/uima/lib) inside
a Solr libraries directory, or set <lib> tags in solrconfig.xml
appropriately to point to those jar files.

- Which Solr libraries directory does this refer to? Does it refer to the
lib directory inside the WEB-INF folder of the Solr webapp?

2)


<lst name="analyzeFields">
  <bool name="merge">false</bool>
  <arr name="fields">
    <str>text</str>
  </arr>
</lst>


- The only field I need to send through the pipeline is the "text" field. Is
it enough if I specify that inside solrconfig.xml at this point, or should I
do something more?

3) Where can I see a more detailed log of what is happening inside Solr?
I am running Solr from Eclipse + Tomcat. Neither the console nor the Eclipse
Tomcat log shows me a detailed error log.

S
On Wed, Jul 13, 2011 at 4:48 PM, Tommaso Teofili
wrote:

> Hello,
>
> I think the problem might be the following, if you defined the update
> request handlers like in the sample solrconfig :
>
> 
>  class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
>  
> 
> 
>
> 
>
>  uima
>
>  
> ...
>   class="solr.BinaryUpdateRequestHandler" />
>
> then the uima update chain will be executed only for HTTP POSTs on /update
> and not for /update/javabin (that is used by SolrJ), so you may need to
> update the /update/javabin configuration as follows:
>
>  class="solr.BinaryUpdateRequestHandler" >
>  
>  uima
>
> 
>
> Hope this helps,
> Tommaso
>
>
> 2011/7/13 Erick Erickson 
>
> > I'll have to punt here. I don't know the internals well enough to say. I
> > suppose
> > it's possible that the "required fields" check happens *before* the UIMA
> > stuff happens, but since I know so little about UIMA that's a blind guess
> > at best...
> >
> > Anyone with real knowledge want to chime in here?
> >
> > Erick
> >
> > On Wed, Jul 13, 2011 at 8:08 AM, Sowmya V.B.  wrote:
> > > Hi Erick,
> > >
> > >>>If I'm reading this right, you're labeling certain fields as required.
> > All
> > > docs MUST have those fields (I admit the error message could be more
> > > informative). So it sounds like things are behaving as I'd expect, your
> > > documents just don't contain the required fields.
> > > - But, the UIMA pipeline is supposed to add the missing fields for the
> > > document.
> > >
> > > Since "ant clean dist" compiled without build errors, and it was
> > essentially
> > > the same pipeline I already used before on a different indexer, I can
> say
> > > that there is no problem with the Pipeline as such.
> > >
> > > That again gets back my other query: While indexing, should I mention
> > > something else, apart from just saying:
> > >
> > > Something like:
> > > doc1.addField(A);
> > > doc1.addField(B);
> > > docs.add(doc1);
> > > ...
> > >
> > > docN.addField(A);
> > > docN.addField(B);
> > > docs.add(docN);
> > >
> > > UpdateResponse response = server.add(docs);
> > >
> > > - My understanding was that: the UIMAProcessor runs after I say
> > > server.add()... inside the updateprocessor. Is it not so?
> > >
> > > S
> > >
> > > On Wed, Jul 13, 2011 at 2:00 PM, Erick Erickson <
> erickerick...@gmail.com
> > >wrote:
> > >
> > >> If I'm reading this right, you're labeling certain fields as required.
> > All
> > >> docs
> > >> MUST have those fields (I admit the error message could be more
> > >> informative). So it sounds like things are behaving as I'd expect,
> your
> > >> documents just don't contain the required fields.
> > >>
> > >> Best
> > >> Erick
> > >>
> > >> On Wed, Jul 13, 2011 at 4:54 AM, Sowmya V.B. 
> > wrote:
> > >> > Hi All
> > >> >
> > >> > I have a problem making the indexer work with the UIMA fields.
> > >> >
> > >> > Here is what I did (With the help of this community): I compiled a
> > >> > Solr-UIMA-snapshot, using "ant clean dist", by adding my own
> > annotators
> > >> > there.
> > >> > It compiled without any errors. and i obtained a jar file.
> > >> >
> > >> > Now, following the instructions on the readme (
> > >> >
> > >>
> >
> https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3/s

Re: Index Solr Logs

2011-07-13 Thread Gora Mohanty
On Wed, Jul 13, 2011 at 8:31 PM, O. Klein  wrote:
> I'm also interested in this.
>
> No one has ever tried to index Solr logs with DIH?
[...]

Just upstream in this thread, Mike pointed out Logg.ly:
 http://www.loggly.com/

Regards,
Gora


Multivalued field: get only the str that matches with the query

2011-07-13 Thread Lucas Miguez
Hi all,

I have a multivalued field. I need to make a search in the multivalued
field and get only the values that match with the query.

Example:


<doc>
  <arr name="text">
    <str>aaa bbb ccc</str>
    <str>aaa</str>
    <str>ccc</str>
  </arr>
</doc>


So, if I make a search like "text:aaa", all 3 values are returned, when
only the first and second values are correct matches.

I am using the WhitespaceTokenizer in both the index and query analyzers:

<fieldType name="..." class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>


How to do that on Apache Solr?

Thanks!


Preserve XML hierarchy

2011-07-13 Thread Lucas Miguez
Hi,

is it possible to do that in Apache Solr? If I make a search, how do I
know where the result comes from?

Thanks!

I have an XML like this:


[example XML: the markup was stripped by the list archive; only the text
nodes survive ("some Text", "another text", "text", "more text"), nested
several levels deep]


Wildcard

2011-07-13 Thread GAURAV PAREEK
Hello,

What wildcards can we use with Solr?

Regards,
Gaurav


Re: Wildcard

2011-07-13 Thread François Schiettecatte
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html

http://wiki.apache.org/solr/SolrQuerySyntax
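
In short, per those pages: ? matches a single character and * matches zero
or more characters, e.g. te?t finds "test" and "text", while test* finds
"test", "tests" and "tester". Note the caveat there that a term may not
start with * or ?.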

François

On Jul 13, 2011, at 1:29 PM, GAURAV PAREEK wrote:

> Hello,
> 
> What wildcards can we use with Solr?
> 
> Regards,
> Gaurav



Re: Grouping / Collapse Query

2011-07-13 Thread entdeveloper
I guess that's a possible solution, but the two concerns I would have are 1)
putting the burden of sorting on the client instead of Solr, where it
belongs, and 2) needing to request more results than I'd want to display, in
order to guarantee I could populate the entire page of results and
compensate for the grouping.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-Collapse-Query-tp3164433p3166789.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Preserve XML hierarchy

2011-07-13 Thread Gora Mohanty
On Wed, Jul 13, 2011 at 10:30 PM, Lucas Miguez  wrote:
> Hi,
>
> is it possible to do that in Apache Solr? If i make a search, how I
> know from where it comes the result?
[...]

Your question is not very clear, and I happen unfortunately to be
out of crystal balls and Tarot cards.

Is it possible to do what? Make a search on what, and what sort
of results do you expect from said search?

Peering into the misty depths of my non-existent crystal ball,
if you are asking is it possible to index an XML file, search it,
and figure out which node of the XML the search result comes
from, yes that is possible; though details, and better advice
would require more input from your side. Roughly speaking,
each node can go into a separate Solr field, and full-text
search on all relevant fields is also possible. Joking aside, please
do provide more details.
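
To make that concrete, a minimal SolrJ sketch (the "xpath" field and the
id scheme are made up for illustration, not from the thread):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// one Solr document per XML text node, with the node's path stored so a
// hit can be traced back to its place in the original file
public class XmlNodeIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "mydoc.xml#/doc/section[1]/para[2]"); // made-up id scheme
    doc.addField("xpath", "/doc/section[1]/para[2]");        // where it came from
    doc.addField("text", "more text");                       // the node's content
    server.add(doc);
    server.commit();
  }
}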

Regards,
Gora


Re: ContentStreamLoader Problem

2011-07-13 Thread Tod

On 07/12/2011 6:52 PM, Erick Erickson wrote:

This is a shot in the dark, but this smells like a classpath issue,
and since you have
a 1.4.1 installation on the machine, I'm *guessing* that you're getting a mix of
old and new Jars. What happens if you try this on a machine that doesn't have
1.4.1 on it? If that works, then it's likely a classpath issue.

Best
Erick


I'll give it a shot and report back.


Thanks - Tod


SolrCloud Sharding

2011-07-13 Thread Jamie Johnson
Reading the SolrCloud wiki I see that there are goals to support
different sharding algorithms; what is currently implemented today?
Is the sharding logic the responsibility of the application doing the
indexing?


omitTermFreq only?

2011-07-13 Thread Jibo John
Hello,

I was wondering if there is a way we can omit only the Term Frequency in solr? 

omitTermFreqAndPositions =true wouldn't work for us since we need the positions 
for supporting phrase queries.

Thanks,
-Jibo


extending edismax?

2011-07-13 Thread solr nps
I am using Solr 3.3. I am using the edismax query parser and I am getting
great results. To improve relevancy I want to add some semantic filters to
the query.

E.g. I want to pass the query "red shoes" as q="shoes"&fq=color:red. I have
a service that can tell me that in the phrase "red shoes" the word red is
the color.

My question is where should I invoke this external service,

1) should my search client call the service, form the request and then call
Solr or
2) should I pass the query as is to Solr and have Solr call the service
internally.


Option 1 is easier for me as I am familiar with the client code; option 2
would be harder. I wanted to know what the best practices are.

I am happy with edismax and want to reuse all its functionality, so can I
write a custom handler that calls my service and then hands the request over
to edismax?
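
A minimal SolrJ sketch of option 1, the client-side route (SemanticService
stands in for the external service, with hard-coded answers; all names here
are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// ask the semantic service how to split the query, then build q and fq
// before calling Solr with the edismax parser
public class SemanticSearchClient {

  static class SemanticService {
    String headTerm(String q) { return "shoes"; } // "red shoes" -> "shoes"
    String color(String q)    { return "red"; }   // "red shoes" -> "red"
  }

  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SemanticService svc = new SemanticService();

    String userQuery = "red shoes";
    SolrQuery q = new SolrQuery(svc.headTerm(userQuery)); // q=shoes
    q.set("defType", "edismax");                          // keep edismax features
    String color = svc.color(userQuery);
    if (color != null) {
      q.addFilterQuery("color:" + color);                 // fq=color:red
    }
    QueryResponse rsp = solr.query(q);
    System.out.println(rsp.getResults().getNumFound() + " hits");
  }
}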

Thanks for your time.


deletedPkQuery fails

2011-07-13 Thread Elaine Li
Hi Folks,

I am trying to use deletedPkQuery so that deltaImport removes
inactive products from Solr.
I keep getting a syntax error saying the query syntax is not
right. I have tried many alternatives to the following query; although
all of them work at the mysql prompt directly, none works in the Solr
handler. Can anyone give me a hint for debugging this type of problem?
Is there anything special about deletedPkQuery that I am not aware of?

deletedPkQuery="select p.pId as id from products p join products_large
pl on p.pId=pl.pId where p.pId= ${dataimporter.delta.id} and
pl.deleted='' having count(*)=0"

Jul 13, 2011 4:02:23 PM
org.apache.solr.handler.dataimport.DataImporter doDeltaImport
SEVERE: Delta Import Failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to execute query: select p.pId as id from products p join
products_large pl on p.pId=pl.pI
d where p.pId=  and pl.deleted='' having count(*)=0 Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextDeletedRowKey(SqlEntityProcessor.java:91)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextDeletedRowKey(EntityProcessorWrapper.java:258)
at 
org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:636)
at 
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:258)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:172)
at 
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:352)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException:
You have an error in your SQL syntax; check the manual that
corresponds to your MySQL serv
er version for the right syntax to use near 'and pl.deleted='' having
count(*)=0' at line 1
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:407)
at com.mysql.jdbc.Util.getInstance(Util.java:382)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3603)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3535)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1989)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2150)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2620)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2570)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:779)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:622)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)

Elaine


Re: solr/velocity: function for sorting asc/desc

2011-07-13 Thread okayndc
Thanks Eric.

So if I had a link "Sort Title" and the default is &sort=title desc, how can
I switch that to &sort=title asc?

example:  http://# Sort Title  (default &sort=title desc)  user clicks on
link and sort should toggle (or switch) to &sort=title asc

how can this be achieved?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-velocity-funtion-for-sorting-asc-desc-tp3163549p3167267.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: omitTermFreq only?

2011-07-13 Thread Markus Jelsma
A dirty hack is to return 1.0f for each tf > 0. It's just a couple of lines
of code in a custom similarity class.
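
A minimal sketch for Lucene/Solr 3.x (class name is mine, not from the
thread):

import org.apache.lucene.search.DefaultSimilarity;

// clamp tf to 1.0 so term frequency stops influencing scores, while
// positions remain indexed and phrase queries keep working
public class NoTermFreqSimilarity extends DefaultSimilarity {
  @Override
  public float tf(float freq) {
    return freq > 0 ? 1.0f : 0.0f;
  }
}

You would then point schema.xml at it with <similarity
class="...NoTermFreqSimilarity"/>. Note this only changes scoring; it does
not shrink the .frq files.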

> Hello,
> 
> I was wondering if there is a way we can omit only the Term Frequency in
> solr?
> 
> omitTermFreqAndPositions =true wouldn't work for us since we need the
> positions for supporting phrase queries.
> 
> Thanks,
> -Jibo


Re: solr/velocity: function for sorting asc/desc

2011-07-13 Thread Erik Hatcher
You'll have to add some logic in your Velocity templates to process the 
sort parameter and determine whether to set the link to be "asc"ending or 
"desc"ending.  It'll require learning some Velocity techniques to do this with 
#if and how to navigate the objects Solr puts into the Velocity context.  
You'll find pointers to more information here: 
http://wiki.apache.org/solr/VelocityResponseWriter

I know this doesn't quite get you there without you doing the homework, but 
maybe there'll be enough pointers in that wiki page to get you there.

I've done this in various one-off cases, but nothing handy to share at the 
moment (I'm traveling), but maybe in a week or so I'll be able to dig something 
out or re-craft the voodoo.

Erik


On Jul 13, 2011, at 14:23 , okayndc wrote:

> Thanks Eric.
> 
> So if I had a link "Sort Title" and the default is &sort=title desc, how can
> I switch that to &sort=title asc?
> 
> example:  http://# Sort Title  (default &sort=title desc)  user clicks on
> link and sort should toggle (or switch) to &sort=title asc
> 
> how can this be achieved?
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-velocity-funtion-for-sorting-asc-desc-tp3163549p3167267.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Test document generator (in PHP)

2011-07-13 Thread Marian Steinbach
Hi!

I needed to fill an index with content and created myself a test
document generator. It generates random text documents based on
english terms. The content doesn't make sense, but allows for testing
of some search features. Each document is unique. Document length is
configured to vary within a range.

Right now it's pretty slow (100 documents take 4 minutes on my
machine), but for me it works. If you want to use it yourself, get it
here:

  https://github.com/marians/php-solr-testdoc-generator

Cheers,

Marian


Re: omitTermFreq only?

2011-07-13 Thread Jibo John
Sorry, I should have made the objectives clear. The goal is to reduce the index 
size by avoiding the TermFrequency stored in the index (in .frq segment files). 

After exploring a bit more, I realized that LUCENE-2048 now allows omitPositions. 
Similarly, I'm looking for an omitFrequency option.

Thanks,
-Jibo


On Jul 13, 2011, at 1:34 PM, Markus Jelsma wrote:

> A dirty hack is to return 1.0f for each tf > 0. Just a couple of lines code 
> for a custom similarity class.
> 
>> Hello,
>> 
>> I was wondering if there is a way we can omit only the Term Frequency in
>> solr?
>> 
>> omitTermFreqAndPositions =true wouldn't work for us since we need the
>> positions for supporting phrase queries.
>> 
>> Thanks,
>> -Jibo



Upgrading solr from 1.4 to latest version

2011-07-13 Thread rvidela
Hi,

I am new to Solr, and in little time I have been very impressed with its
search performance. I installed Solr on Ubuntu using the "apt-get install
solr-tomcat curl -y" command. From the admin page, I can see that the Solr
version is 1.4.1, but I see there is a 3.x version already available. Just
wondering if there is an easy way to upgrade it to the latest version.

I tried specifying the version number in apt-get, but it does not work.
Appreciate your help.

Thanks
Ravi

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrading-solr-from-1-4-to-latest-version-tp3164312p3164312.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query Rewrite

2011-07-13 Thread Jamie Johnson
Ok, so I think I have something working relatively well.  I have a few
issues which I'm not sure how to address though.  Currently when I
create my ParserPlugin in solrconfig I do the following



<queryParser name="..." class="...">
  <str name="person_name">person_name^1.0 person_name_first^0.5 person_name_last^0.5</str>
</queryParser>


then in the CustomQueryParser I iterate over all the arguments adding
each key/value to a Map.  I then pass in this to the constructor of a
basically copied ExtendedDismaxQParser (only difference is the added
aliases and the logic to add those to the ExtendedSolrQParser).
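
For reference, a compilable sketch of a lighter-weight variant (class name
hypothetical): rather than copying ExtendedDismaxQParser, it reads the
configured aliases into a Map and naively splices them into qf before
delegating to the stock plugin.

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.ExtendedDismaxQParserPlugin;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class AliasingEDismaxQParserPlugin extends QParserPlugin {
  private final ExtendedDismaxQParserPlugin edismax = new ExtendedDismaxQParserPlugin();
  private final Map<String, String> aliases = new HashMap<String, String>();

  @SuppressWarnings("rawtypes")
  public void init(NamedList args) {
    edismax.init(args);
    for (int i = 0; i < args.size(); i++) {          // e.g. person_name ->
      aliases.put(args.getName(i),                   // "person_name^1.0 ..."
                  String.valueOf(args.getVal(i)).trim());
    }
  }

  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    ModifiableSolrParams p = new ModifiableSolrParams(params);
    String qf = p.get("qf");
    if (qf != null) {
      for (Map.Entry<String, String> e : aliases.entrySet()) {
        qf = qf.replace(e.getKey(), e.getValue());   // naive; breaks if qf
      }                                              // carries its own boosts
      p.set("qf", qf);
    }
    return edismax.createParser(qstr, localParams, p, req);
  }
}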

Now, the thing I hate about this is I had to pull pieces into my own
class since some of the methods being called (in QueryUtils for
instance) are not publicly available.

Also I didn't expose parameters to allow this to be done per query
level, but I'd most certainly take suggestions on how to do this.  My
use case doesn't require it but I'd be happy to make the modifications
and try it out.

On Tue, Jul 12, 2011 at 5:08 PM, Jamie Johnson  wrote:
> I'm not following where the aliasing feature I'm looking for is.
> Looking at the patch I didn't see it either.  Essentially what I'm
> looking for is when a user searches for person_name that the query
> turns into person_name:john OR person_name_first:john OR
> person_name_last:john.  I don't see anything like that here, am I just
> missing it?
>
> On Tue, Jul 12, 2011 at 3:06 PM, Chris Hostetter
>  wrote:
>>
>> : Thanks Hoss.  I'm not really sure where to begin looking with this, I
>> : quickly read the JIRA but don't see mention of exposing the multiple
>> : aliases.  Can you provide any more details?
>>
>> i refered to it as "uf" or "user fields" ... note the specific comment i
>> linked to in the first url, and the subsequent patch
>>
>> the colon bug in edismax is what hung me up at the time.
>>
>> :
>> : On Tue, Jul 12, 2011 at 1:19 PM, Chris Hostetter
>> :  wrote:
>> : > : Taking a closer look at this it seems as if the
>> : > : DisjunctionMaxQueryParser supports doing multiple aliases and
>> : > : generating multiple queries, I didn't see this same capability in the
>> : > : ExtendedDismaxQParser, am I just missing it?  If this capability were
>> : >
>> : > it's never been exposed at a user level ... i started looking at adding 
>> it
>> : > to edismax but ran into a bug i couldn't uncover in the time i had to 
>> work
>> : > on it (which i think has since been fixed)...
>> : >
>> : > 
>> https://issues.apache.org/jira/browse/SOLR-1553?focusedCommentId=12839892&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12839892
>> : > https://issues.apache.org/jira/browse/SOLR-2409
>> : > https://issues.apache.org/jira/browse/SOLR-2368
>> : >
>> : >
>> : > -Hoss
>> : >
>> :
>>
>> -Hoss
>


Re: Exception using "Analyze" from the Solr Admin app

2011-07-13 Thread xcbt212x
I had this same problem.  

In my case, I was able to correct it by removing (perhaps safer to move
than remove) my Tomcat /work/Catalina/ directory.
Restarting tomcat caused that directory to be recreated, and after that it
worked for me.

Hope this helps anyone else who runs into this in the future.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exception-using-Analyze-from-the-Solr-Admin-app-tp3060464p3167220.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: use query to set filter on fields

2011-07-13 Thread Igor Muntyan
Use the following query:


/select?q=*:*&rows=0&wt=json&indent=true&fl=NOT_IMPORTANT&facet=true&facet.zeros=false&fq={!tag=mk
q.op=OR}make:(Ford red 2010)&fq={!tag=cl  q.op=OR}color:(Ford red
2010)&fq={!tag=yr  q.op=OR}year:(Ford red 2010)&facet.field={!ex=cl,yr
key=mk}make&facet.field={!ex=mk,yr key=cl}color&facet.field={!ex=mk,cl
key=yr}year

The mk facet will tell you what makes are found in your free-text search,
the cl facet will tell you the colors, and yr the years (the {!tag=...}
local params label each filter so the corresponding facet.field can exclude
it again with {!ex=...}). This approach has the advantage of using all the
synonyms defined for your text field.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/use-query-to-set-filter-on-fields-tp2190595p3167444.html
Sent from the Solr - User mailing list archive at Nabble.com.


podcasts

2011-07-13 Thread Mourad K
Hello,

Are there any good podcasts for beginners in Solr?

Thanks

Moumou


Edismax parser and complex boolean queries

2011-07-13 Thread Jamie Johnson
I was wondering if anyone had details about what the edismax parser
does to complex boolean queries?


Re: podcasts

2011-07-13 Thread Erik Hatcher

On Jul 13, 2011, at 15:34 , Mourad K wrote:
> Are there any good podcasts for beginners in Solr?

There's a bunch of stuff we've created and posted to our site here:

   

Erik


Re: Edismax parser and complex boolean queries

2011-07-13 Thread Jamie Johnson
This may have been a silly question: looking at
https://issues.apache.org/jira/browse/SOLR-1553, it says that in the
absence of syntax errors the full query syntax is supported.  It would
still be nice to confirm, though.

On Wed, Jul 13, 2011 at 6:18 PM, Jamie Johnson  wrote:
> I was wondering if anyone had details about what the edismax parser
> does to complex boolean queries?
>


XInclude Multiple Elements

2011-07-13 Thread Stephen Duncan Jr
I've spent some time looking at various conversations on this problem,
but I can't find a solution that works.  XInclude has to point to a valid
XML document, with a single root element.  It should be possible to
use xpointer to specify children elements to include, but as far as I
can tell, the xpointer support doesn't include any scheme complex
enough to express "all the child elements of a given element", which
is what I would like.

So, here's what I have (it's more complicated than necessary for this
example because I also want it to support includes for both the root
level and a sub-level to use when doing schema, as I want to do
particular fields, but not all of them in the include file):

[first file: its markup was stripped by the list archive]

[solrconfigIncludes.xml: markup stripped; a wrapper element, addressed
below as xpointer="root", around the shared config]

...several elements to be included at the root of a solrconfig.xml file...

[solrconfig.xml: markup partially stripped; reconstructed here with the
surviving dismax defaults mapped onto the stock example parameter names]

<xi:include href="../../conf/solrconfigIncludes.xml" xpointer="root"
            xmlns:xi="http://www.w3.org/2001/XInclude" />

<requestHandler name="..." class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">all_text</str>
    <str name="pf">all_text</str>
    <str name="bf">recip(ms(NOW,dateoccurredboost),3.16e-11,1,1)</str>
    <str name="mm">1&lt;-1 4&lt;-2</str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <!-- a "highlight" entry whose markup did not survive -->
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

So, that xpointer="root", unfortunately, grabs the <root>
element, but what I need, of course, is the contents of that element
(the children).

I see this post:
http://lucene.472066.n3.nabble.com/including-external-files-in-config-by-corename-td698324.html
that implies you can use #xpointer(/*/node()) to get all elements of
the root node (like if I changed my example to only have one include,
and just used multiple files, which is fine if it works), however my
testing gave this error: ERROR org.apache.solr.core.CoreContainer -
org.xml.sax.SAXParseException: Fragment identifiers must not be used.
The 'href' attribute value
'../../conf/solrconfigIncludes.xml#xpointer(root/node())' is not
permitted.  I tried several other variations of trying to come up with
pointers using node() or *, none of which worked.

And I also see this post:
http://lucene.472066.n3.nabble.com/solrconfig-xml-and-xinclude-td984058.html
that shows off a cumbersome way to list out each child element by
index number using the element scheme, which I assume works, but is
way too cumbersome to use.
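
(For reference, that element-scheme form looks like
xpointer="element(/1/1)", xpointer="element(/1/2)", and so on: one
xi:include per child, addressed by its position under the root element.)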

Has anyone had any success using XInclude to include more than one
element?  I'm open to any and all approaches to having
partially-common configuration between cores.

Thanks,
Stephen

--
Stephen Duncan Jr
www.stephenduncanjr.com


Re: solr/velocity: function for sorting asc/desc

2011-07-13 Thread okayndc
Awesome, thanks Erik!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-velocity-funtion-for-sorting-asc-desc-tp3163549p3167662.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ' invisible ' words

2011-07-13 Thread deniz
Hi Jayendra,

I have changed the order and also removed the line related to synonyms...
but the result is still the same... somehow some words are just invisible
during my searches...

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/invisible-words-tp3158060p3168039.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: POST for queries, length/complexity limit of fq?

2011-07-13 Thread Sujatha Arun
I have used a long fq with POST for permissions, of the type fq=id:(1 ... n),
and they are limited by maxBooleanClauses in solrconfig.xml (exceeding it
throws a TooManyClauses exception).
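
For what it's worth, a minimal SolrJ sketch of that (URL and ids are
placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

// send the long fq by POST so it is not limited by the container's URL length
public class PostFilterQuery {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("*:*");
    q.addFilterQuery("id:(1 OR 2 OR 3)"); // imagine thousands of ids here
    System.out.println(
        server.query(q, SolrRequest.METHOD.POST).getResults().getNumFound());
  }
}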

Regards
Sujatha
On Wed, Jul 13, 2011 at 4:40 PM, pravesh  wrote:

> >1. I assume that it's worthwhile to rely on POST method instead of GET
> when issuing a search. Right? As I can see, this should work.
>
> We do restrict users' searches by passing unique ids (sometimes in thousands)
> in 'fq', and use the POST method
>
> Thanx
> Pravesh
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/POST-for-queries-length-complexity-limit-of-fq-tp3162405p3165586.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Readercycle script issue

2011-07-13 Thread Pawan Darira
Hi

My readercycle script is not reloading the searcher. Where might I be going
wrong? Please help.

Thanks
Pawan


Re: How to add TrieIntField to a SolrInputDocument?

2011-07-13 Thread Gabriele Kahlout
this works:

doc.remove("wc");
SolrInputField wcField = new SolrInputField("wc");
wcField.setValue(150, 1.0f);
doc.put("wc",wcField);

On Wed, Jul 13, 2011 at 4:19 PM, Gabriele Kahlout
wrote:

> SolrInputDocument doc = new SolrInputDocument();
> doc.setField(id, "0");
> doc.setField("url", getURL("0"));
> doc.setField(content, "blah blah blah");
> *doc.setField(wc, 150); //wc is of solr.TrieIntField field type in
> schema.xml*
> assertU(adoc(doc));
> assertU(commit());
> assertNumFound(1);
>
> The above test fails until I change the following in schema.xml:
>  - 
>  + 
>
>
> On Sun, Jul 10, 2011 at 10:36 PM, Gabriele Kahlout <
> gabri...@mysimpatico.com> wrote:
>
>>
>> This was my problem:
>> 
>>
>> I had taken my cue from Nutch's schema:
>> 
>>
>>
>>
>> On Sat, Jul 9, 2011 at 4:55 PM, Yonik Seeley 
>> wrote:
>>
>>> Something is wrong with your indexing.
>>> Is "wc" an indexed field?  If not, change it so it is, then re-index your
>>> data.
>>>
>>> If so, I'd recommend starting with the example data and filter for
>>> something like popularity:[6 TO 10] to convince yourself it works,
>>> then figuring out what you did differently in your schema/data.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>> On Sat, Jul 9, 2011 at 10:50 AM, Gabriele Kahlout
>>>  wrote:
>>> > http://localhost:8080/solr/select?indent=on&version=2.2&q=*%3A**
>>> > &fq=wc%3A%5B255+TO+257%5D*
>>> > &start=0&rows=10&fl=*%2Cscore&qt=&wt=xml&explainOther=&hl.fl=
>>> >
>>> > The toString of the request:
>>> >
>>> {explainOther=&fl=*,score&indent=on&start=0&q=*:*&hl.fl=&qt=&wt=xml&fq=wc:[255+TO+257]&rows=1&version=2.2}
>>> >
>>> > Even when the FilterQuery is constructed in Java it doesn't work (i get
>>> > results that ignore the filter query completely).
>>> >
>>> >
>>> > On Sat, Jul 9, 2011 at 3:40 PM, Ahmet Arslan 
>>> wrote:
>>> >
>>> >> > I don't get it to work!
>>> >> >
>>> >> > If I specify no fq I get the first result with >> >> > name="wc">256
>>> >> >
>>> >> > With wc:[255 TO 257] (fq=wc%3A%5B255+TO+257%5D) nothing
>>> >> > comes out.
>>> >>
>>> >> If you give us the Full URL you are using, it can be helpful.
>>> >>
>>> >> Correct syntax is &fq=wc:[255 TO 257]
>>> >>
>>> >> You can use more that fq in a request.
>>> >>
>>> >>
>>> >
>>> >
>>> > --
>>> > Regards,
>>> > K. Gabriele
>>> >
>>> > --- unchanged since 20/9/10 ---
>>> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
>>> > receipt within 48 hours then I don't resend the email.
>>> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
>>> time(x)
>>> > < Now + 48h) ⇒ ¬resend(I, this).
>>> >
>>> > If an email is sent by a sender that is not a trusted contact or the
>>> email
>>> > does not contain a valid code then the email is not received. A valid
>>> code
>>> > starts with a hyphen and ends with "X".
>>> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y
>>> ∈
>>> > L(-[a-z]+[0-9]X)).
>>> >
>>>
>>
>>
>>
>> --
>> Regards,
>> K. Gabriele
>>
>> --- unchanged since 20/9/10 ---
>> P.S. If the subject contains "[LON]" or the addressee acknowledges the
>> receipt within 48 hours then I don't resend the email.
>> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
>> time(x) < Now + 48h) ⇒ ¬resend(I, this).
>>
>> If an email is sent by a sender that is not a trusted contact or the email
>> does not contain a valid code then the email is not received. A valid code
>> starts with a hyphen and ends with "X".
>> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
>> L(-[a-z]+[0-9]X)).
>>
>>
>
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> time(x) < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>
>


-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).