RE: Invalid version (expected 2, but 60) or the data in not in 'javabin'

2012-12-25 Thread Shahar Davidson
Thanks Otis.

I went through every piece of info that I could lay my hands on.
Most of them are about incompatible SolrJ versions (that's not my case), and
there was one message from Mark Miller saying that Solr may respond with XML
instead of javabin when some kind of HTTP error is returned
(that's not my case either).

I'm using distributed search.
I added some debug output to print out the response once the "Invalid version"
exception is caught (in JavaBinCodec.unmarshal()).
What I saw is that the response actually contains the facet response in XML
format, yet I also noticed that the response is corrupt (i.e. as if a chunk of
text had been taken out of the middle of the reply; some kind of overrun,
perhaps?).

Any help would be appreciated.

Thanks,

Shahar.


-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Friday, December 21, 2012 6:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Invalid version (expected 2, but 60) or the data in not in 
'javabin'

Hi,

Have a look at http://search-lucene.com/?q=invalid+version+javabin

Otis
--
Solr Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Wed, Dec 19, 2012 at 11:23 AM, Shahar Davidson shah...@checkpoint.com wrote:

 Hi,

 I'm encountering this error randomly when running a distributed facet
 (i.e. I'm sending the exact same request, yet the error does not reproduce
 consistently).
 I have about 180 shards that are being queried.
 It seems that when Solr distributes the request to the shards, one (or
 perhaps more) of the shards returns an XML reply instead of javabin.

 I added some debug output to JavaBinCodec.unmarshal (as done in the
 debugging.patch of SOLR-3258) to check whether the XML reply holds an
 error or not, and I noticed that the XML actually holds the response
 from one of the shards.

 I'm using the patch provided in SOLR-2894 on top of trunk 1404975.

 Has anyone encountered such an issue? Any ideas?

 Thanks,

 Shahar.



Email secured by Check Point


MoreLikeThis supporting multiple document IDs as input?

2012-12-25 Thread David Parks
I'm unclear on this point from the documentation. Is it possible to give
Solr X document IDs and tell it that I want documents similar to those
X documents?

Example:

  - The user is browsing 5 different articles
  - I send Solr the IDs of these 5 articles so I can present the user other
similar articles

I see this example for sending it 1 document ID:
http://localhost:8080/solr/select/?qt=mlt&q=id:[document id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10

But can I send it 2+ document IDs as the query?



how to use RemoveDuplicatesTokenFilterFactory?

2012-12-25 Thread vrpar...@gmail.com
I want to avoid duplicate values in one multivalued field.

I am using the DataImportHandler to import data, and this multivalued
field is filled from an XML source. That XML has duplicate values,
but I want to have unique values in this multivalued field.

e.g. XML:
<data>
 a1
 b1
 a1
 a1
</data>

I have added RemoveDuplicatesTokenFilterFactory to the index analyzer of the
field's type, but it still gives the output below:

<arr name="field">
  <str>a1</str>
  <str>b1</str>
  <str>a1</str>
  <str>a1</str>
</arr>

I am using Solr 3.5.

How can I avoid importing duplicate values in the field?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-use-RemoveDuplicatesTokenFilterFactory-tp4029004.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to use RemoveDuplicatesTokenFilterFactory?

2012-12-25 Thread Ahmet Arslan
 I want to avoid duplicate values in
 one multivalued field.
 

RDTF (RemoveDuplicatesTokenFilter) only removes duplicate tokens that are at the same position.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory
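To make "same position" concrete, here is a simplified Python model of that rule. It assumes tokens arrive as (term, position increment) pairs, where an increment of 0 means "stacked on the previous token's position"; it is an illustration, not Lucene code.

```python
def remove_same_position_duplicates(tokens):
    """Drop a token if the same term was already seen at the current position.

    tokens: list of (term, position_increment) pairs.
    """
    seen_here = set()
    kept = []
    for term, pos_inc in tokens:
        if pos_inc != 0:
            seen_here = set()  # moved to a new position: forget earlier terms
        if term in seen_here:
            continue  # same term, same position: this is what gets removed
        seen_here.add(term)
        kept.append((term, pos_inc))
    return kept


# The stacked "a1" is removed; the "a1" at a later position is kept.
print(remove_same_position_duplicates([("a1", 1), ("a1", 0), ("b1", 1), ("a1", 1)]))
# [('a1', 1), ('b1', 1), ('a1', 1)]
```

Note that even when the filter does remove something, it only affects indexed tokens, not the stored values Solr returns.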

An elegant solution would be to subclass
http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/update/processor/FieldValueSubsetUpdateProcessorFactory.html

and create a DistinctFieldValueUpdateProcessorFactory or something like that.
MinFieldValueUpdateProcessorFactory can be used as an example.
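The heart of such a processor is an order-preserving "keep the first occurrence" step over one field's values. A Python sketch of that step, applied to a document represented as a dict; the real processor would be a Java subclass, and DistinctFieldValueUpdateProcessorFactory is only the suggested name, not an existing class. `dedupe_field` is likewise hypothetical.

```python
def distinct_values(values):
    """Return values with duplicates removed, keeping first-seen order."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


def dedupe_field(doc, field):
    """Hypothetical pre-indexing step: make one multivalued field distinct."""
    if isinstance(doc.get(field), list):
        doc[field] = distinct_values(doc[field])
    return doc


print(dedupe_field({"id": "1", "field": ["a1", "b1", "a1", "a1"]}, "field"))
# {'id': '1', 'field': ['a1', 'b1']}
```

Because this runs before indexing, it changes the stored values that Solr returns, which an analyzer-level token filter cannot do.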


Re: how to use RemoveDuplicatesTokenFilterFactory?

2012-12-25 Thread Ahmet Arslan
 The values are at the same logical
 position.

You mean positionIncrementGap set to 0? Can you see on the analysis page that
duplicates are removed?

By the way, the returned values are the original (stored) values. Analysis
(tokenizers, token filters, etc.) applies only to indexed values. An
UpdateProcessorFactory can change the stored (returned) values.


Re: solr java API for fuzzy query

2012-12-25 Thread Jack Krupansky
Otis, stop teasing people! You know as well as I do that 2 is the maximum 
edit distance for fuzzy query in 4.0.


So,

   Keyword~5

Is treated as:

   Keyword~2

Check with debug=query to see.
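A small Python model of that capping behaviour. It assumes plain Levenshtein distance and ignores details such as transposition handling and prefix length, so treat it as an illustration rather than Lucene's actual matching.

```python
MAX_EDITS = 2  # the 4.0 maximum fuzzy edit distance mentioned above


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def fuzzy_match(term: str, candidate: str, requested_edits: int) -> bool:
    """Keyword~5 behaves like Keyword~2: the requested distance is capped."""
    return levenshtein(term, candidate) <= min(requested_edits, MAX_EDITS)


print(fuzzy_match("keyword", "keywrd", 5))  # True  (1 edit)
print(fuzzy_match("keyword", "key", 5))     # False (4 edits > cap of 2)
```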

-- Jack Krupansky

-Original Message- 
From: Otis Gospodnetic

Sent: Tuesday, December 25, 2012 1:38 AM
To: solr-user@lucene.apache.org
Subject: Re: solr java API for fuzzy query

Hi Alexey,

You can use the Lucene query syntax with Solr, does that help?
Try Keyword~5 for example.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html



On Tue, Dec 25, 2012 at 12:19 AM, Yakubovich Alexey (Nokia-LC/Chicago) 
alexey.yakubov...@nokia.com wrote:


Is there any Java API available in Solr for fuzzy queries, similar to the
Lucene org.apache.lucene.search.FuzzyQuery class?

More generally, is there a way to define a query with the Lucene Java
API and invoke it through Solr (a kind of Lucene-Solr bridge)?

Thanks
Alexey



The information contained in this communication may be CONFIDENTIAL and is
intended only for the use of the recipient(s) named above. If you are not
the intended recipient, you are hereby notified that any dissemination,
distribution, or copying of this communication, or any of its contents, is
strictly prohibited. If you have received this communication in error,
please notify the sender and delete/destroy the original message and any
copy of it from your computer or paper files.





[ANNOUNCE] Apache Solr 3.6.2 released

2012-12-25 Thread Robert Muir
25 December 2012, Apache Solr™ 3.6.2 available

The Lucene PMC and Santa Claus are pleased to announce the release of
Apache Solr 3.6.2.

Solr is the popular, blazing fast open source enterprise search
platform from the Apache Lucene project. Its major features include
powerful full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
distributed search and index replication, and it powers the search and
navigation features of many of the world's largest internet sites.

This release is a bug fix release for version 3.6.1. It contains
numerous bug fixes, optimizations, and improvements, some of which are
highlighted below.  The release is available for immediate download
at: http://lucene.apache.org/solr/mirrors-solr-3x-redir.html (see note
below).

See the CHANGES.txt file included with the release for a full list of details.

Solr 3.6.2 Release Highlights:

 * Fixed ConcurrentModificationException during highlighting, if all
fields were requested.

 * Fixed edismax queryparser to apply minShouldMatch to implicit
boolean queries.

 * Several bugfixes to the DataImportHandler.

 * Bug fixes from Apache Lucene 3.6.2.

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases.  It is possible that the mirror you
are using may not have replicated the release yet.  If that is the
case, please try another mirror.  This also goes for Maven access.

Happy holidays and happy searching,

Lucene/Solr developers


Re: facet query

2012-12-25 Thread Anirudha Jadhav
Please see http://wiki.apache.org/solr/SimpleFacetParameters for more
details

On Friday, December 21, 2012, hank williams wrote:

 Great, thank you.

  Date: Fri, 21 Dec 2012 14:42:13 +0100
  From: r@solr.pl
  To: solr-user@lucene.apache.org
  Subject: Re: facet query
 
  Hello!
 
  Try facet.mincount=1, that should help.
 
  --
  Regards,
   Rafał Kuć
   Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
 ElasticSearch
 
   Hi, is there a way with facets to say "return only facets whose count is
   not 0"? I have facet=true&facet.field=office&facet.field=name as my
   facet parameters, and with some of my queries it brings back people
   that have a value of 0.
   Thanks
 




-- 
Anirudha P. Jadhav
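For reference, a request with Rafał's facet.mincount=1 suggestion applied might be built like this in Python. BASE_URL and the query are placeholders; only the facet parameters come from this thread.

```python
from urllib.parse import urlencode, urlparse, parse_qs

BASE_URL = "http://localhost:8983/solr/select"  # placeholder; adjust to your setup


def facet_url(query, fields, mincount=1):
    """Select URL that only returns facet values whose count is >= mincount."""
    params = [("q", query), ("facet", "true"), ("facet.mincount", str(mincount))]
    params += [("facet.field", field) for field in fields]  # repeatable parameter
    return BASE_URL + "?" + urlencode(params)


url = facet_url("*:*", ["office", "name"])
print(parse_qs(urlparse(url).query)["facet.field"])  # ['office', 'name']
```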


Re: solr java API for fuzzy query

2012-12-25 Thread Otis Gospodnetic
But you are assuming 4.0 :)

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Dec 25, 2012 10:47 AM, Jack Krupansky j...@basetechnology.com wrote:

 Otis, stop teasing people! You know as well as I do that 2 is the maximum
 edit distance for fuzzy query in 4.0.






Spatial filter in solr 4.0 - Intersects operation with parameters

2012-12-25 Thread mladen micevic
Hi,
I went through the example for spatial search in Solr 4.0
(http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4).
Both indexing and searching work fine.

The example is: fq=geo:"Intersects(-74.093 41.042 -69.347 44.558)"

My problem is how to pass values to the Intersects operation as parameters.
I would like to send custom parameters in the URL:
...&lon1=-74.093&lat1=41.042&lon2=-69.347&lat2=44.558
and have a default filter query:
  fq=geo:"Intersects($lon1 $lat1 $lon2 $lat2)"
I tried this approach, but it did not work.

How do I do this?

{!bbox} is not documented in the 4.0 wiki.
Anyway, I tried to use it against the geo field but got the following error:
   field does not support spatial filtering ...
Can I use {!bbox} in 4.0?


Thanks.
Mladen



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial-filter-in-solr-4-0-Intersects-operation-with-parameters-tp4029034.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dynamic collections in SolrCloud for log indexing

2012-12-25 Thread Mark Miller
I've been thinking about aliases for a while as well. They seem very handy and
fairly easy to implement. So far there have just always been higher-priority
things (I need to finish the Collections API responses this week…), but this is
something I'd definitely help work on.

- Mark

On Dec 25, 2012, at 1:49 AM, Otis Gospodnetic otis.gospodne...@gmail.com 
wrote:

 Hi,
 
 Right, this is not really about routing in the ElasticSearch sense.
 What's handy for indexing logs is index aliases, which I thought I had
 added to JIRA a while back, but it looks like I have not.
 Index aliases would let you keep a "last 7 days" alias fixed while
 underneath you push and pop an index every day, without the client app
 having to adjust.
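Solr 4.0 has no alias feature, so the following is only a Python model of the behaviour described: a fixed alias name whose membership rolls forward one daily index at a time. The RollingAlias class and the index names are hypothetical, not a Solr API.

```python
from collections import deque


class RollingAlias:
    """Hypothetical model: a fixed alias over the newest `size` daily indexes."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        self._indexes = deque()

    def push(self, index_name):
        """Add today's index; return the indexes that fell out of the window."""
        self._indexes.append(index_name)
        dropped = []
        while len(self._indexes) > self.size:
            dropped.append(self._indexes.popleft())
        return dropped

    def resolve(self):
        """What a query against the alias would actually search."""
        return list(self._indexes)


alias = RollingAlias("last_7_days", 7)
for day in range(1, 9):
    dropped = alias.push(f"logs_day{day}")
print(dropped)                                   # ['logs_day1']
print(len(alias.resolve()), alias.resolve()[0])  # 7 logs_day2
```

The client keeps querying the same alias name while the membership changes underneath it, which is the point of the feature.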
 
 Otis
 --
 Performance Monitoring - http://sematext.com/spm/index.html
 Search Analytics - http://sematext.com/search-analytics/index.html
 
 
 
 On Mon, Dec 24, 2012 at 4:30 AM, Per Steffensen st...@designware.dk wrote:
 
 I believe it is a misunderstanding to use custom routing (or sharding, as
 Erick calls it) for this kind of stuff. Custom routing is nice if you want
 to control which slice/shard under a collection a specific document goes to,
 mainly to be able to ensure that two (or more) documents are indexed on
 the same slice/shard, but also just to be able to control on which
 slice/shard a specific document is indexed. Knowing/controlling this kind
 of stuff can be used for a lot of nice purposes. But you don't want to move
 slices/shards around among collections or delete/add slices from/to a
 collection, unless it's for elasticity reasons.
 
 I think you should fill a collection every week/month and just keep those
 collections as they are. Instead of ending up with one big historic collection
 containing many slices/shards/cores (one for each historic week/month), you
 will end up with many historic collections (one for each historic
 week/month). To search historic data you will have to cross-search those
 historic collections, but that is no problem at all. If SolrCloud is built
 as it is supposed to be (and I believe it is), it shouldn't require more
 resources, or be harder in any way, to cross-search X slices across many
 collections than to cross-search X slices under the same collection.
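A sketch of the client side of that cross-search, assuming one collection per month with a hypothetical `logs_YYYY_MM` naming scheme and a SolrCloud version that accepts a comma-separated `collection` request parameter (check this against your release). The client only has to work out which collections cover the requested time range.

```python
from datetime import date
from urllib.parse import urlencode, parse_qs


def monthly_collections(prefix, start, end):
    """Names of the monthly collections covering start..end (inclusive)."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return names


def cross_collection_params(query, collections):
    """Query string that searches several collections in one request."""
    return urlencode({"q": query, "collection": ",".join(collections)})


names = monthly_collections("logs", date(2012, 11, 1), date(2013, 2, 1))
print(names)  # ['logs_2012_11', 'logs_2012_12', 'logs_2013_01', 'logs_2013_02']
print(parse_qs(cross_collection_params("level:ERROR", names))["collection"])
```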
 
 Besides that, see my answer on the topic "Will SolrCloud always slice by ID
 hash?" a few days back.
 
 Regards, Per Steffensen
 
 
 On 12/24/12 1:07 AM, Erick Erickson wrote:
 
 I think this is one of the primary use-cases for custom sharding. Solr 4.0
 doesn't really lend itself to this scenario, but I _believe_ that the
 patch
 for custom sharding has been committed...
 
 That said, I'm not quite sure how you drop off the old shard if you don't
 need to keep old data. I'd guess it's possible, but haven't implemented
 anything like that myself.
 
 FWIW,
 Erick
 
 
 On Fri, Dec 21, 2012 at 12:17 PM, Upayavira u...@odoko.co.uk wrote:
 
 I'm working on a system for indexing logs. We're probably looking at
 filling one core every month.
 
 We'll maintain a short term index containing the last 7 days - that one
 is easy to handle.
 
 For the longer term stuff, we'd like to maintain a collection that will
 query across all the historic data, but that means every month we need
 to add another core to an existing collection, which as I understand it
 in 4.0 is not possible.
 
 How do people handle this sort of situation where you have rolling new
 content arriving? I'm sure I've heard people using SolrCloud for this
 sort of thing.
 
 Given it is logs, distributed IDF has no real bearing.
 
 Upayavira
 
 
 



Re: Invalid version (expected 2, but 60) or the data in not in 'javabin'

2012-12-25 Thread Mark Miller
The problem is not necessarily XML: it seems to be anything that is not valid
javabin. I've just most often seen it with 404s that return an HTML error.
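The two numbers in the error are byte values: a javabin response starts with a version byte of 2, and 60 is the ASCII code for `<`, the first character of an XML or HTML document. A minimal Python sketch that classifies a captured raw shard response by its first byte (the function name is hypothetical):

```python
JAVABIN_VERSION = 2   # the "expected 2" in the error message
LESS_THAN = ord("<")  # 60, the "but 60": first byte of an XML/HTML body


def classify_response(body: bytes) -> str:
    """Guess what a shard returned, based only on its first byte."""
    if not body:
        return "empty"
    first = body[0]
    if first == JAVABIN_VERSION:
        return "javabin"
    if first == LESS_THAN:
        return "xml-or-html"
    return f"unknown (first byte {first})"


print(classify_response(b'<?xml version="1.0"?><response/>'))  # xml-or-html
```

A body classified as xml-or-html is exactly what produces "expected 2, but 60"; the check says nothing about why the shard sent it or whether the body was truncated.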

I'm not sure if there is a JIRA issue or not, but this type of thing should
fail in a more user-friendly way.

As to why your response is corrupt, I have no guesses.

Is this easily repeatable? Is it happening every time, or randomly?

- Mark

On Dec 25, 2012, at 4:23 AM, Shahar Davidson shah...@checkpoint.com wrote:

 Thanks Otis.
 
 I went through every piece of info that I could lay my hands on.
 Most of them are about incompatible SolrJ versions (that's not my case), and
 there was one message from Mark Miller saying that Solr may respond with XML
 instead of javabin when some kind of HTTP error is returned
 (that's not my case either).
 
 I'm using distributed search.
 I added some debug output to print out the response once the "Invalid
 version" exception is caught (in JavaBinCodec.unmarshal()).
 What I saw is that the response actually contains the facet response in XML
 format, yet I also noticed that the response is corrupt (i.e. as if a chunk
 of text had been taken out of the middle of the reply; some kind of overrun,
 perhaps?).
 



Re: MoreLikeThis supporting multiple document IDs as input?

2012-12-25 Thread Jack Krupansky

MLT has both a request handler and a search component.

The MLT handler returns similar documents only for the first document that 
the query matches.


The MLT search component returns similar documents for each of the documents 
in the search results, but processes each search result base document one at 
a time and keeps its similar documents segregated by each of the base 
documents.
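With the component, several IDs can go into one ordinary query, and Solr returns a separate list of similar documents for each matched base document. A Python sketch that builds such a request; the field names and IDs are placeholders, `mlt`, `mlt.fl` and `mlt.count` are the component's parameters, and IDs containing query-syntax characters would need escaping.

```python
from urllib.parse import urlencode, parse_qs


def mlt_component_params(ids, fields, per_doc=5):
    """One search for all base IDs, with the MoreLikeThis component switched on."""
    return urlencode({
        "q": "id:(" + " OR ".join(ids) + ")",
        "mlt": "true",               # enable the MoreLikeThis component
        "mlt.fl": ",".join(fields),  # fields used to compute similarity
        "mlt.count": str(per_doc),   # similar docs returned per base doc
        "fl": "id",
        "rows": str(len(ids)),
    })


params = mlt_component_params(["101", "102", "103"], ["title", "body"])
print(parse_qs(params)["q"])  # ['id:(101 OR 102 OR 103)']
```

The similar documents stay grouped per base document, so merging them into one list is left to the client.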


It sounds like you wanted to merge the base search results and then find 
documents similar to that merged super-document. Is that what you were 
really seeking, as opposed to what the MLT component does? Unfortunately, 
you can't do that with the components as they are.


You would have to manually merge the values from the base documents and then 
you could POST that text back to the MLT handler and find similar documents 
using the posted text rather than a query. Kind of messy, but in theory that 
should work.
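A rough Python sketch of that workaround. It assumes the base documents have already been fetched as dicts and that the MLT handler accepts the merged text as a content stream via `stream.body` (on some configurations this requires remote streaming to be enabled in solrconfig.xml). The `qt=mlt` handler name follows the example in the question; the helper names and sample documents are hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def merged_text(docs, fields):
    """Concatenate the chosen fields of all base documents into one text."""
    parts = []
    for doc in docs:
        for field in fields:
            value = doc.get(field)
            if value is None:
                continue
            values = value if isinstance(value, list) else [value]
            parts.extend(str(v) for v in values)
    return "\n".join(parts)


def mlt_handler_form(docs, fields, rows=10):
    """Form body to POST to the MLT handler instead of a q=id:... query."""
    return urlencode({
        "qt": "mlt",
        "mlt.fl": ",".join(fields),
        "stream.body": merged_text(docs, fields),
        "fl": "id",
        "rows": str(rows),
    })


docs = [
    {"id": "1", "title": "Solr faceting", "body": "facet.mincount tips"},
    {"id": "2", "title": "Spatial search", "tags": ["geo"]},
]
print(merged_text(docs, ["title", "body"]))
```

The base documents are likely to match their own text well, so they may need to be filtered out of the results on the client side.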


-- Jack Krupansky

-Original Message- 
From: David Parks

Sent: Tuesday, December 25, 2012 5:04 AM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis supporting multiple document IDs as input?

I'm unclear on this point from the documentation. Is it possible to give
Solr X # of document IDs and tell it that I want documents similar to those
X documents?




Re: Spatial filter in solr 4.0 - Intersects operation with parameters

2012-12-25 Thread David Smiley (@MITRE.org)
Hi Mladen,

Despite some similarities at first glance, the Solr 4 spatial fields are not
implemented with Solr query parsers, unlike Solr 3 spatial.  Everything in
quotes is handled by the field type.  What you're looking for is for the
Solr 3 geospatial functions to be adapted to support the Solr 4 spatial
fields.  I created an issue, SOLR-4230, to track this.  I never got around to
doing this before because it wasn't strictly necessary to use the new
fields, but it is of course a nice-to-have.
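Until something like SOLR-4230 exists, one workaround is to build the shape string on the client side instead of relying on parameter substitution. A Python sketch, assuming the rectangle is written as min-lon min-lat max-lon max-lat, as in the wiki example; the helper name is hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def intersects_fq(field, min_lon, min_lat, max_lon, max_lat):
    """Filter query for a rectangle, formatted like the wiki example."""
    # Only latitudes are checked: min_lon > max_lon may be a dateline crossing.
    if min_lat > max_lat:
        raise ValueError("min_lat must not exceed max_lat")
    return f'{field}:"Intersects({min_lon} {min_lat} {max_lon} {max_lat})"'


fq = intersects_fq("geo", -74.093, 41.042, -69.347, 44.558)
print(fq)  # geo:"Intersects(-74.093 41.042 -69.347 44.558)"
print(parse_qs(urlencode({"q": "*:*", "fq": fq}))["fq"] == [fq])  # True
```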

~ David







-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial-filter-in-solr-4-0-Intersects-operation-with-parameters-tp4029034p4029071.html
Sent from the Solr - User mailing list archive at Nabble.com.