date:20140620

Hi,
I know how to send solrconfig.xml and schema.xml files to SolR using curl
commands.
But my problem is that i want to send them with java, and i can't find a
way to do so.
I used HttpComponentsand got http headers before the file begins, which SAX
parser does not like at all.

What is the best way to send this files from a java program ?

What i have once i sent the file is something like that :

*��:
solr_admin solr_resources resource_value��--9NDJNu2AW4jtIyX6ggQAgEqI3FXp3JpDZ6
Content-Disposition: form-data; name=solrconfig.xml; filename=solrconfig.xml
Content-Type: application/xml; charset=ISO-8859-1
Content-Transfer-Encoding:** binary*
config!-- In all configuration below, a prefix of solr. for class names
 is an alias that causes solr to search appropriate packages,
 including org.apache.solr.(search|update|request|core|analysis)
[Continued...]

Re: About Query Parser

2014-06-20 Thread Vivekanand Ittigi

Hi Daniel,

You said inputs are human-generated and outputs are lucene objects. So
my question is what does the below query mean. Does this fall under
human-generated one or lucene.?

http://localhost:8983/solr/collection1/select?q=*%3A*wt=xmlindent=true

Thanks,
Vivek



On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins danwcoll...@gmail.com
wrote:

 Alexandre's response is very thorough, so I'm really simplifying things, I
 confess but here's my query parsers for dummies. :)

 In terms of inputs/outputs, a QueryParser takes a string (generally assumed
 to be human generated i.e. something a user might type in, so maybe a
 sentence, a set of words, the format can vary) and outputs a Lucene Query
 object (

 http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
 ),
 which in fact is a kind of tree (again, I'm simplifying I know) since a
 query can contain nested expressions.

 So very loosely its a translator from a human-generated query into the
 structure that Lucene can handle.  There are several different query
 parsers since they all use different input syntax, and ways of handling
 different constructs (to handle A and B, should the user type +A +B or A
 and B or just A B for example), and have different levels of support for
 the various Query structures that Lucene can handle: SpanQuery, FuzzyQuery,
 PhraseQuery, etc.

 We for example use an XML-based query parser.  Why (you might well ask!),
 well we had an already used and supported query syntax of our own, which
 our users understood, so we couldn't use an off the shelf query parser.  We
 could have built our own in Java, but for a variety of reasons we parse our
 queries in a front-end system ahead of Solr (which is C++-based), so we
 needed an interim format to pass queries to Solr that was as near to a
 Lucene Query object as we could get (and there was an existing XML parser
 to save us starting from square one!).

 As part of that Query construction (but independent of which QueryParser
 you use), Solr will also make use of a set of Tokenizers and Filters (

 https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
 )
 but that's more to do with dealing with the terms in the query (so in my
 examples above, is A a real word, does it need stemming, lowercasing,
 removing because its a stopword, etc).

Re: About Query Parser

2014-06-20 Thread Alexandre Rafalovitch

That's *:* and a special case. There is no scoring here, nor searching.
Just a dump of documents. Not even filtering or faceting. I sure hope you
have more interesting examples.

Regards,
Alex
On 20/06/2014 6:40 pm, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi Daniel,

You said inputs are human-generated and outputs are lucene objects. So
my question is what does the below query mean. Does this fall under
human-generated one or lucene.?

http://localhost:8983/solr/collection1/select?q=*%3A*wt=xmlindent=true

Thanks,
Vivek

On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins danwcoll...@gmail.com
wrote:

Alexandre's response is very thorough, so I'm really simplifying things,
I
confess but here's my query parsers for dummies. :)

In terms of inputs/outputs, a QueryParser takes a string (generally
assumed
to be human generated i.e. something a user might type in, so maybe a
sentence, a set of words, the format can vary) and outputs a Lucene Query
object (

http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
),
which in fact is a kind of tree (again, I'm simplifying I know) since a
query can contain nested expressions.

So very loosely its a translator from a human-generated query into the
structure that Lucene can handle. There are several different query
parsers since they all use different input syntax, and ways of handling
different constructs (to handle A and B, should the user type +A +B or
A
and B or just A B for example), and have different levels of support
for
the various Query structures that Lucene can handle: SpanQuery,
FuzzyQuery,
PhraseQuery, etc.

We for example use an XML-based query parser. Why (you might well ask!),
well we had an already used and supported query syntax of our own, which
our users understood, so we couldn't use an off the shelf query parser.
We
could have built our own in Java, but for a variety of reasons we parse
our
queries in a front-end system ahead of Solr (which is C++-based), so we
needed an interim format to pass queries to Solr that was as near to a
Lucene Query object as we could get (and there was an existing XML parser
to save us starting from square one!).

As part of that Query construction (but independent of which QueryParser
you use), Solr will also make use of a set of Tokenizers and Filters (

https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
)
but that's more to do with dealing with the terms in the query (so in my
examples above, is A a real word, does it need stemming, lowercasing,
removing because its a stopword, etc).

Re: About Query Parser

2014-06-20 Thread Vivekanand Ittigi

All right let me put this.

http://192.168.1.78:8983/solr/collection1/select?q=inStock:falsefacet=truefacet.field=popularitywt=xmlindent=true
.

I just want to know what is this form. is it lucene query or this query
should go under query parser to get converted to lucene query.

Thanks,
Vivek

On Fri, Jun 20, 2014 at 5:19 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

That's *:* and a special case. There is no scoring here, nor searching.
Just a dump of documents. Not even filtering or faceting. I sure hope you
have more interesting examples.

Regards,
Alex
On 20/06/2014 6:40 pm, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi Daniel,

You said inputs are human-generated and outputs are lucene objects.
So
my question is what does the below query mean. Does this fall under
human-generated one or lucene.?

http://localhost:8983/solr/collection1/select?q=*%3A*wt=xmlindent=true

Thanks,
Vivek

On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins danwcoll...@gmail.com
wrote:

Alexandre's response is very thorough, so I'm really simplifying
things,
I
confess but here's my query parsers for dummies. :)

http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
),
which in fact is a kind of tree (again, I'm simplifying I know)
since a
query can contain nested expressions.

So very loosely its a translator from a human-generated query into the
structure that Lucene can handle. There are several different query
parsers since they all use different input syntax, and ways of handling
different constructs (to handle A and B, should the user type +A +B
or
A
and B or just A B for example), and have different levels of support
for
the various Query structures that Lucene can handle: SpanQuery,
FuzzyQuery,
PhraseQuery, etc.

We for example use an XML-based query parser. Why (you might well
ask!),
well we had an already used and supported query syntax of our own,
which
our users understood, so we couldn't use an off the shelf query parser.
We
could have built our own in Java, but for a variety of reasons we parse
our
queries in a front-end system ahead of Solr (which is C++-based), so we
needed an interim format to pass queries to Solr that was as near to a
Lucene Query object as we could get (and there was an existing XML
parser
to save us starting from square one!).

As part of that Query construction (but independent of which
QueryParser
you use), Solr will also make use of a set of Tokenizers and Filters (

https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
)
but that's more to do with dealing with the terms in the query (so in
my
examples above, is A a real word, does it need stemming, lowercasing,
removing because its a stopword, etc).

Re: About Query Parser

2014-06-20 Thread Daniel Collins

I would say *:* is a human-readable/writable query. as is
inStock:false. The former will be converted by the query parser into a
MatchAllDocsQuery which is what Lucene understands. The latter will be
converted (again by the query parser) into some query. Now this is where
*which* query parser you are using is important. Is inStock a word to be
queried, or a field in your schema? Probably the latter, but the query
parser has to determine that using the Solr schema. So I would expect that
query to be converted to a TermQuery(Term(inStock, false)), so a query
for the value false in the field inStock.

This is all interesting but what are you really trying to find out? If you
just want to run queries and see what they translate to, you can use the
debug options when you send the query in, and then Solr will return to you
both the raw query (with any other options that the query handler might
have added to your query) as well as the Lucene Query generated from it.

e.g.from running : on a solr instance.

rawquerystring: *:*, querystring: *:*, parsedquery:
MatchAllDocsQuery(*:*), parsedquery_toString: *:*, QParser:
LuceneQParser,
Or (this shows the difference between raw query syntax and parsed query
syntax) rawquerystring: body_en:test AND headline_en:hello, querystring:
body_en:test AND headline_en:hello, parsedquery: +body_en:test
+headline_en:hello, parsedquery_toString: +body_en:test
+headline_en:hello, QParser: LuceneQParser,

On 20 June 2014 13:05, Vivekanand Ittigi vi...@biginfolabs.com wrote:

All right let me put this.

http://192.168.1.78:8983/solr/collection1/select?q=inStock:falsefacet=truefacet.field=popularitywt=xmlindent=true
.

I just want to know what is this form. is it lucene query or this query
should go under query parser to get converted to lucene query.

Thanks,
Vivek

On Fri, Jun 20, 2014 at 5:19 PM, Alexandre Rafalovitch arafa...@gmail.com

wrote:

That's *:* and a special case. There is no scoring here, nor searching.
Just a dump of documents. Not even filtering or faceting. I sure hope you
have more interesting examples.

Regards,
Alex
On 20/06/2014 6:40 pm, Vivekanand Ittigi vi...@biginfolabs.com
wrote:

Hi Daniel,

You said inputs are human-generated and outputs are lucene objects.
So
my question is what does the below query mean. Does this fall under
human-generated one or lucene.?

http://localhost:8983/solr/collection1/select?q=*%3A*wt=xmlindent=true

Thanks,
Vivek

On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins danwcoll...@gmail.com

wrote:

Alexandre's response is very thorough, so I'm really simplifying
things,
I
confess but here's my query parsers for dummies. :)

In terms of inputs/outputs, a QueryParser takes a string (generally
assumed
to be human generated i.e. something a user might type in, so
maybe a
sentence, a set of words, the format can vary) and outputs a Lucene
Query
object (

http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
),
which in fact is a kind of tree (again, I'm simplifying I know)
since a
query can contain nested expressions.

So very loosely its a translator from a human-generated query into
the
structure that Lucene can handle. There are several different query
parsers since they all use different input syntax, and ways of
handling
different constructs (to handle A and B, should the user type +A +B
or
A
and B or just A B for example), and have different levels of
support
for
the various Query structures that Lucene can handle: SpanQuery,
FuzzyQuery,
PhraseQuery, etc.

We for example use an XML-based query parser. Why (you might well
ask!),
well we had an already used and supported query syntax of our own,
which
our users understood, so we couldn't use an off the shelf query
parser.
We
could have built our own in Java, but for a variety of reasons we
parse
our
queries in a front-end system ahead of Solr (which is C++-based), so
we
needed an interim format to pass queries to Solr that was as near to
a
Lucene Query object as we could get (and there was an existing XML
parser
to save us starting from square one!).

As part of that Query construction (but independent of which
QueryParser
you use), Solr will also make use of a set of Tokenizers and Filters
(

https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
)
but that's more to do with dealing with the terms in the query (so in
my
examples above, is A a real word, does it need stemming, lowercasing,
removing because its a stopword, etc).

Trouble with TrieDateFields

2014-06-20 Thread Jared Whiklo

I am upgrading an index from Solr 3.6 to 4.2.0.

Everything has been picked up except for the old DateFields.

I read some posts that due to the extra functionality of the TrieDateField 
you would need to re-index for those fields.

To avoid re-indexing I was trying to do a Partial Update 
(http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/),

I am doing this with a Python script that does a query, pulls the field 
contents and then reformats it and sends a JSON update back to Solr.

But no matter what I send Solr gives me the same error

SEVERE: java.lang.ClassCastException: java.lang.String cannot be cast to 
java.util.Date
at org.apache.solr.schema.TrieDateField.toObject(TrieDateField.java:70)
at org.apache.solr.schema.TrieDateField.toObject(TrieDateField.java:55)
……

I have tried send the date as a date string to be parsed and as a number of 
milliseconds from or before epoch. Both give the same error.

Any suggestions would be appreciated.

Examples of record attempts.

As seconds
--
2014-06-19 16:02:09,503 - solr_date_fixer - DEBUG - old record - 
{u'timestamp': 
u'ERROR:SCHEMA-INDEX-MISMATCH,stringValue=2013-07-17T18:09:59.049', u'PID': 
u'uofm:1235128'}
2014-06-19 16:02:09,503 - solr_date_fixer - DEBUG - new record - {'timestamp': 
{'set': 1374084599049.0}, 'PID': u'uofm:1235128'}
--

As date
--
2014-06-20 08:11:27,986 - solr_date_fixer - DEBUG - old record - 
{u'timestamp': 
u'ERROR:SCHEMA-INDEX-MISMATCH,stringValue=2013-07-17T18:09:59.049', u'PID': 
u'uofm:1235128'}
2014-06-20 08:11:27,986 - solr_date_fixer - DEBUG - new record - {'timestamp': 
{'set': u'2013-07-17T18:09:59.049Z'}, 'PID': u'uofm:1235128'}
---
--
Jared Whiklo
Developer – Digital Initiatives
University of Manitoba Libraries
v: 204-474-6523
c: 204-228-1943
e: jared_whi...@umanitoba.ca

Re: [ANN] Heliosearch 0.06 released, native code faceting

On Fri, Jun 20, 2014 at 12:36 AM, Andy angelf...@yahoo.com.invalid wrote:
 Congrats! Any idea when will native faceting  off-heap fieldcache be 
 available for multivalued fields? Most of my fields are multivalued so that's 
 the big one for me.

Hopefully within the next month or so
If anyone wants to help out, the github issue is here:
https://github.com/Heliosearch/heliosearch/issues/13

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data



 On Thursday, June 19, 2014 3:46 PM, Yonik Seeley yo...@heliosearch.com 
 wrote:



 FYI, for those who want to try out the new native code faceting, this
 is the first release containing it (for single valued string fields
 only as of yet).

 http://heliosearch.org/download/

 Heliosearch v0.06

 Features:
 o  Heliosearch v0.06 is based on (and contains all features of)
 Lucene/Solr 4.9.0
 o  Native code faceting for single valued string fields.
 - Written in C++, statically compiled with gcc for Windows, Mac OS-X, 
 Linux
 - static compilation avoids JVM hotspot warmup period,
 mis-compilation bugs, and variations between runs
 - Improves performance over 2x
 o  Top level Off-heap fieldcache for single valued string fields in nCache.
 - Improves sorting and faceting speed
 - Reduces garbage collection overhead
 - Eliminates FieldCache “insanity” that exists in Apache Solr from
 faceting and sorting on the same field
 o  Full request Parameter substitution / macro expansion, including
 default value support.
 o  frange query now only returns documents with a value.
  For example, in Apache Solr, {!frange l=-1 u=1 v=myfield} will
 also return documents without a value since the numeric default value
 of 0 lies within the range requested.
 o  New JSON features via Noggit upgrade, allowing optional comments
 (C/C++ and shell style), unquoted keys, and relaxed escaping that
 allows one to backslash escape any character.


 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data

Re: Question about sending solrconfig and schema files with java

On 6/20/2014 5:16 AM, Frederic Esnault wrote:
 I know how to send solrconfig.xml and schema.xml files to SolR using curl
 commands.
 But my problem is that i want to send them with java, and i can't find a
 way to do so.
 I used HttpComponentsand got http headers before the file begins, which SAX
 parser does not like at all.
 
 What is the best way to send this files from a java program ?

Chances are good that you can duplicate your curl requests with
HttpSolrServer and SolrQuery, part of solrj, which is in the Solr
download under the dist directory.

If you are running SolrCloud, then the configs in Zookeeper are directly
accessible with Java code.  You should take a look at the source code,
in ZkController#uploadConfigDir, to see how the uploadToZK methods work.
 You should be able to use the SolrZkClient#makePath method, just like
uploadToZK does.

To use SolrZKClient (or the requests similar to what you do now with
curl), you will need the solrj jar and it's dependencies.  The
recommended versions of those dependencies can be found in the download,
in the dist/solrj-lib directory.  To get the SolrZkClient, you would
need to establish a CloudSolrServer object, then retrieve the
ZkStateReader from the CloudSolrServer, and the SolrZkClient from the
ZkStateReader.

Thanks,
Shawn

Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-20 Thread Yago Riveiro

Yonik,

This native code uses in any way the docValues?

In the past I was forced to indexed a big portion of my data with docValues 
enable. OOP problems with large terms dictionaries and GC was my main problem.

Other good optimization can be do facet aggregations offsite the heap to 
minimize the GC, To ensure that facet aggregations has enough ram we need a 
large heap, in machines with a lot of ram maybe if this aggregation was made 
offsite this allow us reduce the heap size.

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Friday, June 20, 2014 at 2:33 PM, Yonik Seeley wrote:

 On Fri, Jun 20, 2014 at 12:36 AM, Andy angelf...@yahoo.com.invalid 
 (mailto:angelf...@yahoo.com.invalid) wrote:
  Congrats! Any idea when will native faceting  off-heap fieldcache be 
  available for multivalued fields? Most of my fields are multivalued so 
  that's the big one for me.
  
  
 Hopefully within the next month or so
 If anyone wants to help out, the github issue is here:
 https://github.com/Heliosearch/heliosearch/issues/13
  
 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data
  
  
  
  On Thursday, June 19, 2014 3:46 PM, Yonik Seeley yo...@heliosearch.com 
  (mailto:yo...@heliosearch.com) wrote:
   
   
   
  FYI, for those who want to try out the new native code faceting, this
  is the first release containing it (for single valued string fields
  only as of yet).
   
  http://heliosearch.org/download/
   
  Heliosearch v0.06
   
  Features:
  o Heliosearch v0.06 is based on (and contains all features of)
  Lucene/Solr 4.9.0
  o Native code faceting for single valued string fields.
  - Written in C++, statically compiled with gcc for Windows, Mac OS-X, Linux
  - static compilation avoids JVM hotspot warmup period,
  mis-compilation bugs, and variations between runs
  - Improves performance over 2x
  o Top level Off-heap fieldcache for single valued string fields in nCache.
  - Improves sorting and faceting speed
  - Reduces garbage collection overhead
  - Eliminates FieldCache “insanity” that exists in Apache Solr from
  faceting and sorting on the same field
  o Full request Parameter substitution / macro expansion, including
  default value support.
  o frange query now only returns documents with a value.
  For example, in Apache Solr, {!frange l=-1 u=1 v=myfield} will
  also return documents without a value since the numeric default value
  of 0 lies within the range requested.
  o New JSON features via Noggit upgrade, allowing optional comments
  (C/C++ and shell style), unquoted keys, and relaxed escaping that
  allows one to backslash escape any character.
   
   
  -Yonik
  http://heliosearch.org - native code faceting, facet functions,
  sub-facets, off-heap data

FW: Indexing a term into separate Lucene indexes

2014-06-20 Thread Huang, Roger


If I have documents with a person and his email address: 
u...@domain.commailto:u...@domain.com

How can I configure Solr (4.6) so that the email address source field is 
indexed as

-  the user part of the address (e.g., user) is in Lucene index X

-  the domain part of the address (e.g., domain.com) is in a separate 
Lucene index Y

I would like to be able search as follows:

-  Find all people whose email addresses have user part = userXyz

-  Find all people whose email addresses have domain part = 
domainABC.com

-  Find the person with exact email address = 
user...@domainabc.commailto:user...@domainabc.com

Would I use a copyField declaration in my schema?
http://wiki.apache.org/solr/SchemaXml#Copy_Fields

Thanks!

Re: [ANN] Heliosearch 0.06 released, native code faceting

On Fri, Jun 20, 2014 at 10:15 AM, Yago Riveiro yago.rive...@gmail.com wrote:
 Yonik,

 This native code uses in any way the docValues?

Nope... not yet.  It is something I think we should look into in the
future though.

 In the past I was forced to indexed a big portion of my data with docValues 
 enable. OOP problems with large terms dictionaries and GC was my main problem.

 Other good optimization can be do facet aggregations offsite the heap to 
 minimize the GC,

Yeah, the single-valued string faceting in Heliosearch currently does
this (the counts array is also off-heap).

 To ensure that facet aggregations has enough ram we need a large heap, in 
 machines with a lot of ram maybe if this aggregation was made offsite this 
 allow us reduce the heap size.

Yeah, it's nice not having to worry so much about the correct heap size too.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data

Re: Question about sending solrconfig and schema files with java

Hi Shawn,

First thank you for taking the time to answer me.

Actually i tried looking for a way to use SolrJ to upload my files, but i
cannot find anywhere informations about how to create nodes with their
config files using SolrJ.
All websites, blogs and docs i found seem to be based on the principle that
the core already exist or that the config files are already there.

I tried using SolrJ anyway, using CoreAdminRequest.create(), but i can only
pass a config file name and a schema file name, not the files themselves,
so i don't see how to do this.
Result of this try is :
INFO: Sending SolR config ...
4226 [AWT-EventQueue-0] INFO
org.apache.solr.client.solrj.impl.HttpClientUtil - Creating new http
client,
config:maxConnections=128maxConnectionsPerHost=32followRedirects=false
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: No
resource solrconfig.xml for core solrks.villes_france, did you miss to
upload it?
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:462)
at
org.apache.solr.client.solrj.request.CoreAdminRequest.createCore(CoreAdminRequest.java:534)
at
org.apache.solr.client.solrj.request.CoreAdminRequest.createCore(CoreAdminRequest.java:514)

*Frédéric Esnault*
CTO / CO-FOUNDER

*SERENZIA*
57 Rue Maurice Bokanowski
92600 Asnières-sur-Seine

Tel : +33 6 49 45 53 38
Mail : fesna...@serenzia.com

2014-06-20 15:35 GMT+02:00 Shawn Heisey s...@elyograg.org:

On 6/20/2014 5:16 AM, Frederic Esnault wrote:
I know how to send solrconfig.xml and schema.xml files to SolR using curl
commands.
But my problem is that i want to send them with java, and i can't find a
way to do so.
I used HttpComponentsand got http headers before the file begins, which
SAX
parser does not like at all.

What is the best way to send this files from a java program ?

Chances are good that you can duplicate your curl requests with
HttpSolrServer and SolrQuery, part of solrj, which is in the Solr
download under the dist directory.

If you are running SolrCloud, then the configs in Zookeeper are directly
accessible with Java code. You should take a look at the source code,
in ZkController#uploadConfigDir, to see how the uploadToZK methods work.
You should be able to use the SolrZkClient#makePath method, just like
uploadToZK does.

To use SolrZKClient (or the requests similar to what you do now with
curl), you will need the solrj jar and it's dependencies. The
recommended versions of those dependencies can be found in the download,
in the dist/solrj-lib directory. To get the SolrZkClient, you would
need to establish a CloudSolrServer object, then retrieve the
ZkStateReader from the CloudSolrServer, and the SolrZkClient from the
ZkStateReader.

Thanks,
Shawn

Re: Question about sending solrconfig and schema files with java

2014-06-20 Thread Alexandre Rafalovitch

On Fri, Jun 20, 2014 at 9:46 PM, Frederic Esnault fesna...@serenzia.com wrote:
 Actually i tried looking for a way to use SolrJ to upload my files, but i
 cannot find anywhere informations about how to create nodes with their
 config files using SolrJ.

Is this something solvable with configsets?
https://cwiki.apache.org/confluence/display/solr/Config+Sets

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

Re: Question about sending solrconfig and schema files with java

Hi Alexandre,

Nope, I cannot access the server (well i can actually, but my users won't
be able to do so), and i can't rely on an http curl call.

As for the final http call indicated in the link you gave, this is my last
step, but before that i need my solrconfig.xml and schema.xml uploaded via
java in solr. And this is where i'm stuck.


*Frédéric Esnault*
CTO / CO-FOUNDER

*SERENZIA*
57 Rue Maurice Bokanowski
92600 Asnières-sur-Seine

Tel : +33 6 49 45 53 38
Mail : fesna...@serenzia.com




2014-06-20 17:01 GMT+02:00 Alexandre Rafalovitch arafa...@gmail.com:

 On Fri, Jun 20, 2014 at 9:46 PM, Frederic Esnault fesna...@serenzia.com
 wrote:
  Actually i tried looking for a way to use SolrJ to upload my files, but i
  cannot find anywhere informations about how to create nodes with their
  config files using SolrJ.

 Is this something solvable with configsets?
 https://cwiki.apache.org/confluence/display/solr/Config+Sets

 Regards,
Alex.

 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency

Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-20 Thread Floyd Wu

Will these awesome features being implemented in Solr soon
 2014/6/20 下午10:43 於 Yonik Seeley yo...@heliosearch.com 寫道：

 On Fri, Jun 20, 2014 at 10:15 AM, Yago Riveiro yago.rive...@gmail.com
 wrote:
  Yonik,
 
  This native code uses in any way the docValues?

 Nope... not yet.  It is something I think we should look into in the
 future though.

  In the past I was forced to indexed a big portion of my data with
 docValues enable. OOP problems with large terms dictionaries and GC was my
 main problem.
 
  Other good optimization can be do facet aggregations offsite the heap to
 minimize the GC,

 Yeah, the single-valued string faceting in Heliosearch currently does
 this (the counts array is also off-heap).

  To ensure that facet aggregations has enough ram we need a large heap,
 in machines with a lot of ram maybe if this aggregation was made offsite
 this allow us reduce the heap size.

 Yeah, it's nice not having to worry so much about the correct heap size
 too.

 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data

Re: Indexing a term into separate Lucene indexes

On 6/19/2014 4:51 PM, Huang, Roger wrote:
 If I have documents with a person and his email address: 
 u...@domain.commailto:u...@domain.com

 How can I configure Solr (4.6) so that the email address source field is 
 indexed as

 -  the user part of the address (e.g., user) is in Lucene index X

 -  the domain part of the address (e.g., domain.com) is in a 
 separate Lucene index Y

 I would like to be able search as follows:

 -  Find all people whose email addresses have user part = userXyz

 -  Find all people whose email addresses have domain part = 
 domainABC.com

 -  Find the person with exact email address = user...@domainabc.com

 Would I use a copyField declaration in my schema?
 http://wiki.apache.org/solr/SchemaXml#Copy_Fields

I don't think you actually want the data to end up in entirely different
indexes.  Although it is possible to search more than one separate
index, that's very likely NOT what you want to do, and it comes with its
own challenges.  What you most likely want is to put this data into
different fields within the same index.

You'll need to write custom code to accomplish this, especially if you
need the stored data to contain only the parts rather than the complete
email address.  A copyField can get the data to additional fields, but
I'm not aware of anything built-in to the schema that can trim the
unwanted information from the new fields, and even if there is, any
stored data will be the original data for all three fields.  It's up to
you whether this custom code is in a user application that does your
indexing or in a custom update processor that you load as a plugin to
Solr itself.  Extending whatever user application you are already using
for indexing is very likely to be a lot easier.

Thanks,
Shawn

Re: Question about sending solrconfig and schema files with java

On 6/20/2014 8:46 AM, Frederic Esnault wrote:
 First thank you for taking the time to answer me.

 Actually i tried looking for a way to use SolrJ to upload my files, but i
 cannot find anywhere informations about how to create nodes with their
 config files using SolrJ.
 All websites, blogs and docs i found seem to be based on the principle that
 the core already exist or that the config files are already there.

You said that you know how to send the files with curl.  How are you
doing this?  If you can do it with curl, chances are good that you can
duplicate the request with HttpSolrServer in some java code.

Thanks,
Shawn

Re: Question about sending solrconfig and schema files with java

Hi Shawn,

Actually i should say that i'm using DSE Search (ie. Datastax Enterprise
with SolR enabled).
With cURL, i'm doing like this :

$ curl http://localhost:8983/solr/resource/nhanes_ks.nhanes/solrconfig.xml
--data-binary @solrconfig.xml -H 'Content-type:text/xml;
charset=utf-8'

$ curl http://localhost:8983/solr/resource/nhanes_ks.nhanes/schema.xml
--data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'

$ curl 
http://localhost:8983/solr/admin/cores?action=CREATEname=nhanes_ks.nhanes;


Except i'm doing this not on localhost but a remote server, and with
files generated in my java program (which are correct once generated,
i checked).

Using HttpComponents to send them does not work, it adds weird things
before the file (read from the cassandra blob after insert).

Using SolrJ to create the core does not work (cannot upload files, so
it's complaining about missing files).

Using a ContentStream request fails with an internal server error (no details)

HttpSolrServer server = new HttpSolrServer(solrUrl);

ContentStreamUpdateRequest req = new
ContentStreamUpdateRequest(/resources/+solrKeyspace + . +
datasetName + /);

req.addContentStream(new ContentStreamBase.FileStream(new
File(./target/classes/solrconfig.xml)));

server.request(req);

server.commit();

*returned non ok status:500, message:Internal Server Error*



*Frédéric Esnault*
CTO / CO-FOUNDER

*SERENZIA*
57 Rue Maurice Bokanowski
92600 Asnières-sur-Seine

Tel : +33 6 49 45 53 38
Mail : fesna...@serenzia.com




2014-06-20 17:34 GMT+02:00 Shawn Heisey s...@elyograg.org:

 On 6/20/2014 8:46 AM, Frederic Esnault wrote:
  First thank you for taking the time to answer me.
 
  Actually i tried looking for a way to use SolrJ to upload my files, but i
  cannot find anywhere informations about how to create nodes with their
  config files using SolrJ.
  All websites, blogs and docs i found seem to be based on the principle
 that
  the core already exist or that the config files are already there.

 You said that you know how to send the files with curl.  How are you
 doing this?  If you can do it with curl, chances are good that you can
 duplicate the request with HttpSolrServer in some java code.

 Thanks,
 Shawn

RE: Indexing a term into separate Lucene indexes

2014-06-20 Thread Huang, Roger

Shawn,
Thanks for your response.
Due to security requirements, I do need the name and domain parts of the email 
address stored in separate Lucene indexes.
How do you recommend doing this?  What are the challenges?
Once the name and domain parts of the email address are in different Lucene 
indexes, would I need to modify my  Solr search string?
Thanks,
Roger


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Friday, June 20, 2014 10:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing a term into separate Lucene indexes

On 6/19/2014 4:51 PM, Huang, Roger wrote:
 If I have documents with a person and his email address: 
 u...@domain.commailto:u...@domain.com

 How can I configure Solr (4.6) so that the email address source field 
 is indexed as

 -  the user part of the address (e.g., user) is in Lucene index X

 -  the domain part of the address (e.g., domain.com) is in a 
 separate Lucene index Y

 I would like to be able search as follows:

 -  Find all people whose email addresses have user part = userXyz

 -  Find all people whose email addresses have domain part = 
 domainABC.com

 -  Find the person with exact email address = user...@domainabc.com

 Would I use a copyField declaration in my schema?
 http://wiki.apache.org/solr/SchemaXml#Copy_Fields

I don't think you actually want the data to end up in entirely different 
indexes.  Although it is possible to search more than one separate index, 
that's very likely NOT what you want to do, and it comes with its own 
challenges.  What you most likely want is to put this data into different 
fields within the same index.

You'll need to write custom code to accomplish this, especially if you need the 
stored data to contain only the parts rather than the complete email address.  
A copyField can get the data to additional fields, but I'm not aware of 
anything built-in to the schema that can trim the unwanted information from the 
new fields, and even if there is, any stored data will be the original data for 
all three fields.  It's up to you whether this custom code is in a user 
application that does your indexing or in a custom update processor that you 
load as a plugin to Solr itself.  Extending whatever user application you are 
already using for indexing is very likely to be a lot easier.

Thanks,
Shawn

Re: [ANN] Heliosearch 0.06 released, native code faceting

On Fri, Jun 20, 2014 at 11:16 AM, Floyd Wu floyd...@gmail.com wrote:
 Will these awesome features being implemented in Solr soon
  2014/6/20 下午10:43 於 Yonik Seeley yo...@heliosearch.com 寫道：

Given the current makeup of the joint Lucene/Solr PMC, it's unclear.
I'm not worrying about that for now, and just pushing Heliosearch as
far and as fast as I can.
Come join us if you'd like to help!

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data

Re: Indexing a term into separate Lucene indexes

On 6/20/2014 10:04 AM, Huang, Roger wrote:
 Due to security requirements, I do need the name and domain parts of the 
 email address stored in separate Lucene indexes.
 How do you recommend doing this?  What are the challenges?
 Once the name and domain parts of the email address are in different Lucene 
 indexes, would I need to modify my  Solr search string?

Solr works best if all the data for an individual document is contained
in a single flat schema.  As soon as you try to put some of the data in
one index and some of the data in another index, you'll probably run
into problems combining the data and/or problems with performance.  Solr
does have some join capability, but when it is mentioned, usually it is
to discuss the things it CAN'T do, not the things that it can do.

What kind of security requirement would necessitate splitting data that
logically belongs together?

Thanks,
Shawn

Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-20 Thread Floyd Wu

Hi Yonik, i dont' understand the relationship between solr and heliosearch
since you were committer of solr?

I just curious.
2014/6/21 上午12:07 於 Yonik Seeley yo...@heliosearch.com 寫道：

 On Fri, Jun 20, 2014 at 11:16 AM, Floyd Wu floyd...@gmail.com wrote:
  Will these awesome features being implemented in Solr soon
   2014/6/20 下午10:43 於 Yonik Seeley yo...@heliosearch.com 寫道：

 Given the current makeup of the joint Lucene/Solr PMC, it's unclear.
 I'm not worrying about that for now, and just pushing Heliosearch as
 far and as fast as I can.
 Come join us if you'd like to help!

 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data

Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-20 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)

On Fri, Jun 20, 2014 at 12:36 PM, Floyd Wu floyd...@gmail.com wrote:
 Hi Yonik, i dont' understand the relationship between solr and heliosearch
 since you were committer of solr?

Heliosearch is a Solr fork that will hopefully find it's way back to
the ASF in the future.

Here's the original project announcement:
http://heliosearch.org/heliosearch-solr-evolved/

And the project FAQ:
http://heliosearch.org/heliosearch-faq/

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data

RE: running Post jar from different server

Hi Sameer, Thanks for looking the post. Below are the two variables read from 
the xml file in my tool.

add key=JavaPath value=%JAVA_HOME%\bin\java.exe /
  add key=JavaArgument value= -Xms128m -Xmx256m 
-Durl=http://localhost:8983/solr/{0}/update -jar F:/DataDump/Tools/post.jar /

In commandline it is something like

C:\DataImport\bin\java.exe -Xms128m -Xmx256m 
-Durl=http://localhost:8983/solr/DataCollection/update -jar 
F:/DataDump/Tools/post.jar  F:/DatFiles/*.xml

F:\ is the network drive.

Thanks
Ravi

-Original Message-
From: Sameer Maggon [mailto:sam...@measuredsearch.com] 
Sent: Thursday, June 19, 2014 10:02 PM
To: solr-user@lucene.apache.org
Subject: Re: running Post jar from different server

Ravi,

post.jar is a standalone utility that does not have to be on the same server. 
If you can share the command you are executing, there might be some pointers in 
there.

Thanks,
--
*Sameer Maggon*
http://measuredsearch.com


On Thu, Jun 19, 2014 at 8:54 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

 Hi,  I have situation where my SQL Job initiate a console application ,
 where I am calling the post.jar to upload data to SOLR. Both SQL DB and
 SOLR are 2 different servers.

 I am calling post.jar from my SQLDB where the path is mapped to a network
 drive. I am getting an error file not found.

 Is the above scenario is possible, if anyone has some experience on this
 can you share or any direction will be really appreciated.

 Thanks

 Ravi

Re: Indexing a term into separate Lucene indexes

On 6/20/2014 12:17 PM, Huang, Roger wrote:
 How would you recommend storing the name and domain parts of the email 
 address in separate Lucene indexes?
 To query, would I use the Solr cross-core join, fromIndex, toIndex?

I have absolutely no idea how to use Solr's join functionality.  It is
not required for my indexes.  Here's the wiki page on the subject:

https://wiki.apache.org/solr/Join

Additional note: Your reply did not come to the mailing list, it was
only sent to me.

Thanks,
Shawn

Discuss moving nextCursorMark to the beginning of response

2014-06-20 Thread Joseph Andaverde

I'd like to discuss moving the nextCursorMark to the beginning of a query
response. This way one can fetch another result set before completely
downloading the response. Currently, it's placed into the SOLR response
last. I figure this is just coincidence because it's a recent addition to
SOLR.

Re: Solr alternates returning different versions of the same document

2014-06-20 Thread Erick Erickson

If you update to a specific core, I suspect you're getting the doc
indexed on two shards which leads to duplicate documents being
returned. So it depends on which core happens to answer the request...
Fundamentally, all versions of a document must go to the same shard in
order for the new version to replace the old version. If you've put
the document specifically on a single node, you've bypassed the
automatic routing that would insure this...

I think the Admin UI kind of side-steps the usual routing process, but
I'm not entirely sure.

Best,
Erick

On Fri, Jun 20, 2014 at 12:47 AM, yann yannick.lallem...@gmail.com wrote:
I have the following problem with Solr 4.5.1, with a cloud install with 4
shards, no replication, using the built-in zookeeper on one Solr:

I have updated a document via the Solr console (select a core, then select
Documents). I used the CSV format to upload the document, including the
document ID.

When I query the document id from the Solr console (simple query:
id:the-id-of-the-doc-I-updated), I alternatively obtain the old document
(with the values before update, and a given _version_ number), or the new
document (with the values after update, and a different _version_).

No log messages in the Solr console about updating the document or anything.

Any idea what might be going on, and how to fix that problem?

Thanks in advance,

Yann

--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-alternates-returning-different-versions-of-the-same-document-tp4143006.html
Sent from the Solr - User mailing list archive at Nabble.com.

Undeletable phantom collection / core

2014-06-20 Thread John Smodic

Hi,

I have the following situation using SolrCloud:

deleteCollection foo - Could not find collection:foo

createCollection foo - Error CREATEing SolrCore 'foo_shard1_replica1': 
Could not create a new core in solr/foo_shard1_replica1/as another core is 
already defined there

unload Corefoo_shard1_replica1, delete index, delete dir - No such core 
exists 'foo_shard1_replica1'

My clusterstate.json is empty:

 get /clusterstate.json
{}

However, the /solr directory of my server does have the directory 
foo_shard1_replica1

How can I delete this phantom core / collection without manually deleting 
the directory and restarting my servers?

Thanks!

Re: Undeletable phantom collection / core

On 6/20/2014 1:24 PM, John Smodic wrote:
 I have the following situation using SolrCloud:

 deleteCollection foo - Could not find collection:foo

 createCollection foo - Error CREATEing SolrCore 'foo_shard1_replica1': 
 Could not create a new core in solr/foo_shard1_replica1/as another core is 
 already defined there

 unload Corefoo_shard1_replica1, delete index, delete dir - No such core 
 exists 'foo_shard1_replica1'

 My clusterstate.json is empty:

  get /clusterstate.json
 {}

 However, the /solr directory of my server does have the directory 
 foo_shard1_replica1

 How can I delete this phantom core / collection without manually deleting 
 the directory and restarting my servers?

If the zookeeper database has no mention at all of the foo collection,
then it should be completely safe to just delete or rename the
directory, and you probably won't even need to restart Solr.

Because the core directory most likely does not have a conf directory,
you can't just CREATE and then UNLOAD the core with the
deleteInstanceDir option.  What you MIGHT be able to do for deleting it
with HTTP calls is this:

Temporarily create a new collection with a different name that has one
shard, with  being the name of an existing configuration stored in
zookeeper, ideally whichever config was being used for foo:
http://server:port/solr/admin/collections?action=CREATEname=barnumShards=1collection.configName=

Use CoreAdmin to create the foo_shard1_replica1 core as a replica of the
shard in the new collection:
http://server:port/solr/admin/cores?action=CREATEname=foo_shard1_replica1collection=barshard=shard1

If this CoreAdmin action works, then you can delete the new collection
entirely:
http://server:port/solr/admin/collections?action=DELETEname=bar

I have no idea whether this will actually work, but it's the best idea
that I have.

Thanks,
Shawn

Re: ICUTokenizer or StandardTokenizer or ??? for text_all type field that might include non-whitespace langs

2014-06-20 Thread T. Kuro Kurosaka


On 06/20/2014 04:04 AM, Allison, Timothy B. wrote:

Let's say a predominantly English document contains a Chinese sentence.  If the 
English field uses the WhitespaceTokenizer with a basic WordDelimiterFilter, 
the Chinese sentence could be tokenized as one big token (if it doesn't have 
any punctuation, of course) and will be effectively unsearchable...barring use 
of wildcards.


In my experiment with Solr 4.6.1, both StandardTokenizer and ICUTokenizer
generates a token per han character. So they are searcheable though
precision suffers. But in your scenario, Chinese text is rare, so some 
precision

loss may not be a real issue.

Kuro

Re: Question about sending solrconfig and schema files with java

2014-06-20 Thread Jack Krupansky

Please post this issue on StackOverflow and one of us DataStax guys will 
deal with it there, since nobody here would know much about the specialized 
way that DataStax uses for dynamic schema and config loading.


Check your DSE server log for the 500 exception - but post it on SO since it 
is probably not Solr-related.


Sorry for the inconvenience!

-- Jack Krupansky

-Original Message- 
From: Frederic Esnault

Sent: Friday, June 20, 2014 11:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about sending solrconfig and schema files with java

Hi Shawn,

Actually i should say that i'm using DSE Search (ie. Datastax Enterprise
with SolR enabled).
With cURL, i'm doing like this :

$ curl http://localhost:8983/solr/resource/nhanes_ks.nhanes/solrconfig.xml
--data-binary @solrconfig.xml -H 'Content-type:text/xml;
charset=utf-8'

$ curl http://localhost:8983/solr/resource/nhanes_ks.nhanes/schema.xml
--data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'

$ curl 
http://localhost:8983/solr/admin/cores?action=CREATEname=nhanes_ks.nhanes;



Except i'm doing this not on localhost but a remote server, and with
files generated in my java program (which are correct once generated,
i checked).

Using HttpComponents to send them does not work, it adds weird things
before the file (read from the cassandra blob after insert).

Using SolrJ to create the core does not work (cannot upload files, so
it's complaining about missing files).

Using a ContentStream request fails with an internal server error (no 
details)


   HttpSolrServer server = new HttpSolrServer(solrUrl);

   ContentStreamUpdateRequest req = new
ContentStreamUpdateRequest(/resources/+solrKeyspace + . +
datasetName + /);

   req.addContentStream(new ContentStreamBase.FileStream(new
File(./target/classes/solrconfig.xml)));

   server.request(req);

   server.commit();

*returned non ok status:500, message:Internal Server Error*



*Frédéric Esnault*
CTO / CO-FOUNDER

*SERENZIA*
57 Rue Maurice Bokanowski
92600 Asnières-sur-Seine

Tel : +33 6 49 45 53 38
Mail : fesna...@serenzia.com




2014-06-20 17:34 GMT+02:00 Shawn Heisey s...@elyograg.org:


On 6/20/2014 8:46 AM, Frederic Esnault wrote:
 First thank you for taking the time to answer me.

 Actually i tried looking for a way to use SolrJ to upload my files, but 
 i

 cannot find anywhere informations about how to create nodes with their
 config files using SolrJ.
 All websites, blogs and docs i found seem to be based on the principle
that
 the core already exist or that the config files are already there.

You said that you know how to send the files with curl.  How are you
doing this?  If you can do it with curl, chances are good that you can
duplicate the request with HttpSolrServer in some java code.

Thanks,
Shawn

Re: Question about sending solrconfig and schema files with java