Surround query with Boolean queries

2014-06-20 Thread Shyamsunder R Mutcha


Hi,

I have two fields in the index, company and year. The following surround query,
which finds computer and applications within 5 words of each other, is working fine with
the surround query parser:
{!surround maxBasicQueries=10}company:5N(comput*, appli*)

Now if I add another boolean query, +year:[2005 TO *], it throws a query
parser exception:
{!surround maxBasicQueries=10}company:5N(comput*, appli*) +year:[2005 TO *]

* msg: org.apache.solr.search.SyntaxError: 
org.apache.lucene.queryparser.surround.parser.ParseException: Encountered "<TERM> year" 
at line 1, column 30. Was expecting one of: <EOF> "OR" ... 
"AND" ... "NOT" ... "W" ... "N" ... "^" ... ","

Couldn't figure out the syntax from the SurroundQParserPlugin code. 
How do I combine other term and/or boolean queries with surround queries? I'm also 
looking for the syntax to add more than one surround query on different fields.

Thanks
Shyamsunder

Re: faceting performance on fields with high-cardinality

2014-06-20 Thread Shyamsunder R Mutcha
Hi Tang,

I don't see any query (q) given for execution in the firstSearcher and 
newSearcher event listeners. Can you add a query term:
<str name="q">query term here</str>
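
For example, each warming entry in the listener would then look something
like this (the query value is a placeholder):

<lst>
    <str name="q">tobacco</str>
    <str name="qt">search</str>
    <str name="facet">true</str>
    <str name="facet.field">au_facet</str>
</lst>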

Check your logs: it will log that the firstSearcher event executed and print a 
message with the invertedIndex and the number of facet items loaded.


Thanks
Shyamsunder 



On Friday, June 13, 2014 8:02 PM, Tang, Rebecca rebecca.t...@ucsf.edu wrote:
 


Hi Toke,

Thank you for the reply!

Both single-value-with-semi-colon-tokenizer and multi-value-untokenized
have static warming queries in place.  In fact, that was the first thing I
did to improve performance.

Below are my warming queries in solrconfig.xml.

<listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
        <lst> <!-- begin: static warming for facets -->
            <str name="facet.field">au_facet</str>
            <str name="facet.field">per_facet</str>
            <str name="facet.field">org_facet</str>
            <str name="facet.field">dt</str>
            <str name="facet.field">brd</str>
            <str name="facet.pivot">industry,source_facet</str>
            <str name="facet.pivot">availability,availability_status</str>
            <str name="qt">search</str>
            <str name="facet">true</str>
            <str name="f.au_facet.facet.limit">5</str>
            <str name="f.per_facet.facet.limit">5</str>
            <str name="f.org_facet.facet.limit">5</str>
            <str name="f.dg.facet.limit">5</str>
            <str name="f.dt.facet.limit">5</str>
        </lst> <!-- end: static warming for facets -->
    </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
        <lst> <!-- begin: static warming for facets -->
            <str name="facet.field">au_facet</str>
            <str name="facet.field">per_facet</str>
            <str name="facet.field">org_facet</str>
            <str name="facet.field">dt</str>
            <str name="facet.field">brd</str>
            <str name="facet.pivot">industry,source_facet</str>
            <str name="facet.pivot">availability,availability_status</str>
            <str name="qt">search</str>
            <str name="facet">true</str>
            <str name="f.au_facet.facet.limit">5</str>
            <str name="f.per_facet.facet.limit">5</str>
            <str name="f.org_facet.facet.limit">5</str>
            <str name="f.dg.facet.limit">5</str>
            <str name="f.dt.facet.limit">5</str>
        </lst> <!-- end: static warming for facets -->
    </arr>
</listener>


As for cardinality, for example, the per_facet field (person facet) has
4,627,056 unique terms for 14,000,000 documents.

Maybe my warming queries are not correct?  I just don't get why
multi-valued-untokenized field yielded such a performance improvement. I
guess it doesn't make sense to you either :)

I will definitely give the docValues a try to see if it further improves
the performance.


Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Library legacy.library.ucsf.edu/
E: rebecca.t...@ucsf.edu





On 6/13/14 1:24 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

Tang, Rebecca [rebecca.t...@ucsf.edu] wrote:
 I have a Solr index with 14+ million records.  We facet on quite a few
 fields with very high-cardinality such as author, person, organization,
 brand and document type.  Some of the records contain thousands of persons
 and organizations.  So the person and organization fields can be very large.

How many unique values per field in the full index are we talking? Just
approximately.

 After this change, the performance improved drastically. But I can't
understand why
 building these fields as multi-valued field vs. single-valued field
with semicolon
 tokenizer can have such a dramatic performance difference.

It should not. I suspect something else is happening. 10 minutes does not
sound unrealistic if it is your first query after an index update. Maybe
your measurement for tokenized was unwarmed and your measurement for
un-tokenized warmed? Could you give an example of a full query?

Anyway, you should definitely be using DocValues for such high
cardinality facet-fields.
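
For example, a hypothetical field definition (the field name is taken from
your earlier mail, the other attributes are assumptions):

<field name="per_facet" type="string" indexed="true" stored="true"
       multiValued="true" docValues="true"/>

Note that turning on docValues requires a re-index.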

Depending on your usage pattern and where the bottleneck is,
https://issues.apache.org/jira/browse/SOLR-5894 might also help.

- Toke Eskildsen

Re: Multivalue wild card search

2014-06-20 Thread Ahmet Arslan
Hi,

What are these square brackets, back slashes, quotes?
Are they part of the JSON output? Can you paste the human readable XML response 
writer output?

Thanks,
Ahmet



On Friday, June 20, 2014 12:17 AM, Ethan eh198...@gmail.com wrote:
Ahmet,

Assuming there is a multiValued field called Name of type string stored
in the index -

//Doc 1
id : 23512
HotelId : [
    12,
    23,
    12
]
Name : [
[[\"Ethan\", \"G\", \"\"],[\"Steve\", \"Wonder\", \"\"]],
[],
[[\"hifte\", \"Grop\", \"\"]]
]

// Doc 2

id : 23513
HotelId : [
    12,
    12
]
Name : [
[[\"Ethan\", \"G\", \"\"],[\"Steve\", \"\", \"\"]],
[],
]

Here, how do I find the document with Name that contains "Steve Wonder"?

I tried q=***[\"Steve\", \"Wonder\", \"\"]] but that doesn't work.




On Fri, Jun 6, 2014 at 11:10 AM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Ethan,


 It is hard to understand your example. Can you re-write it? Using xml?



 On Friday, June 6, 2014 9:07 PM, Ethan eh198...@gmail.com wrote:
 Bumping the thread to see if anyone has a solution.





 On Thu, Jun 5, 2014 at 9:52 AM, Ethan eh198...@gmail.com wrote:

  Wildcard search does work on multiValued fields.  I was able to pull up
  records for the following multiValued field -
 
  Code : [
  12344,
  4534,
  674
  ]
 
  q=Code:45* fetched the correct document.  It doesn't work in
  quotes (q=Code:"45*"), however.  Is there a workaround?
 
 
  On Thu, Jun 5, 2014 at 9:34 AM, Ethan eh198...@gmail.com wrote:
 
  Are you implying there is no way to look up on a multiValued field with
 a
  substring?  If so, then how is it usually handled?
 
 
  On Wed, Jun 4, 2014 at 4:44 PM, Jack Krupansky j...@basetechnology.com
 
  wrote:
 
  Wildcard, fuzzy, and regex query operate on a single term of a single
  tokenized field value or a single string field value.
 
  -- Jack Krupansky
 
  -Original Message- From: Ethan
  Sent: Wednesday, June 4, 2014 6:59 PM
  To: solr-user
  Subject: Multivalue wild card search
 
 
  I can't seem to find a solution to do wild card search on a multiValued
  field.
 
  For example, consider a multiValued field called Name with 3 values -
 
  Name : [
  [[\"Ethan\", \"G\", \"\"],[\"Steve\", \"Wonder\", \"\"]],
  [],
  [[\"hifte\", \"Grop\", \"\"]]
  ]
 
  For a multiValued field like the above, I want a search like -
 
  q=***[\"Steve\", \"Wonder\", \"\"]
 
 
  But I do not get any results back. Any ideas on how to create such a
  query?
 
 
 
 





Re: Surround query with Boolean queries

2014-06-20 Thread Ahmet Arslan
Hello,

The special field name _query_ is your friend:

+_query_:"{!surround maxBasicQueries=10}company:5N(comput*, appli*)" 
+_query_:"{!lucene}year:[2005 TO *]"

http://searchhub.org/2009/03/31/nested-queries-in-solr/
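
The same mechanism also covers your last question: you can AND together
several nested surround clauses on different fields in one request (untested
sketch; the title field is made up):

+_query_:"{!surround maxBasicQueries=10}company:5N(comput*, appli*)"
+_query_:"{!surround maxBasicQueries=10}title:3W(solr*, search*)"

The year range could equally go into a filter query, e.g. fq=year:[2005 TO *].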

Ahmet


On Friday, June 20, 2014 9:39 AM, Shyamsunder R Mutcha 
sjh...@yahoo.com.INVALID wrote:


Hi,

I have two fields in the index, company and year. The following surround query,
which finds computer and applications within 5 words of each other, is working fine with
the surround query parser:
{!surround maxBasicQueries=10}company:5N(comput*, appli*)

Now if I add another boolean query, +year:[2005 TO *], it throws a query
parser exception:
{!surround maxBasicQueries=10}company:5N(comput*, appli*) +year:[2005 TO *]

    * msg: org.apache.solr.search.SyntaxError: 
org.apache.lucene.queryparser.surround.parser.ParseException: Encountered "<TERM> year" 
at line 1, column 30. Was expecting one of: <EOF> "OR" ... 
"AND" ... "NOT" ... "W" ... "N" ... "^" ... ","

Couldn't figure out the syntax from the SurroundQParserPlugin code. 
How do I combine other term and/or boolean queries with surround queries? I'm also 
looking for the syntax to add more than one surround query on different fields.

Thanks
Shyamsunder


unable to start DataimportHandler

2014-06-20 Thread atp
Hi Experts,

I have configured SolrCloud 4.8 with ZooKeeper and Tomcat in a 3-node
cluster. We have a requirement to search table data that is stored in HBase
tables. For this I have done the following setup:

1. Edited solrconfig.xml and added the contrib lib and dataimporthandler libs. 
2. Created a new data-config.xml file with the HBase connectivity and table
details in the ./collection1/conf directory.
3. Added the request handler in solrconfig.xml.
4. Restarted the Tomcat servlet container.

But it's not reflecting in Solr; when I try a dataimport full-import it says
"sorry, no dataimport-handler defined".
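
For reference, the wiring I added follows the stock DIH example (lib paths
depend on the install layout):

<lib dir="../../../contrib/dataimporthandler/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>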


1. Can you please guide me: are there any other steps required to define the
dataimporthandler for HBase?

2. I have also done these steps on all 3 nodes, and still it's not picked up.
Please help.

Thanks in Advance
Annamalai, 








--
View this message in context: 
http://lucene.472066.n3.nabble.com/unable-to-start-DataimportHandler-tp4142989.html
Sent from the Solr - User mailing list archive at Nabble.com.


About Query Parser

2014-06-20 Thread Vivekanand Ittigi
Hi,

I think this might be a silly question but I want to make it clear.

What is a query parser? What does it do? I know it's used for converting a
query, but from what to what? What is the input and what is the output of a
query parser? And where exactly is this feature used?

If possible, please explain with an example. It would really help a lot.

Thanks,
Vivek


Solr alternates returning different versions of the same document

2014-06-20 Thread yann
I have the following problem with Solr 4.5.1, with a cloud install with 4
shards, no replication, using the built-in zookeeper on one Solr:

I have updated a document via the Solr console (select a core, then select
Documents). I used the CSV format to upload the document, including the
document ID.

When I query the document id from the Solr console (simple query:
id:the-id-of-the-doc-I-updated), I alternately obtain the old document
(with the values before the update, and a given _version_ number) or the new
document (with the values after the update, and a different _version_).

No log messages in the Solr console about updating the document or anything.

Any idea what might be going on, and how to fix that problem?

Thanks in advance,

Yann



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-alternates-returning-different-versions-of-the-same-document-tp4143006.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Exception: org.apache.solr.common.SolrException: Fallo en lectura de Conector (connector reading failure)

2014-06-20 Thread david . davila
Hello,

we have a SolrCloud 4.7 cluster with 2 shards of 2 nodes each.
Until now it was working fine, but since yesterday we have had this error on
almost all updates:



org.apache.solr.common.SolrException: Fallo en lectura de Conector
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.logging.log4j.core.web.Log4jServletFilter.doFilter(Log4jServletFilter.java:66)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.ajp.AjpAprProcessor.process(AjpAprProcessor.java:197)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:603)
    at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2430)
    at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2419)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)
    at java.lang.Thread.run(Thread.java:804)
Caused by: com.ctc.wstx.exc.WstxIOException: Fallo en lectura de Conector
    at com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:548)
    at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604)
    at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:629)
    at com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:324)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:172)
    ... 25 more
Caused by: java.io.IOException: Fallo en lectura de Conector
    at org.apache.coyote.ajp.AjpAprProcessor.read(AjpAprProcessor.java:328)
    at org.apache.coyote.ajp.AjpAprProcessor.readMessage(AjpAprProcessor.java:424)
    at org.apache.coyote.ajp.AjpAprProcessor.receive(AjpAprProcessor.java:383)
    at org.apache.coyote.ajp.AbstractAjpProcessor$SocketInputBuffer.doRead(AbstractAjpProcessor.java:1131)
    at org.apache.coyote.Request.doRead(Request.java:422)
    at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:290)
    at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:449)
    at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:315)
    at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:167)
    at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
    at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
    at com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245)
    at com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132)
    at com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543)
    ... 29 more

What are these types of exceptions due to? We haven't changed anything.

Thank you very much,

David Dávila Atienza
AEAT - Departamento de 

Re: About Query Parser

2014-06-20 Thread Alexandre Rafalovitch
I am going to have a go at this. Maybe others can add/correct.

When you make a request to Solr, it hits a request handler first. E.g.
a /select request handler. That's defined in solrconfig.xml

The request handler can change your request with some default,
required, and overriding parameters.

For solr.SearchHandler, it can also define which search-component
stack then processes the actual request. Handlers can define the stack
explicitly (e.g. the /suggest request handler), use the default stack, or
append/prepend to the default stack (e.g. the /spell request handler).

The default search component stack can be seen in the commented out
section of solrconfig.xml and consists of 6 components: query, facet,
mlt (MoreLikeThis), highlight, stats, and debug.

The query component is the one that actually does the searching and
figures out what the result documents are. And it uses query parsers
for that. There are multiple query parsers available. The most common
are standard/lucene, dismax and edismax, but there are a bunch
more: https://cwiki.apache.org/confluence/display/solr/Query+Syntax+and+Parsing
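
For example (field names are made up), the query parser can be picked per
request with defType, or inline with local params:

/select?defType=edismax&qf=title+body&q=solr+faceting
/select?q={!edismax qf='title body'}solr faceting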

If you don't have the query component, you are not actually searching for
documents; you are doing something else (e.g. spelling).

These parsers transform what you sent in your URL (in the q
parameter, but also others) into the Lucene or internal queries that
return documents with some ranking attached.

Then, other components do their own things too. Facet components add
facets, highlight components add highlight sections based on the
already collected information, and so on.

Then, all that gets serialized into one of many supported formats
(XML, JSON, Ruby, etc) and sent back to the client.

If you want examples, then just read through solrconfig.xml and
schema.xml and understand how they hang together. That's why they are
so long, so people can see the defaults and examples. If you did not
care for that, your solrconfig.xml could be as small as:
https://github.com/arafalov/solr-indexing-book/blob/master/published/collection1/conf/solrconfig.xml

Regards,
   Alex.
P.s. The interesting question in return is: where are you stuck, such that
you think knowing what a query parser is will move you further
ahead?
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Jun 20, 2014 at 3:55 PM, Vivekanand Ittigi
vi...@biginfolabs.com wrote:
 Hi,

 I think this might be a silly question but I want to make it clear.

 What is a query parser? What does it do? I know it's used for converting a
 query, but from what to what? What is the input and what is the output of a
 query parser? And where exactly is this feature used?

 If possible, please explain with an example. It would really help a lot.

 Thanks,
 Vivek


Re: About Query Parser

2014-06-20 Thread Daniel Collins
Alexandre's response is very thorough, so I'm really simplifying things, I
confess, but here's my "query parsers for dummies". :)

In terms of inputs/outputs, a QueryParser takes a string (generally assumed
to be human generated i.e. something a user might type in, so maybe a
sentence, a set of words, the format can vary) and outputs a Lucene Query
object (
http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html),
which in fact is a kind of tree (again, I'm simplifying I know) since a
query can contain nested expressions.

So very loosely it's a translator from a human-generated query into the
structure that Lucene can handle.  There are several different query
parsers since they all use different input syntax and ways of handling
different constructs (to handle A and B, should the user type "+A +B" or "A
and B" or just "A B", for example), and they have different levels of
support for the various Query structures that Lucene can handle: SpanQuery,
FuzzyQuery, PhraseQuery, etc.
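
As a rough code illustration (Lucene 4.x classic query parser; the field
name is a placeholder, and the snippet belongs in a method that declares the
checked ParseException):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// human-typed string in...
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);
QueryParser parser = new QueryParser(Version.LUCENE_48, "body", analyzer);
Query q = parser.parse("+solr +faceting");
// ...Lucene Query tree out, e.g. +body:solr +body:faceting
System.out.println(q);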

We for example use an XML-based query parser.  Why (you might well ask!),
well we had an already used and supported query syntax of our own, which
our users understood, so we couldn't use an off the shelf query parser.  We
could have built our own in Java, but for a variety of reasons we parse our
queries in a front-end system ahead of Solr (which is C++-based), so we
needed an interim format to pass queries to Solr that was as near to a
Lucene Query object as we could get (and there was an existing XML parser
to save us starting from square one!).

As part of that Query construction (but independent of which QueryParser
you use), Solr will also make use of a set of Tokenizers and Filters (
https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters)
but that's more to do with dealing with the terms in the query (so in my
examples above, is "A" a real word, does it need stemming, lowercasing,
removing because it's a stopword, etc).


RE: ICUTokenizer or StandardTokenizer or ??? for text_all type field that might include non-whitespace langs

2014-06-20 Thread Allison, Timothy B.
Alex,
  Thank you for the quick response.  Apologies for my delay.
Yes, we'll use edismax.  That won't solve the issue of multilingual documents...I 
don't think...unless we index every document as every language.
Let's say a predominantly English document contains a Chinese sentence.  If the 
English field uses the WhitespaceTokenizer with a basic WordDelimiterFilter, 
the Chinese sentence could be tokenized as one big token (if it doesn't have 
any punctuation, of course) and will be effectively unsearchable...barring use 
of wildcards.
So, what we're looking for is a basic, reliable-ish field configuration to 
handle all languages as a fallback.  So we were thinking, perhaps, ICUTokenizer 
with ICUFoldingFilter and perhaps a multilingual stopword list.
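
Concretely, something like this sketch (untested; requires the
analysis-extras contrib jars, and the stopword file name is made up):

<fieldType name="text_all" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_multilang.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>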
We do want the language specific handling for most cases, and the basic 
langid+field per language setup with edismax will get us that.  Any thoughts?

Thank you, again.

   Best,

   Tim

I don't think the text_all field would work too well for multilingual
setup. Any reason you cannot use edismax to search over a bunch of
fields instead?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency




From: Allison, Timothy B.
Sent: Wednesday, June 18, 2014 9:31 PM
To: solr-user@lucene.apache.org
Subject: ICUTokenizer or StandardTokenizer or ??? for text_all type field 
that might include non-whitespace langs

All,

In one index I’m working with, the setup is the typical langid mapping to 
language specific fields.  There is also a text_all field that everything is 
copied to.  The documents can contain a wide variety of languages including 
non-whitespace languages.  We’ll be using the ICUTokenFilter in the analysis 
chain, but what should we use for the tokenizer for the “text_all” field?  My 
inclination is to go with the ICUTokenizer.  Are there any reasons to prefer 
the StandardTokenizer or another tokenizer for this field?

Thank you.

   Best,

  Tim


Question about sending solrconfig and schema files with java

2014-06-20 Thread Frederic Esnault
Hi,
I know how to send the solrconfig.xml and schema.xml files to Solr using curl
commands.
But my problem is that I want to send them with Java, and I can't find a
way to do so.
I used HttpComponents and got HTTP headers before the file begins, which the
SAX parser does not like at all.

What is the best way to send these files from a Java program?

What I have once I sent the file is something like this:

��:
solr_admin solr_resources resource_value��--9NDJNu2AW4jtIyX6ggQAgEqI3FXp3JpDZ6
Content-Disposition: form-data; name="solrconfig.xml"; filename="solrconfig.xml"
Content-Type: application/xml; charset=ISO-8859-1
Content-Transfer-Encoding: binary

<config><!-- In all configuration below, a prefix of "solr." for class names
 is an alias that causes solr to search appropriate packages,
 including org.apache.solr.(search|update|request|core|analysis)
[Continued...]


Re: About Query Parser

2014-06-20 Thread Vivekanand Ittigi
Hi Daniel,

You said inputs are human-generated and outputs are Lucene objects. So
my question is: what does the below query mean? Does this fall under the
human-generated kind or the Lucene kind?

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true

Thanks,
Vivek



On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins danwcoll...@gmail.com
wrote:

 Alexandre's response is very thorough, so I'm really simplifying things, I
 confess but here's my query parsers for dummies. :)

 In terms of inputs/outputs, a QueryParser takes a string (generally assumed
 to be human generated i.e. something a user might type in, so maybe a
 sentence, a set of words, the format can vary) and outputs a Lucene Query
 object (

 http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
 ),
 which in fact is a kind of tree (again, I'm simplifying I know) since a
 query can contain nested expressions.

 So very loosely its a translator from a human-generated query into the
 structure that Lucene can handle.  There are several different query
 parsers since they all use different input syntax, and ways of handling
 different constructs (to handle A and B, should the user type +A +B or A
 and B or just A B for example), and have different levels of support for
 the various Query structures that Lucene can handle: SpanQuery, FuzzyQuery,
 PhraseQuery, etc.

 We for example use an XML-based query parser.  Why (you might well ask!),
 well we had an already used and supported query syntax of our own, which
 our users understood, so we couldn't use an off the shelf query parser.  We
 could have built our own in Java, but for a variety of reasons we parse our
 queries in a front-end system ahead of Solr (which is C++-based), so we
 needed an interim format to pass queries to Solr that was as near to a
 Lucene Query object as we could get (and there was an existing XML parser
 to save us starting from square one!).

 As part of that Query construction (but independent of which QueryParser
 you use), Solr will also make use of a set of Tokenizers and Filters (

 https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
 )
 but that's more to do with dealing with the terms in the query (so in my
 examples above, is A a real word, does it need stemming, lowercasing,
 removing because its a stopword, etc).



Re: About Query Parser

2014-06-20 Thread Alexandre Rafalovitch
That's *:* and a special case. There is no scoring here, nor searching.
Just a dump of documents. Not even filtering or faceting. I sure hope you
have more interesting examples.

Regards,
Alex
On 20/06/2014 6:40 pm, Vivekanand Ittigi vi...@biginfolabs.com wrote:

 Hi Daniel,

 You said inputs are human-generated and outputs are Lucene objects. So
 my question is: what does the below query mean? Does this fall under the
 human-generated kind or the Lucene kind?

 http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true

 Thanks,
 Vivek



 On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins danwcoll...@gmail.com
 wrote:

  Alexandre's response is very thorough, so I'm really simplifying things,
 I
  confess but here's my query parsers for dummies. :)
 
  In terms of inputs/outputs, a QueryParser takes a string (generally
 assumed
  to be human generated i.e. something a user might type in, so maybe a
  sentence, a set of words, the format can vary) and outputs a Lucene Query
  object (
 
 
 http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
  ),
  which in fact is a kind of tree (again, I'm simplifying I know) since a
  query can contain nested expressions.
 
  So very loosely its a translator from a human-generated query into the
  structure that Lucene can handle.  There are several different query
  parsers since they all use different input syntax, and ways of handling
  different constructs (to handle A and B, should the user type +A +B or
 A
  and B or just A B for example), and have different levels of support
 for
  the various Query structures that Lucene can handle: SpanQuery,
 FuzzyQuery,
  PhraseQuery, etc.
 
  We for example use an XML-based query parser.  Why (you might well ask!),
  well we had an already used and supported query syntax of our own, which
  our users understood, so we couldn't use an off the shelf query parser.
  We
  could have built our own in Java, but for a variety of reasons we parse
 our
  queries in a front-end system ahead of Solr (which is C++-based), so we
  needed an interim format to pass queries to Solr that was as near to a
  Lucene Query object as we could get (and there was an existing XML parser
  to save us starting from square one!).
 
  As part of that Query construction (but independent of which QueryParser
  you use), Solr will also make use of a set of Tokenizers and Filters (
 
 
 https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
  )
  but that's more to do with dealing with the terms in the query (so in my
  examples above, is A a real word, does it need stemming, lowercasing,
  removing because its a stopword, etc).
 



Re: About Query Parser

2014-06-20 Thread Vivekanand Ittigi
All right, let me put this:

http://192.168.1.78:8983/solr/collection1/select?q=inStock:false&facet=true&facet.field=popularity&wt=xml&indent=true

I just want to know what form this is. Is it a Lucene query, or should this
query go through a query parser to get converted to a Lucene query?


Thanks,
Vivek


On Fri, Jun 20, 2014 at 5:19 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 That's *:* and a special case. There is no scoring here, nor searching.
 Just a dump of documents. Not even filtering or faceting. I sure hope you
 have more interesting examples.

 Regards,
 Alex
 On 20/06/2014 6:40 pm, Vivekanand Ittigi vi...@biginfolabs.com wrote:

  Hi Daniel,
 
  You said inputs are human-generated and outputs are lucene objects.
 So
  my question is what does the below query mean. Does this fall under
  human-generated one or lucene.?
 
  http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true
 
  Thanks,
  Vivek
 
 
 
  On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins danwcoll...@gmail.com
  wrote:
 
   Alexandre's response is very thorough, so I'm really simplifying
 things,
  I
   confess but here's my query parsers for dummies. :)
  
   In terms of inputs/outputs, a QueryParser takes a string (generally
  assumed
   to be human generated i.e. something a user might type in, so maybe a
   sentence, a set of words, the format can vary) and outputs a Lucene
 Query
   object (
  
  
 
 http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
   ),
   which in fact is a kind of tree (again, I'm simplifying I know)
 since a
   query can contain nested expressions.
  
   So very loosely its a translator from a human-generated query into the
   structure that Lucene can handle.  There are several different query
   parsers since they all use different input syntax, and ways of handling
   different constructs (to handle A and B, should the user type +A +B
 or
  A
   and B or just A B for example), and have different levels of support
  for
   the various Query structures that Lucene can handle: SpanQuery,
  FuzzyQuery,
   PhraseQuery, etc.
  
   We for example use an XML-based query parser.  Why (you might well
 ask!),
   well we had an already used and supported query syntax of our own,
 which
   our users understood, so we couldn't use an off the shelf query parser.
   We
   could have built our own in Java, but for a variety of reasons we parse
  our
   queries in a front-end system ahead of Solr (which is C++-based), so we
   needed an interim format to pass queries to Solr that was as near to a
   Lucene Query object as we could get (and there was an existing XML
 parser
   to save us starting from square one!).
  
   As part of that Query construction (but independent of which
 QueryParser
   you use), Solr will also make use of a set of Tokenizers and Filters (
  
  
 
 https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
   )
   but that's more to do with dealing with the terms in the query (so in
 my
   examples above, is A a real word, does it need stemming, lowercasing,
   removing because its a stopword, etc).
  
 



Re: About Query Parser

2014-06-20 Thread Daniel Collins
I would say *:* is a human-readable/writable query, as is
inStock:false.  The former will be converted by the query parser into a
MatchAllDocsQuery, which is what Lucene understands.  The latter will be
converted (again by the query parser) into some query.  Now this is where
*which* query parser you are using is important.  Is inStock a word to be
queried, or a field in your schema?  Probably the latter, but the query
parser has to determine that using the Solr schema.  So I would expect that
query to be converted to a TermQuery(Term("inStock", "false")), i.e. a query
for the value "false" in the field "inStock".

This is all interesting but what are you really trying to find out?  If you
just want to run queries and see what they translate to, you can use the
debug options when you send the query in, and then Solr will return to you
both the raw query (with any other options that the query handler might
have added to your query) as well as the Lucene Query generated from it.

e.g. from running q=*:* with debugQuery=true on a Solr instance:

"rawquerystring": "*:*", "querystring": "*:*", "parsedquery":
"MatchAllDocsQuery(*:*)", "parsedquery_toString": "*:*", "QParser":
"LuceneQParser",

Or (this shows the difference between raw query syntax and parsed query
syntax): "rawquerystring": "body_en:test AND headline_en:hello", "querystring":
"body_en:test AND headline_en:hello", "parsedquery": "+body_en:test
+headline_en:hello", "parsedquery_toString": "+body_en:test
+headline_en:hello", "QParser": "LuceneQParser",


On 20 June 2014 13:05, Vivekanand Ittigi vi...@biginfolabs.com wrote:

 All right, let me put this:


 http://192.168.1.78:8983/solr/collection1/select?q=inStock:false&facet=true&facet.field=popularity&wt=xml&indent=true

 I just want to know what form this is. Is it a Lucene query, or should this
 query go through a query parser to get converted to a Lucene query?


 Thanks,
 Vivek


 On Fri, Jun 20, 2014 at 5:19 PM, Alexandre Rafalovitch arafa...@gmail.com
 
 wrote:

  That's *:* and a special case. There is no scoring here, nor searching.
  Just a dump of documents. Not even filtering or faceting. I sure hope you
  have more interesting examples.
 
  Regards,
  Alex
  On 20/06/2014 6:40 pm, Vivekanand Ittigi vi...@biginfolabs.com
 wrote:
 
   Hi Daniel,
  
   You said inputs are human-generated and outputs are lucene objects.
  So
   my question is what does the below query mean. Does this fall under
   human-generated one or lucene.?
  
  
  http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true
  
   Thanks,
   Vivek
  
  
  
   On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins danwcoll...@gmail.com
 
   wrote:
  
Alexandre's response is very thorough, so I'm really simplifying
  things,
   I
confess but here's my query parsers for dummies. :)
   
In terms of inputs/outputs, a QueryParser takes a string (generally
   assumed
to be human generated i.e. something a user might type in, so
 maybe a
sentence, a set of words, the format can vary) and outputs a Lucene
  Query
object (
   
   
  
 
 http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
),
which in fact is a kind of tree (again, I'm simplifying I know)
  since a
query can contain nested expressions.
   
So very loosely its a translator from a human-generated query into
 the
structure that Lucene can handle.  There are several different query
parsers since they all use different input syntax, and ways of
 handling
different constructs (to handle A and B, should the user type +A +B
  or
   A
and B or just A B for example), and have different levels of
 support
   for
the various Query structures that Lucene can handle: SpanQuery,
   FuzzyQuery,
PhraseQuery, etc.
   
We for example use an XML-based query parser.  Why (you might well
  ask!),
well we had an already used and supported query syntax of our own,
  which
our users understood, so we couldn't use an off the shelf query
 parser.
We
could have built our own in Java, but for a variety of reasons we
 parse
   our
queries in a front-end system ahead of Solr (which is C++-based), so
 we
needed an interim format to pass queries to Solr that was as near to
 a
Lucene Query object as we could get (and there was an existing XML
  parser
to save us starting from square one!).
   
As part of that Query construction (but independent of which
  QueryParser
you use), Solr will also make use of a set of Tokenizers and Filters
 (
   
   
  
 
 https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
)
but that's more to do with dealing with the terms in the query (so in
  my
examples above, is A a real word, does it need stemming, lowercasing,
removing because its a stopword, etc).
   
  
 



Trouble with TrieDateFields

2014-06-20 Thread Jared Whiklo
I am upgrading an index from Solr 3.6 to 4.2.0.

Everything has been picked up except for the old DateFields.

I read some posts saying that, due to the extra functionality of the TrieDateField, 
you would need to re-index those fields.

To avoid re-indexing I was trying to do a Partial Update 
(http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/).

I am doing this with a Python script that does a query, pulls the field 
contents and then reformats it and sends a JSON update back to Solr.

But no matter what I send, Solr gives me the same error:

SEVERE: java.lang.ClassCastException: java.lang.String cannot be cast to 
java.util.Date
at org.apache.solr.schema.TrieDateField.toObject(TrieDateField.java:70)
at org.apache.solr.schema.TrieDateField.toObject(TrieDateField.java:55)
……

I have tried sending the date as a date string to be parsed and as a number of 
milliseconds from or before the epoch. Both give the same error.

Any suggestions would be appreciated.

Examples of record attempts.

As milliseconds
--
2014-06-19 16:02:09,503 - solr_date_fixer - DEBUG - old record - 
{u'timestamp': 
u'ERROR:SCHEMA-INDEX-MISMATCH,stringValue=2013-07-17T18:09:59.049', u'PID': 
u'uofm:1235128'}
2014-06-19 16:02:09,503 - solr_date_fixer - DEBUG - new record - {'timestamp': 
{'set': 1374084599049.0}, 'PID': u'uofm:1235128'}
--

As date
--
2014-06-20 08:11:27,986 - solr_date_fixer - DEBUG - old record - 
{u'timestamp': 
u'ERROR:SCHEMA-INDEX-MISMATCH,stringValue=2013-07-17T18:09:59.049', u'PID': 
u'uofm:1235128'}
2014-06-20 08:11:27,986 - solr_date_fixer - DEBUG - new record - {'timestamp': 
{'set': u'2013-07-17T18:09:59.049Z'}, 'PID': u'uofm:1235128'}
---
--
Jared Whiklo
Developer – Digital Initiatives
University of Manitoba Libraries
v: 204-474-6523
c: 204-228-1943
e: jared_whi...@umanitoba.ca


Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-20 Thread Yonik Seeley
On Fri, Jun 20, 2014 at 12:36 AM, Andy angelf...@yahoo.com.invalid wrote:
 Congrats! Any idea when will native faceting & off-heap fieldcache be 
 available for multivalued fields? Most of my fields are multivalued so that's 
 the big one for me.

Hopefully within the next month or so.
If anyone wants to help out, the github issue is here:
https://github.com/Heliosearch/heliosearch/issues/13

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data



 On Thursday, June 19, 2014 3:46 PM, Yonik Seeley yo...@heliosearch.com 
 wrote:



 FYI, for those who want to try out the new native code faceting, this
 is the first release containing it (for single valued string fields
 only as of yet).

 http://heliosearch.org/download/

 Heliosearch v0.06

 Features:
 o  Heliosearch v0.06 is based on (and contains all features of)
 Lucene/Solr 4.9.0
 o  Native code faceting for single valued string fields.
 - Written in C++, statically compiled with gcc for Windows, Mac OS-X, 
 Linux
 - static compilation avoids JVM hotspot warmup period,
 mis-compilation bugs, and variations between runs
 - Improves performance over 2x
 o  Top level Off-heap fieldcache for single valued string fields in nCache.
 - Improves sorting and faceting speed
 - Reduces garbage collection overhead
 - Eliminates FieldCache “insanity” that exists in Apache Solr from
 faceting and sorting on the same field
 o  Full request Parameter substitution / macro expansion, including
 default value support.
 o  frange query now only returns documents with a value.
  For example, in Apache Solr, {!frange l=-1 u=1 v=myfield} will
 also return documents without a value since the numeric default value
 of 0 lies within the range requested.
 o  New JSON features via Noggit upgrade, allowing optional comments
 (C/C++ and shell style), unquoted keys, and relaxed escaping that
 allows one to backslash escape any character.


 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data


Re: Question about sending solrconfig and schema files with java

2014-06-20 Thread Shawn Heisey
On 6/20/2014 5:16 AM, Frederic Esnault wrote:
 I know how to send solrconfig.xml and schema.xml files to SolR using curl
 commands.
 But my problem is that i want to send them with java, and i can't find a
 way to do so.
 I used HttpComponentsand got http headers before the file begins, which SAX
 parser does not like at all.
 
 What is the best way to send this files from a java program ?

Chances are good that you can duplicate your curl requests with
HttpSolrServer and SolrQuery, part of solrj, which is in the Solr
download under the dist directory.

If you are running SolrCloud, then the configs in Zookeeper are directly
accessible with Java code.  You should take a look at the source code,
in ZkController#uploadConfigDir, to see how the uploadToZK methods work.
 You should be able to use the SolrZkClient#makePath method, just like
uploadToZK does.

To use SolrZkClient (or the requests similar to what you do now with
curl), you will need the solrj jar and its dependencies.  The
recommended versions of those dependencies can be found in the download,
in the dist/solrj-lib directory.  To get the SolrZkClient, you would
need to establish a CloudSolrServer object, then retrieve the
ZkStateReader from the CloudSolrServer, and the SolrZkClient from the
ZkStateReader.
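
A rough sketch of that approach (untested; the ZooKeeper address and config
set name are placeholders, and the calls belong in a method that declares
the checked exceptions):

import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.cloud.SolrZkClient;

CloudSolrServer cloud = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
cloud.connect();
SolrZkClient zk = cloud.getZkStateReader().getZkClient();
// write one config file into the named config set in ZooKeeper
byte[] data = Files.readAllBytes(Paths.get("conf/solrconfig.xml"));
zk.makePath("/configs/myconfig/solrconfig.xml", data, true);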

Thanks,
Shawn



Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-20 Thread Yago Riveiro
Yonik,

Does this native code use docValues in any way?

In the past I was forced to index a big portion of my data with docValues 
enabled. OOM problems with large terms dictionaries and GC were my main problem.

Another good optimization would be to do facet aggregations outside the heap to 
minimize the GC. To ensure that facet aggregations have enough RAM we need a 
large heap; on machines with a lot of RAM, maybe making this aggregation 
off-heap would allow us to reduce the heap size.

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Friday, June 20, 2014 at 2:33 PM, Yonik Seeley wrote:

 On Fri, Jun 20, 2014 at 12:36 AM, Andy angelf...@yahoo.com.invalid 
 (mailto:angelf...@yahoo.com.invalid) wrote:
  Congrats! Any idea when will native faceting & off-heap fieldcache be 
  available for multivalued fields? Most of my fields are multivalued so 
  that's the big one for me.
  
  
 Hopefully within the next month or so
 If anyone wants to help out, the github issue is here:
 https://github.com/Heliosearch/heliosearch/issues/13
  
 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data
  
  
  
  On Thursday, June 19, 2014 3:46 PM, Yonik Seeley yo...@heliosearch.com 
  (mailto:yo...@heliosearch.com) wrote:
   
   
   
  FYI, for those who want to try out the new native code faceting, this
  is the first release containing it (for single valued string fields
  only as of yet).
   
  http://heliosearch.org/download/
   
  Heliosearch v0.06
   
  Features:
  o Heliosearch v0.06 is based on (and contains all features of)
  Lucene/Solr 4.9.0
  o Native code faceting for single valued string fields.
  - Written in C++, statically compiled with gcc for Windows, Mac OS-X, Linux
  - static compilation avoids JVM hotspot warmup period,
  mis-compilation bugs, and variations between runs
  - Improves performance over 2x
  o Top level Off-heap fieldcache for single valued string fields in nCache.
  - Improves sorting and faceting speed
  - Reduces garbage collection overhead
  - Eliminates FieldCache “insanity” that exists in Apache Solr from
  faceting and sorting on the same field
  o Full request Parameter substitution / macro expansion, including
  default value support.
  o frange query now only returns documents with a value.
  For example, in Apache Solr, {!frange l=-1 u=1 v=myfield} will
  also return documents without a value since the numeric default value
  of 0 lies within the range requested.
  o New JSON features via Noggit upgrade, allowing optional comments
  (C/C++ and shell style), unquoted keys, and relaxed escaping that
  allows one to backslash escape any character.
   
   
  -Yonik
  http://heliosearch.org - native code faceting, facet functions,
  sub-facets, off-heap data
   
  
  
  




FW: Indexing a term into separate Lucene indexes

2014-06-20 Thread Huang, Roger

If I have documents with a person and his email address: 
u...@domain.com

How can I configure Solr (4.6) so that the email address source field is 
indexed as

-  the user part of the address (e.g., user) is in Lucene index X

-  the domain part of the address (e.g., domain.com) is in a separate 
Lucene index Y

I would like to be able search as follows:

-  Find all people whose email addresses have user part = userXyz

-  Find all people whose email addresses have domain part = 
domainABC.com

-  Find the person with exact email address = 
user...@domainabc.com

Would I use a copyField declaration in my schema?
http://wiki.apache.org/solr/SchemaXml#Copy_Fields

Thanks!


Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-20 Thread Yonik Seeley
On Fri, Jun 20, 2014 at 10:15 AM, Yago Riveiro yago.rive...@gmail.com wrote:
 Yonik,

 Does this native code use docValues in any way?

Nope... not yet.  It is something I think we should look into in the
future though.

 In the past I was forced to index a big portion of my data with docValues 
 enabled. OOM problems with large terms dictionaries and GC were my main problem.

 Another good optimization would be to do facet aggregations outside the heap to 
 minimize the GC,

Yeah, the single-valued string faceting in Heliosearch currently does
this (the counts array is also off-heap).

 To ensure that facet aggregations have enough RAM we need a large heap; on 
 machines with a lot of RAM, maybe making this aggregation off-heap would 
 allow us to reduce the heap size.

Yeah, it's nice not having to worry so much about the correct heap size too.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Re: Question about sending solrconfig and schema files with java

2014-06-20 Thread Frederic Esnault
Hi Shawn,

First thank you for taking the time to answer me.

Actually I tried looking for a way to use SolrJ to upload my files, but I
cannot find information anywhere about how to create nodes with their
config files using SolrJ.
All the websites, blogs and docs I found seem to assume that the core
already exists or that the config files are already there.

I tried using SolrJ anyway, using CoreAdminRequest.create(), but I can only
pass a config file name and a schema file name, not the files themselves,
so I don't see how to do this.
The result of this try is:
INFO: Sending SolR config ...
4226 [AWT-EventQueue-0] INFO
org.apache.solr.client.solrj.impl.HttpClientUtil - Creating new http
client,
config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: No
resource solrconfig.xml for core solrks.villes_france, did you miss to
upload it?
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:462)
at
org.apache.solr.client.solrj.request.CoreAdminRequest.createCore(CoreAdminRequest.java:534)
at
org.apache.solr.client.solrj.request.CoreAdminRequest.createCore(CoreAdminRequest.java:514)




*Frédéric Esnault*
CTO / CO-FOUNDER

*SERENZIA*
57 Rue Maurice Bokanowski
92600 Asnières-sur-Seine

Tel : +33 6 49 45 53 38
Mail : fesna...@serenzia.com




2014-06-20 15:35 GMT+02:00 Shawn Heisey s...@elyograg.org:

 On 6/20/2014 5:16 AM, Frederic Esnault wrote:
  I know how to send solrconfig.xml and schema.xml files to SolR using curl
  commands.
  But my problem is that i want to send them with java, and i can't find a
  way to do so.
  I used HttpComponentsand got http headers before the file begins, which
 SAX
  parser does not like at all.
 
  What is the best way to send this files from a java program ?

 Chances are good that you can duplicate your curl requests with
 HttpSolrServer and SolrQuery, part of solrj, which is in the Solr
 download under the dist directory.

 If you are running SolrCloud, then the configs in Zookeeper are directly
 accessible with Java code.  You should take a look at the source code,
 in ZkController#uploadConfigDir, to see how the uploadToZK methods work.
  You should be able to use the SolrZkClient#makePath method, just like
 uploadToZK does.

 To use SolrZKClient (or the requests similar to what you do now with
 curl), you will need the solrj jar and its dependencies.  The
 recommended versions of those dependencies can be found in the download,
 in the dist/solrj-lib directory.  To get the SolrZkClient, you would
 need to establish a CloudSolrServer object, then retrieve the
 ZkStateReader from the CloudSolrServer, and the SolrZkClient from the
 ZkStateReader.

 Thanks,
 Shawn




Re: Question about sending solrconfig and schema files with java

2014-06-20 Thread Alexandre Rafalovitch
On Fri, Jun 20, 2014 at 9:46 PM, Frederic Esnault fesna...@serenzia.com wrote:
 Actually i tried looking for a way to use SolrJ to upload my files, but i
 cannot find anywhere informations about how to create nodes with their
 config files using SolrJ.

Is this something solvable with configsets?
https://cwiki.apache.org/confluence/display/solr/Config+Sets
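
i.e. (names made up): put the shared files under
$SOLR_HOME/configsets/shared/conf/ and create each core with
/admin/cores?action=CREATE&name=mycore&configSet=shared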

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


Re: Question about sending solrconfig and schema files with java

2014-06-20 Thread Frederic Esnault
Hi Alexandre,

Nope, I cannot access the server (well, I can actually, but my users won't
be able to), and I can't rely on an HTTP curl call.

As for the final HTTP call indicated in the link you gave, this is my last
step, but before that I need my solrconfig.xml and schema.xml uploaded via
Java into Solr. And this is where I'm stuck.


*Frédéric Esnault*
CTO / CO-FOUNDER

*SERENZIA*
57 Rue Maurice Bokanowski
92600 Asnières-sur-Seine

Tel : +33 6 49 45 53 38
Mail : fesna...@serenzia.com




2014-06-20 17:01 GMT+02:00 Alexandre Rafalovitch arafa...@gmail.com:

 On Fri, Jun 20, 2014 at 9:46 PM, Frederic Esnault fesna...@serenzia.com
 wrote:
  Actually i tried looking for a way to use SolrJ to upload my files, but i
  cannot find anywhere informations about how to create nodes with their
  config files using SolrJ.

 Is this something solvable with configsets?
 https://cwiki.apache.org/confluence/display/solr/Config+Sets

 Regards,
Alex.

 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency



Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-20 Thread Floyd Wu
Will these awesome features be implemented in Solr soon?
 On 2014/6/20 at 10:43 PM, Yonik Seeley yo...@heliosearch.com wrote:

 On Fri, Jun 20, 2014 at 10:15 AM, Yago Riveiro yago.rive...@gmail.com
 wrote:
  Yonik,
 
  Does this native code use docValues in any way?

 Nope... not yet.  It is something I think we should look into in the
 future though.

  In the past I was forced to index a big portion of my data with
 docValues enabled. OOM problems with large terms dictionaries and GC were my
 main problem.
 
  Another good optimization would be to do facet aggregations outside the heap
 to minimize the GC,

 Yeah, the single-valued string faceting in Heliosearch currently does
 this (the counts array is also off-heap).

  To ensure that facet aggregations have enough RAM we need a large heap;
 on machines with a lot of RAM, maybe making this aggregation off-heap
 would allow us to reduce the heap size.

 Yeah, it's nice not having to worry so much about the correct heap size
 too.

 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data



Re: Indexing a term into separate Lucene indexes

2014-06-20 Thread Shawn Heisey
On 6/19/2014 4:51 PM, Huang, Roger wrote:
 If I have documents with a person and his email address: 
 u...@domain.com

 How can I configure Solr (4.6) so that the email address source field is 
 indexed as

 -  the user part of the address (e.g., user) is in Lucene index X

 -  the domain part of the address (e.g., domain.com) is in a 
 separate Lucene index Y

 I would like to be able search as follows:

 -  Find all people whose email addresses have user part = userXyz

 -  Find all people whose email addresses have domain part = 
 domainABC.com

 -  Find the person with exact email address = user...@domainabc.com

 Would I use a copyField declaration in my schema?
 http://wiki.apache.org/solr/SchemaXml#Copy_Fields

I don't think you actually want the data to end up in entirely different
indexes.  Although it is possible to search more than one separate
index, that's very likely NOT what you want to do, and it comes with its
own challenges.  What you most likely want is to put this data into
different fields within the same index.

You'll need to write custom code to accomplish this, especially if you
need the stored data to contain only the parts rather than the complete
email address.  A copyField can get the data to additional fields, but
I'm not aware of anything built-in to the schema that can trim the
unwanted information from the new fields, and even if there is, any
stored data will be the original data for all three fields.  It's up to
you whether this custom code is in a user application that does your
indexing or in a custom update processor that you load as a plugin to
Solr itself.  Extending whatever user application you are already using
for indexing is very likely to be a lot easier.
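
For example, a minimal client-side sketch with SolrJ (field names are made
up; error handling omitted):

import org.apache.solr.common.SolrInputDocument;

String email = "userXyz@domainABC.com";
int at = email.indexOf('@');
SolrInputDocument doc = new SolrInputDocument();
doc.addField("email", email);                           // full address, for exact match
doc.addField("email_user", email.substring(0, at));     // userXyz
doc.addField("email_domain", email.substring(at + 1));  // domainABC.com

Searches would then go against email_user, email_domain, or email, depending
on the case.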

Thanks,
Shawn



Re: Question about sending solrconfig and schema files with java

2014-06-20 Thread Shawn Heisey
On 6/20/2014 8:46 AM, Frederic Esnault wrote:
 First thank you for taking the time to answer me.

 Actually i tried looking for a way to use SolrJ to upload my files, but i
 cannot find anywhere informations about how to create nodes with their
 config files using SolrJ.
 All websites, blogs and docs i found seem to be based on the principle that
 the core already exist or that the config files are already there.

You said that you know how to send the files with curl.  How are you
doing this?  If you can do it with curl, chances are good that you can
duplicate the request with HttpSolrServer in some java code.

Thanks,
Shawn



Re: Question about sending solrconfig and schema files with java

2014-06-20 Thread Frederic Esnault
Hi Shawn,

Actually I should say that I'm using DSE Search (i.e. Datastax Enterprise
with Solr enabled).
With cURL, I'm doing it like this:

$ curl http://localhost:8983/solr/resource/nhanes_ks.nhanes/solrconfig.xml
--data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'

$ curl http://localhost:8983/solr/resource/nhanes_ks.nhanes/schema.xml
--data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'

$ curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=nhanes_ks.nhanes"


Except I'm doing this not on localhost but on a remote server, and with
files generated in my Java program (which are correct once generated;
I checked).

Using HttpComponents to send them does not work; it adds weird things
before the file (read from the Cassandra blob after insert).

Using SolrJ to create the core does not work (it cannot upload the files, so
it's complaining about missing files).

Using a ContentStream request fails with an internal server error (no details):

HttpSolrServer server = new HttpSolrServer(solrUrl);

ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/resources/" + solrKeyspace + "." + datasetName + "/");

req.addContentStream(new ContentStreamBase.FileStream(new File("./target/classes/solrconfig.xml")));

server.request(req);

server.commit();

returned non ok status:500, message:Internal Server Error
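
One direction to try, not a verified fix: multipart encoding is a common source of extra bytes in front of an uploaded body, so posting the file as a plain entity, which is what curl's --data-binary does, may behave differently. A minimal sketch, assuming Apache HttpClient 4.3+ and the URL from the first curl command:

    import java.io.File;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.entity.ContentType;
    import org.apache.http.entity.FileEntity;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class PostConfig {
        public static void main(String[] args) throws Exception {
            try (CloseableHttpClient client = HttpClients.createDefault()) {
                HttpPost post = new HttpPost(
                        "http://localhost:8983/solr/resource/nhanes_ks.nhanes/solrconfig.xml");
                // Plain (non-multipart) body: nothing is prepended to the file content.
                post.setEntity(new FileEntity(new File("solrconfig.xml"),
                        ContentType.create("text/xml", "UTF-8")));
                try (CloseableHttpResponse resp = client.execute(post)) {
                    System.out.println(resp.getStatusLine());
                }
            }
        }
    }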



*Frédéric Esnault*
CTO / CO-FOUNDER

*SERENZIA*
57 Rue Maurice Bokanowski
92600 Asnières-sur-Seine

Tel : +33 6 49 45 53 38
Mail : fesna...@serenzia.com




2014-06-20 17:34 GMT+02:00 Shawn Heisey s...@elyograg.org:

 On 6/20/2014 8:46 AM, Frederic Esnault wrote:
  First thank you for taking the time to answer me.
 
  Actually i tried looking for a way to use SolrJ to upload my files, but i
  cannot find anywhere informations about how to create nodes with their
  config files using SolrJ.
  All websites, blogs and docs i found seem to be based on the principle
 that
  the core already exist or that the config files are already there.

 You said that you know how to send the files with curl.  How are you
 doing this?  If you can do it with curl, chances are good that you can
 duplicate the request with HttpSolrServer in some java code.

 Thanks,
 Shawn




RE: Indexing a term into separate Lucene indexes

2014-06-20 Thread Huang, Roger
Shawn,
Thanks for your response.
Due to security requirements, I do need the name and domain parts of the email 
address stored in separate Lucene indexes.
How do you recommend doing this?  What are the challenges?
Once the name and domain parts of the email address are in different Lucene 
indexes, would I need to modify my  Solr search string?
Thanks,
Roger


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Friday, June 20, 2014 10:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing a term into separate Lucene indexes

On 6/19/2014 4:51 PM, Huang, Roger wrote:
 If I have documents with a person and his email address:
 u...@domain.com

 How can I configure Solr (4.6) so that the email address source field 
 is indexed as

 -  the user part of the address (e.g., user) is in Lucene index X

 -  the domain part of the address (e.g., domain.com) is in a 
 separate Lucene index Y

 I would like to be able search as follows:

 -  Find all people whose email addresses have user part = userXyz

 -  Find all people whose email addresses have domain part = 
 domainABC.com

 -  Find the person with exact email address = user...@domainabc.com

 Would I use a copyField declaration in my schema?
 http://wiki.apache.org/solr/SchemaXml#Copy_Fields

I don't think you actually want the data to end up in entirely different 
indexes.  Although it is possible to search more than one separate index, 
that's very likely NOT what you want to do, and it comes with its own 
challenges.  What you most likely want is to put this data into different 
fields within the same index.

You'll need to write custom code to accomplish this, especially if you need the 
stored data to contain only the parts rather than the complete email address.  
A copyField can get the data to additional fields, but I'm not aware of 
anything built-in to the schema that can trim the unwanted information from the 
new fields, and even if there is, any stored data will be the original data for 
all three fields.  It's up to you whether this custom code is in a user 
application that does your indexing or in a custom update processor that you 
load as a plugin to Solr itself.  Extending whatever user application you are 
already using for indexing is very likely to be a lot easier.

Thanks,
Shawn



Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-20 Thread Yonik Seeley
On Fri, Jun 20, 2014 at 11:16 AM, Floyd Wu floyd...@gmail.com wrote:
 Will these awesome features be implemented in Solr soon?
  On 2014/6/20 at 10:43 PM, Yonik Seeley yo...@heliosearch.com wrote:

Given the current makeup of the joint Lucene/Solr PMC, it's unclear.
I'm not worrying about that for now, and just pushing Heliosearch as
far and as fast as I can.
Come join us if you'd like to help!

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Re: Indexing a term into separate Lucene indexes

2014-06-20 Thread Shawn Heisey
On 6/20/2014 10:04 AM, Huang, Roger wrote:
 Due to security requirements, I do need the name and domain parts of the 
 email address stored in separate Lucene indexes.
 How do you recommend doing this?  What are the challenges?
 Once the name and domain parts of the email address are in different Lucene 
 indexes, would I need to modify my  Solr search string?

Solr works best if all the data for an individual document is contained
in a single flat schema.  As soon as you try to put some of the data in
one index and some of the data in another index, you'll probably run
into problems combining the data and/or problems with performance.  Solr
does have some join capability, but when it is mentioned, usually it is
to discuss the things it CAN'T do, not the things that it can do.

What kind of security requirement would necessitate splitting data that
logically belongs together?

Thanks,
Shawn



Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-20 Thread Floyd Wu
Hi Yonik, I don't understand the relationship between Solr and Heliosearch,
since you were a committer of Solr?

I'm just curious.
On 2014/6/21 at 12:07 AM, Yonik Seeley yo...@heliosearch.com wrote:

 On Fri, Jun 20, 2014 at 11:16 AM, Floyd Wu floyd...@gmail.com wrote:
  Will these awesome features be implemented in Solr soon?
   On 2014/6/20 at 10:43 PM, Yonik Seeley yo...@heliosearch.com wrote:

 Given the current makeup of the joint Lucene/Solr PMC, it's unclear.
 I'm not worrying about that for now, and just pushing Heliosearch as
 far and as fast as I can.
 Come join us if you'd like to help!

 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data



Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-20 Thread Yonik Seeley
On Fri, Jun 20, 2014 at 12:36 PM, Floyd Wu floyd...@gmail.com wrote:
 Hi Yonik, I don't understand the relationship between Solr and Heliosearch,
 since you were a committer of Solr?

Heliosearch is a Solr fork that will hopefully find its way back to
the ASF in the future.

Here's the original project announcement:
http://heliosearch.org/heliosearch-solr-evolved/

And the project FAQ:
http://heliosearch.org/heliosearch-faq/

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


RE: running Post jar from different server

2014-06-20 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi Sameer, thanks for looking at the post. Below are the two variables read from
the XML file in my tool.

<add key="JavaPath" value="%JAVA_HOME%\bin\java.exe" />
<add key="JavaArgument" value="-Xms128m -Xmx256m -Durl=http://localhost:8983/solr/{0}/update -jar F:/DataDump/Tools/post.jar" />

On the command line it is something like

C:\DataImport\bin\java.exe -Xms128m -Xmx256m -Durl=http://localhost:8983/solr/DataCollection/update -jar F:/DataDump/Tools/post.jar F:/DatFiles/*.xml

F:\ is the network drive.
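
One thing worth checking: drive letters mapped per-user are often invisible to a process launched by a service such as a SQL Server job, while UNC paths usually work. A speculative Java sketch of launching post.jar with a hypothetical \\fileserver\DataDump share standing in for F:\ (share name and file name are assumptions):

    public class RunPostJar {
        public static void main(String[] args) throws Exception {
            // Note: ProcessBuilder does not expand wildcards like *.xml;
            // pass concrete file names, or enumerate the directory first.
            ProcessBuilder pb = new ProcessBuilder(
                    "C:\\DataImport\\bin\\java.exe", "-Xms128m", "-Xmx256m",
                    "-Durl=http://localhost:8983/solr/DataCollection/update",
                    "-jar", "\\\\fileserver\\DataDump\\Tools\\post.jar",
                    "\\\\fileserver\\DataDump\\DatFiles\\file1.xml");
            pb.inheritIO(); // forward post.jar output to this process's console
            Process p = pb.start();
            System.out.println("post.jar exited with " + p.waitFor());
        }
    }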

Thanks
Ravi

-Original Message-
From: Sameer Maggon [mailto:sam...@measuredsearch.com] 
Sent: Thursday, June 19, 2014 10:02 PM
To: solr-user@lucene.apache.org
Subject: Re: running Post jar from different server

Ravi,

post.jar is a standalone utility that does not have to be on the same server. 
If you can share the command you are executing, there might be some pointers in 
there.

Thanks,
--
*Sameer Maggon*
http://measuredsearch.com


On Thu, Jun 19, 2014 at 8:54 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

 Hi, I have a situation where my SQL job initiates a console application,
 where I am calling post.jar to upload data to Solr. The SQL DB and
 Solr are on 2 different servers.

 I am calling post.jar from my SQL DB server, where the path is mapped to a
 network drive. I am getting a 'file not found' error.

 Is the above scenario possible? If anyone has experience with this,
 can you share? Any direction would be really appreciated.

 Thanks

 Ravi



Re: Indexing a term into separate Lucene indexes

2014-06-20 Thread Shawn Heisey
On 6/20/2014 12:17 PM, Huang, Roger wrote:
 How would you recommend storing the name and domain parts of the email 
 address in separate Lucene indexes?
 To query, would I use the Solr cross-core join, fromIndex, toIndex?

I have absolutely no idea how to use Solr's join functionality.  It is
not required for my indexes.  Here's the wiki page on the subject:

https://wiki.apache.org/solr/Join

Additional note: Your reply did not come to the mailing list, it was
only sent to me.

Thanks,
Shawn



Discuss moving nextCursorMark to the beginning of response

2014-06-20 Thread Joseph Andaverde
I'd like to discuss moving the nextCursorMark to the beginning of a query
response. This way one can fetch another result set before completely
downloading the response. Currently, it's placed last in the Solr response.
I figure this is just coincidence, because it's a recent addition to
Solr.
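
For context, a minimal sketch of how a SolrJ 4.7+ client consumes the cursor today (the core URL and sort field are placeholders); nextCursorMark only becomes available once the whole response has been parsed, which is exactly the limitation being raised:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorPaging {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(100);
            q.setSort(SolrQuery.SortClause.asc("id")); // cursors require a sort ending on the uniqueKey
            String cursor = CursorMarkParams.CURSOR_MARK_START; // "*"
            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                QueryResponse rsp = server.query(q);
                // process rsp.getResults() here
                String next = rsp.getNextCursorMark(); // parsed from the tail of the response
                if (cursor.equals(next)) break;        // unchanged cursor means no more results
                cursor = next;
            }
        }
    }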


Re: Solr alternates returning different versions of the same document

2014-06-20 Thread Erick Erickson
If you update to a specific core, I suspect you're getting the doc
indexed on two shards which leads to duplicate documents being
returned. So it depends on which core happens to answer the request...
Fundamentally, all versions of a document must go to the same shard in
order for the new version to replace the old version. If you've put
the document specifically on a single node, you've bypassed the
automatic routing that would ensure this...

I think the Admin UI kind of side-steps the usual routing process, but
I'm not entirely sure.
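
For comparison, a minimal SolrJ sketch of an update that keeps the automatic routing intact by going through CloudSolrServer, which hashes the uniqueKey and sends the document to the shard that owns it; the ZooKeeper address, collection name, and field values are placeholders:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class RoutedUpdate {
        public static void main(String[] args) throws Exception {
            CloudSolrServer server = new CloudSolrServer("zkhost:2181");
            server.setDefaultCollection("collection1");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "the-id-of-the-doc");
            doc.addField("title", "updated title");
            server.add(doc); // routed by hash of the uniqueKey, so it replaces the old version
            server.commit();
            server.shutdown();
        }
    }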

Best,
Erick

On Fri, Jun 20, 2014 at 12:47 AM, yann yannick.lallem...@gmail.com wrote:
 I have the following problem with Solr 4.5.1, with a cloud install with 4
 shards, no replication, using the built-in zookeeper on one Solr:

 I have updated a document via the Solr console (select a core, then select
 Documents). I used the CSV format to upload the document, including the
 document ID.

 When I query the document id from the Solr console (simple query:
 id:the-id-of-the-doc-I-updated), I alternatively obtain the old document
 (with the values before update, and a given _version_ number), or the new
 document (with the values after update, and a different _version_).

 No log messages in the Solr console about updating the document or anything.

 Any idea what might be going on, and how to fix that problem?

 Thanks in advance,

 Yann





Undeletable phantom collection / core

2014-06-20 Thread John Smodic
Hi,

I have the following situation using SolrCloud:

deleteCollection foo -> Could not find collection: foo

createCollection foo -> Error CREATEing SolrCore 'foo_shard1_replica1':
Could not create a new core in solr/foo_shard1_replica1/ as another core is
already defined there

unload Core foo_shard1_replica1, delete index, delete dir -> No such core
exists 'foo_shard1_replica1'

My clusterstate.json is empty:

 get /clusterstate.json
{}

However, the /solr directory of my server does have the directory 
foo_shard1_replica1

How can I delete this phantom core / collection without manually deleting 
the directory and restarting my servers?

Thanks!

Re: Undeletable phantom collection / core

2014-06-20 Thread Shawn Heisey
On 6/20/2014 1:24 PM, John Smodic wrote:
 I have the following situation using SolrCloud:

 deleteCollection foo -> Could not find collection: foo

 createCollection foo -> Error CREATEing SolrCore 'foo_shard1_replica1':
 Could not create a new core in solr/foo_shard1_replica1/ as another core is
 already defined there

 unload Core foo_shard1_replica1, delete index, delete dir -> No such core
 exists 'foo_shard1_replica1'

 My clusterstate.json is empty:

  get /clusterstate.json
 {}

 However, the /solr directory of my server does have the directory 
 foo_shard1_replica1

 How can I delete this phantom core / collection without manually deleting 
 the directory and restarting my servers?

If the zookeeper database has no mention at all of the foo collection,
then it should be completely safe to just delete or rename the
directory, and you probably won't even need to restart Solr.

Because the core directory most likely does not have a conf directory,
you can't just CREATE and then UNLOAD the core with the
deleteInstanceDir option.  What you MIGHT be able to do for deleting it
with HTTP calls is this:

Temporarily create a new collection with a different name that has one
shard, with <configName> being the name of an existing configuration stored in
zookeeper, ideally whichever config was being used for foo:
http://server:port/solr/admin/collections?action=CREATE&name=bar&numShards=1&collection.configName=<configName>

Use CoreAdmin to create the foo_shard1_replica1 core as a replica of the
shard in the new collection:
http://server:port/solr/admin/cores?action=CREATE&name=foo_shard1_replica1&collection=bar&shard=shard1

If this CoreAdmin action works, then you can delete the new collection
entirely:
http://server:port/solr/admin/collections?action=DELETE&name=bar

I have no idea whether this will actually work, but it's the best idea
that I have.

Thanks,
Shawn



Re: ICUTokenizer or StandardTokenizer or ??? for text_all type field that might include non-whitespace langs

2014-06-20 Thread T. Kuro Kurosaka

On 06/20/2014 04:04 AM, Allison, Timothy B. wrote:

Let's say a predominantly English document contains a Chinese sentence.  If the 
English field uses the WhitespaceTokenizer with a basic WordDelimiterFilter, 
the Chinese sentence could be tokenized as one big token (if it doesn't have 
any punctuation, of course) and will be effectively unsearchable...barring use 
of wildcards.


In my experiment with Solr 4.6.1, both StandardTokenizer and ICUTokenizer
generate a token per Han character. So they are searchable, though
precision suffers. But in your scenario, Chinese text is rare, so some
precision loss may not be a real issue.
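
A minimal Lucene-level sketch of that experiment, assuming lucene-analyzers-icu 4.x on the classpath; the sample sentence is made up:

    import java.io.StringReader;
    import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class TokenizeDemo {
        public static void main(String[] args) throws Exception {
            ICUTokenizer tok = new ICUTokenizer(new StringReader("visit 中国 today"));
            CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
            tok.reset();
            while (tok.incrementToken()) {
                // prints: visit, 中, 国, today -- one token per Han character
                System.out.println(term.toString());
            }
            tok.end();
            tok.close();
        }
    }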

Kuro



Re: Question about sending solrconfig and schema files with java

2014-06-20 Thread Jack Krupansky
Please post this issue on StackOverflow and one of us DataStax guys will 
deal with it there, since nobody here would know much about the specialized 
way that DataStax uses for dynamic schema and config loading.


Check your DSE server log for the 500 exception - but post it on SO since it 
is probably not Solr-related.


Sorry for the inconvenience!

-- Jack Krupansky

-Original Message- 
From: Frederic Esnault

Sent: Friday, June 20, 2014 11:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about sending solrconfig and schema files with java

Hi Shawn,

Actually i should say that i'm using DSE Search (ie. Datastax Enterprise
with SolR enabled).
With cURL, i'm doing like this :

$ curl http://localhost:8983/solr/resource/nhanes_ks.nhanes/solrconfig.xml --data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'

$ curl http://localhost:8983/solr/resource/nhanes_ks.nhanes/schema.xml --data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'

$ curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=nhanes_ks.nhanes"



Except i'm doing this not on localhost but a remote server, and with
files generated in my java program (which are correct once generated,
i checked).

Using HttpComponents to send them does not work, it adds weird things
before the file (read from the cassandra blob after insert).

Using SolrJ to create the core does not work (cannot upload files, so
it's complaining about missing files).

Using a ContentStream request fails with an internal server error (no 
details)


   HttpSolrServer server = new HttpSolrServer(solrUrl);

   ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/resources/" + solrKeyspace + "." + datasetName + "/");

   req.addContentStream(new ContentStreamBase.FileStream(new File("./target/classes/solrconfig.xml")));

   server.request(req);

   server.commit();

returned non ok status:500, message:Internal Server Error



*Frédéric Esnault*
CTO / CO-FOUNDER

*SERENZIA*
57 Rue Maurice Bokanowski
92600 Asnières-sur-Seine

Tel : +33 6 49 45 53 38
Mail : fesna...@serenzia.com




2014-06-20 17:34 GMT+02:00 Shawn Heisey s...@elyograg.org:


On 6/20/2014 8:46 AM, Frederic Esnault wrote:
 First thank you for taking the time to answer me.

 Actually i tried looking for a way to use SolrJ to upload my files, but 
 i

 cannot find anywhere informations about how to create nodes with their
 config files using SolrJ.
 All websites, blogs and docs i found seem to be based on the principle
that
 the core already exist or that the config files are already there.

You said that you know how to send the files with curl.  How are you
doing this?  If you can do it with curl, chances are good that you can
duplicate the request with HttpSolrServer in some java code.

Thanks,
Shawn






Re: Question about sending solrconfig and schema files with java

2014-06-20 Thread Frederic Esnault
Hi Jack, actually I posted on SO first, but got no answer.
Check here :
https://stackoverflow.com/questions/24296014/datastax-dse-search-how-to-post-solrconfig-xml-and-schema-xml-using-java

I can't see any exception in cassandra/system.log at the moment of the
error. :(


*Frédéric Esnault*
CTO / CO-FOUNDER

*SERENZIA*
57 Rue Maurice Bokanowski
92600 Asnières-sur-Seine

Tel : +33 6 49 45 53 38
Mail : fesna...@serenzia.com




2014-06-21 0:35 GMT+02:00 Jack Krupansky j...@basetechnology.com:

 Please post this issue on StackOverflow and one of us DataStax guys will
 deal with it there, since nobody here would know much about the specialized
 way that DataStax uses for dynamic schema and config loading.

 Check your DSE server log for the 500 exception - but post it on SO since
 it is probably not Solr-related.

 Sorry for the inconvenience!

 -- Jack Krupansky

 -Original Message- From: Frederic Esnault
 Sent: Friday, June 20, 2014 11:50 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Question about sending solrconfig and schema files with java


 Hi Shawn,

 Actually i should say that i'm using DSE Search (ie. Datastax Enterprise
 with SolR enabled).
 With cURL, i'm doing like this :

 $ curl http://localhost:8983/solr/resource/nhanes_ks.nhanes/solrconfig.xml --data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'

 $ curl http://localhost:8983/solr/resource/nhanes_ks.nhanes/schema.xml --data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'

 $ curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=nhanes_ks.nhanes"


 Except i'm doing this not on localhost but a remote server, and with
 files generated in my java program (which are correct once generated,
 i checked).

 Using HttpComponents to send them does not work, it adds weird things
 before the file (read from the cassandra blob after insert).

 Using SolrJ to create the core does not work (cannot upload files, so
 it's complaining about missing files).

 Using a ContentStream request fails with an internal server error (no
 details)

HttpSolrServer server = new HttpSolrServer(solrUrl);

ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/resources/" + solrKeyspace + "." + datasetName + "/");

req.addContentStream(new ContentStreamBase.FileStream(new File("./target/classes/solrconfig.xml")));

server.request(req);

server.commit();

returned non ok status:500, message:Internal Server Error



 *Frédéric Esnault*
 CTO / CO-FOUNDER

 *SERENZIA*

 57 Rue Maurice Bokanowski
 92600 Asnières-sur-Seine

 Tel : +33 6 49 45 53 38
 Mail : fesna...@serenzia.com




 2014-06-20 17:34 GMT+02:00 Shawn Heisey s...@elyograg.org:

  On 6/20/2014 8:46 AM, Frederic Esnault wrote:
  First thank you for taking the time to answer me.
 
  Actually i tried looking for a way to use SolrJ to upload my files, but
  i
  cannot find anywhere informations about how to create nodes with their
  config files using SolrJ.
  All websites, blogs and docs i found seem to be based on the principle
 that
  the core already exist or that the config files are already there.

 You said that you know how to send the files with curl.  How are you
 doing this?  If you can do it with curl, chances are good that you can
 duplicate the request with HttpSolrServer in some java code.

 Thanks,
 Shawn






Re: Question about sending solrconfig and schema files with java

2014-06-20 Thread Jack Krupansky

Oops! Sorry I missed it. Please post the rest of the info on SO as well.

We'll get to it!

-- Jack Krupansky

-Original Message- 
From: Frederic Esnault

Sent: Friday, June 20, 2014 7:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Question about sending solrconfig and schema files with java

Hi Jack, actually I posted on SO first, but got no answer.
Check here :
https://stackoverflow.com/questions/24296014/datastax-dse-search-how-to-post-solrconfig-xml-and-schema-xml-using-java

I can't see any exception in cassandra/system.log at the moment of the
error. :(


*Frédéric Esnault*
CTO / CO-FOUNDER

*SERENZIA*
57 Rue Maurice Bokanowski
92600 Asnières-sur-Seine

Tel : +33 6 49 45 53 38
Mail : fesna...@serenzia.com




2014-06-21 0:35 GMT+02:00 Jack Krupansky j...@basetechnology.com:


Please post this issue on StackOverflow and one of us DataStax guys will
deal with it there, since nobody here would know much about the 
specialized

way that DataStax uses for dynamic schema and config loading.

Check your DSE server log for the 500 exception - but post it on SO since
it is probably not Solr-related.

Sorry for the inconvenience!

-- Jack Krupansky

-Original Message- From: Frederic Esnault
Sent: Friday, June 20, 2014 11:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about sending solrconfig and schema files with java


Hi Shawn,

Actually i should say that i'm using DSE Search (ie. Datastax Enterprise
with SolR enabled).
With cURL, i'm doing like this :

$ curl http://localhost:8983/solr/resource/nhanes_ks.nhanes/solrconfig.xml --data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'

$ curl http://localhost:8983/solr/resource/nhanes_ks.nhanes/schema.xml --data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'

$ curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=nhanes_ks.nhanes"


Except i'm doing this not on localhost but a remote server, and with
files generated in my java program (which are correct once generated,
i checked).

Using HttpComponents to send them does not work, it adds weird things
before the file (read from the cassandra blob after insert).

Using SolrJ to create the core does not work (cannot upload files, so
it's complaining about missing files).

Using a ContentStream request fails with an internal server error (no
details)

   HttpSolrServer server = new HttpSolrServer(solrUrl);

   ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/resources/" + solrKeyspace + "." + datasetName + "/");

   req.addContentStream(new ContentStreamBase.FileStream(new File("./target/classes/solrconfig.xml")));

   server.request(req);

   server.commit();

returned non ok status:500, message:Internal Server Error



*Frédéric Esnault*
CTO / CO-FOUNDER

*SERENZIA*

57 Rue Maurice Bokanowski
92600 Asnières-sur-Seine

Tel : +33 6 49 45 53 38
Mail : fesna...@serenzia.com




2014-06-20 17:34 GMT+02:00 Shawn Heisey s...@elyograg.org:

 On 6/20/2014 8:46 AM, Frederic Esnault wrote:

 First thank you for taking the time to answer me.

 Actually i tried looking for a way to use SolrJ to upload my files, but
 i
 cannot find anywhere informations about how to create nodes with their
 config files using SolrJ.
 All websites, blogs and docs i found seem to be based on the principle
that
 the core already exist or that the config files are already there.

You said that you know how to send the files with curl.  How are you
doing this?  If you can do it with curl, chances are good that you can
duplicate the request with HttpSolrServer in some java code.

Thanks,
Shawn









Re: ICUTokenizer or StandardTokenizer or ??? for text_all type field that might include non-whitespace langs

2014-06-20 Thread Simon Cheng
Hi Tim,

I'm working on a similar project with some differences, and maybe we can
share our knowledge in this area:

1) I have no problem with the Chinese characters. You can try this link :

http://123.100.239.158:8983/solr/collection1/browse?q=%E4%B8%AD%E5%9B%BD

Solr can find the record even when the phrase 中国 (meaning China) is in the
middle of the sentence.

2) My problem relates more to other Asian languages ... Thai and Arabic
are two examples. I read at
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters that
solr.ICUTokenizerFactory can overcome the problem, and I am exploring this
approach at the moment.

Simon.



On Sat, Jun 21, 2014 at 7:37 AM, T. Kuro Kurosaka k...@healthline.com
wrote:

 On 06/20/2014 04:04 AM, Allison, Timothy B. wrote:

 Let's say a predominantly English document contains a Chinese sentence.
  If the English field uses the WhitespaceTokenizer with a basic
 WordDelimiterFilter, the Chinese sentence could be tokenized as one big
 token (if it doesn't have any punctuation, of course) and will be
 effectively unsearchable...barring use of wildcards.


 In my experiment with Solr 4.6.1, both StandardTokenizer and ICUTokenizer
 generate a token per Han character. So they are searchable, though
 precision suffers. But in your scenario, Chinese text is rare, so some
 precision loss may not be a real issue.

 Kuro




ping an unloaded core with a replica returns ok

2014-06-20 Thread YouPeng Yang
Hi,
   As the title says, I am using Solr 4.6 with SolrCloud. One of my leader cores
within a shard has been unloaded, yet a ping to the unloaded core
returns OK.
 Is this normal?
 How can I send the right ping request to the core so that it returns a non-OK
status?