Bangalore Apache Lucene/Solr meetup

2013-05-20 Thread Anshum Gupta
Hi folks,

We just created a new meetup group for all Lucene/Solr enthusiasts in and
around Bangalore. We're holding our first meetup on the 1st Of June, 2013.
Link to the meetup page -
http://www.meetup.com/Bangalore-Apache-Solr-Lucene-Group/ . Feel free to
join.

Here's the link to the first meetup event:
http://www.meetup.com/Bangalore-Apache-Solr-Lucene-Group/events/113806762/ .

-- 

Anshum Gupta
http://www.anshumgupta.net


Hard Commit giving OOM Error on Index Writer in Solr 4.2.1

2013-05-20 Thread Umesh Prasad
Hi All,
   I am hitting an OOM error while trying to do a hard commit on one of
the cores.

The transaction log dir is empty and DIH shows indexing going on for 13 hrs:

*Indexing since 13h 22m 22s*
Requests: 5,211,392 (108/s), Fetched: 1,902,792 (40/s), Skipped: 106,853,
Processed: 1,016,696 (21/s)
Started: about 13 hours ago



The response (HTTP status 500) was:

this writer hit an OutOfMemoryError; cannot commit
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2661)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1055)
at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)




-- 
---
Thanks & Regards
Umesh Prasad


Re: solr UI logging when using logback?

2013-05-20 Thread Boogie Shafer
thanks for the pointer on the missing logwatcher for logback...i'll take a
look at that.

on the jetty logging side of things i get nearly all the jetty logging except
the initial startup logs, which seem to happen prior to the other logging
jars getting loaded. perhaps i need to add a few more statements to my
logback.xml config, but it seems to be getting its naming pattern
(yyyy_MM_dd-HHmmssSSS.start.log) from somewhere outside logback.

in the yyyy_MM_dd-HHmmssSSS.start.log i get messages like this

cat 2013_05_15-141100827.start.log

Establishing start.log on Wed May 15 14:11:12 PDT 2013
14:11:15,756 |-INFO in null - Will use configuration file
[resources/logback-access.xml]
14:11:15,768 |-INFO in
ch.qos.logback.access.joran.action.ConfigurationAction - debug attribute
not set
14:11:15,769 |-INFO in
ch.qos.logback.core.joran.action.StatusListenerAction - Added status
listener of type [ch.qos.logback.core.status.OnConsoleStatusListener]
14:11:15,770 |-INFO in ch.qos.logback.core.joran.action.AppenderAction -
About to instantiate appender of type
[ch.qos.logback.core.rolling.RollingFileAppender]
14:11:15,770 |-INFO in ch.qos.logback.core.joran.action.AppenderAction -
Naming appender as [FILE]
14:11:15,774 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy - Will use
gz compression




On Mon, May 20, 2013 at 10:11 AM, Shawn Heisey  wrote:

> On 5/20/2013 10:44 AM, Boogie Shafer wrote:
>
>> BUT i havent figured out what i need to do to get the logging events to
>> display in the SOLR admin ui
>>
>> e.g. at 
>> http://solr-hostname:8983/solr/#/~logging
>>
>
> The logging page in the UI is populated by log watcher classes specific to
> the logging implementation.  Prior to 4.3, the only watcher available in
> released Solr versions was the one for java.util.logging.  The log4j
> watcher was incorporated in the 4.3.0 release.  I have been using log4j
> since 4.1-SNAPSHOT, but I don't yet have any 4.3 servers in production, so
> I can't get logs in my UI.
>
> To get log events in the UI with logback, you would need to implement a
> watcher specifically for logback.  I don't think this is a high priority
> item for the project at the moment, but patches are welcome.
>
>
>  AND
>> i'm wondering if its possible to get the jetty start log managed under
>> logback
>>
>
> On my setup using the jetty included with Solr and the slf4j/log4j jars in
> lib/ext, all jetty log entries are logged to the same file as my Solr logs,
> according to my log4j.properties file.
>
> If you have any logging config for jetty itself, then that will be used.
>  The easiest way to proceed is to simply comment or remove that logging
> config.  That will cause jetty to find slf4j in the classpath and use it,
> which you have already configured to use logback.  The example jetty config
> does not have any logging configured.
>
> Thanks,
> Shawn
>
>


How to handle special characters in Solr search

2013-05-20 Thread kretoni
Hello all,

Currently, I use Solr for product searching, on a Java web
platform.

Normal searches are OK. But when I enter only Solr special characters,
e.g. [] or ||, a parser exception occurs. So I used
ClientUtils.escapeQueryChars(q) in my controller, but the exception still
occurs. So, how can I prevent or handle special characters entered in my
search box?
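For reference, the kind of escaping that ClientUtils.escapeQueryChars performs can be sketched in a few lines. This is a Python sketch of the same idea, not the exact SolrJ behavior (SolrJ's version also escapes whitespace):

```python
import re

# Metacharacters of the Lucene/Solr query parser:
# + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
_SOLR_SPECIAL = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:\\/])')

def escape_solr_query(q):
    """Backslash-escape query-parser metacharacters in a raw user query."""
    return _SOLR_SPECIAL.sub(r'\\\1', q)
```

Escaping like this turns a raw [] or || into literal text, so the query parser no longer throws.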



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-handle-special-characters-in-Solr-search-tp4064816.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Expanding sets of words

2013-05-20 Thread Mike Hugo
Fantastic!  Thanks!


On Mon, May 20, 2013 at 11:21 PM, Jack Krupansky wrote:

> Yes, with the Solr "surround" query parser:
>
> q=(java OR groovy OR scala) W (programming OR coding OR development)
>
> BUT... there is the caveat that the surround query parser does no
> analysis. So, maybe you need "Java OR java" etc. Or, if you know that the
> index is lower case.
>
> Try this dataset:
>
> curl http://localhost:8983/solr/collection1/update?commit=true -H
> 'Content-type:application/csv' -d '
> id,features
> doc-1,java coding
> doc-2,java programming
> doc-3,java development
> doc-4,groovy coding
> doc-5,groovy programming
> doc-6,groovy development
> doc-7,scala coding
> doc-8,scala programming
> doc-9,scala development
> doc-10,c coding
> doc-11,c programming
> doc-12,c development
> doc-13,java language
> doc-14,groovy language
> doc-15,scala language'
>
> And try these commands:
>
> curl "http://localhost:8983/solr/select/?q=(java+OR+scala)+W+programming\
> &df=features&defType=surround&indent=true"
>
> curl "http://localhost:8983/solr/select/?\
> q=(java+OR+scala)+W+(programming+OR+coding)\
> &df=features&defType=surround&indent=true"
>
> curl "http://localhost:8983/solr/select/\
> ?q=(java+OR+groovy+OR+scala)+W+(programming+OR+coding+OR+development)\
> &df=features&defType=surround&indent=true"
>
> The LucidWorks Search query parser also supports NEAR, BEFORE, and AFTER
> operators, in conjunction with OR and "-" to generate span queries:
>
> q=(java OR groovy OR scala) BEFORE:0 (programming OR coding OR development)
>
> -- Jack Krupansky
>
> -Original Message- From: Mike Hugo
> Sent: Monday, May 20, 2013 11:42 PM
> To: solr-user@lucene.apache.org
> Subject: Expanding sets of words
>
>
> Is there a way to query for combinations of two sets of words?  For
> example, if I had
>
> (java or groovy or scala)
> (programming or coding or development)
>
> Is there a query parser that, at query time, would expand that into
> combinations like
>
> java programming
> groovy programming
> scala programming
> java coding
> java development
> 
> etc etc etc
>
> Thanks!
>
> Mike
>


Re: Expanding sets of words

2013-05-20 Thread Jack Krupansky

Yes, with the Solr "surround" query parser:

q=(java OR groovy OR scala) W (programming OR coding OR development)

BUT... there is the caveat that the surround query parser does no analysis. 
So, maybe you need "Java OR java" etc. Or, if you know that the index is 
lower case.


Try this dataset:

curl http://localhost:8983/solr/collection1/update?commit=true -H 
'Content-type:application/csv' -d '

id,features
doc-1,java coding
doc-2,java programming
doc-3,java development
doc-4,groovy coding
doc-5,groovy programming
doc-6,groovy development
doc-7,scala coding
doc-8,scala programming
doc-9,scala development
doc-10,c coding
doc-11,c programming
doc-12,c development
doc-13,java language
doc-14,groovy language
doc-15,scala language'

And try these commands:

curl "http://localhost:8983/solr/select/?q=(java+OR+scala)+W+programming\
&df=features&defType=surround&indent=true"

curl "http://localhost:8983/solr/select/?\
q=(java+OR+scala)+W+(programming+OR+coding)\
&df=features&defType=surround&indent=true"

curl "http://localhost:8983/solr/select/\
?q=(java+OR+groovy+OR+scala)+W+(programming+OR+coding+OR+development)\
&df=features&defType=surround&indent=true"

The LucidWorks Search query parser also supports NEAR, BEFORE, and AFTER 
operators, in conjunction with OR and "-" to generate span queries:


q=(java OR groovy OR scala) BEFORE:0 (programming OR coding OR development)

-- Jack Krupansky

-Original Message- 
From: Mike Hugo

Sent: Monday, May 20, 2013 11:42 PM
To: solr-user@lucene.apache.org
Subject: Expanding sets of words

Is there a way to query for combinations of two sets of words?  For
example, if I had

(java or groovy or scala)
(programming or coding or development)

Is there a query parser that, at query time, would expand that into
combinations like

java programming
groovy programming
scala programming
java coding
java development

etc etc etc

Thanks!

Mike 



Re: Expanding sets of words

2013-05-20 Thread Gora Mohanty
On 21 May 2013 09:12, Mike Hugo  wrote:
> Is there a way to query for combinations of two sets of words?  For
> example, if I had
>
> (java or groovy or scala)
> (programming or coding or development)
>
> Is there a query parser that, at query time, would expand that into
> combinations like
>
> java programming
> groovy programming
> scala programming
> java coding
> java development
> 
> etc etc etc

How many such combinations are there? If these are limited
in number, and can be pre-defined, the easiest way might be
to use synonyms:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
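As one sketch of the synonym route (the file contents and marker tokens below are hypothetical), each word set could be collapsed to a shared token at index and query time:

```text
# synonyms.txt (hypothetical): normalize each word set to one token
java, groovy, scala => _proglang_
programming, coding, development => _devact_
```

A phrase query for the token pair, e.g. "_proglang_ _devact_", would then match any of the nine word combinations.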

If that does not meet your needs, you will probably need to
write a custom query parser, which is not too difficult:
http://wiki.apache.org/solr/SolrPlugins#QParserPlugin

Regards,
Gora


Expanding sets of words

2013-05-20 Thread Mike Hugo
Is there a way to query for combinations of two sets of words?  For
example, if I had

(java or groovy or scala)
(programming or coding or development)

Is there a query parser that, at query time, would expand that into
combinations like

java programming
groovy programming
scala programming
java coding
java development

etc etc etc

Thanks!

Mike


Re: Solr httpCaching for distinct handlers

2013-05-20 Thread Chris Hostetter

: Hi everybody, I would like to have distinct httpCaching configuration for
: distinct handlers, i.e if a request comes for select, send a cache control
: header of 1 minute ; and if receive a request for mlt then send a cache
: control header of 5 minutes.
: Is there a way to do that in my solrconfig.xml ?
: Thanks!

Unfortunately, no.

The request handlers are agnostic to the fact that they run in a servlet
container; only the requestDispatcher has settings related to the caching
headers.

Two options you could consider depending on your end goal...

1) put a proxy in front of Solr that specifies the Cache-Control header 
based on pattern matching of the URL.  (There may be an existing servlet 
filter you could configure in jetty for this)

2) if you already have a caching proxy in front of Solr, and your goal is 
to change how long that proxy caches the responses, you might be able to 
configure it to explicitly assume certain max-ages based on the request URL 
(this is a very nice feature of Squid, for example)
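For option 1, a minimal sketch of URL-based Cache-Control in a proxy (nginx here, purely illustrative; the host, port, and paths are assumptions):

```nginx
# Illustrative only: different Cache-Control per Solr handler path
location ~ ^/solr/.*/select {
    proxy_pass http://127.0.0.1:8983;
    proxy_hide_header Cache-Control;       # drop any upstream header
    add_header Cache-Control "max-age=60"; # 1 minute for /select
}
location ~ ^/solr/.*/mlt {
    proxy_pass http://127.0.0.1:8983;
    proxy_hide_header Cache-Control;
    add_header Cache-Control "max-age=300"; # 5 minutes for /mlt
}
```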





-Hoss


Re: Solr 4 memory usage increase

2013-05-20 Thread Chris Hostetter

: We have master/slave setup. We disabled autocommits/autosoftcommits. So the
: slave only replicates from master and serve query. Master does all the
: indexing and commit every 5 minutes. Slave polls master every 2.5 minutes
: and does replication.

Details matter...

Are you using the exact same configs that you had with 3.5, or did you 
copy the configs from 4.0 and then modify them?

The distinction is important, because if you "disabled 
autocommits/autosoftcommits" from the 4.0 example configs, without also 
disabling the updateLog, then you'll probably see the updateLog eat up a 
lot of RAM.
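For reference, the fragment in question looks roughly like this in the 4.x example solrconfig.xml (a sketch); removing or commenting out the updateLog element disables the transaction log, at the cost of SolrCloud and realtime-get features:

```xml
<!-- in solrconfig.xml: remove or comment out to disable the update log -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
```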

When asking a general question, i.e. "it uses more RAM", you need to provide 
a lot of details about your usage for anyone to even attempt to make a 
guess about your problem.

It would probably help folks help you if you provided your actual configs, 
specifics about the number of documents, size of index, types of queries, 
etc


Please review this page in its entirety:
https://wiki.apache.org/solr/UsingMailingLists


-Hoss


Re: Question on implementation for schema design - parsing path information into stored field

2013-05-20 Thread Cord Thomas
Thank you Brendan,

I had started to read about the tokenizers and couldn't quite piece
together how it would work.  I will read about this and post my
implementation if successful.

Cord


On Mon, May 20, 2013 at 4:13 PM, Brendan Grainger <
brendan.grain...@gmail.com> wrote:

> Hi Cord,
>
> I think you'd do it like this:
>
> 1. Add this to schema.xml
>
> <fieldType name="text_path" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
>   </analyzer>
> </fieldType>
>
> <field name="folders_facet" type="text_path" indexed="true"
> stored="true" multiValued="true" />
>
> 2. When you index add the 'folders' to the folders_facet field (or whatever
> you want to call it).
> 3. Your query would look something like:
>
> http://localhost:8982/solr/
> /select?facet=on&facet.field=folders_facet&facet.mincount=1&
>
> There is a good explanation here:
>
> http://wiki.apache.org/solr/HierarchicalFaceting#PathHierarchyTokenizerFactory
>
>
> Hope that helps.
> Brendan
>
>
>
>
>
>
> On Mon, May 20, 2013 at 4:18 PM, Cord Thomas 
> wrote:
>
> > Hello,
> >
> > I am submitting rich documents to a SOLR index via Solr Cell.   This is
> all
> > working well.
> >
> > The documents are organized in meaningful folders.  I would like to
> capture
> > the folder names in my index so that I can use the folder names to
> provide
> > facets.
> >
> > I can pass the path data into the indexing process and would like to
> > convert 2 paths deep into indexed and stored data - or copy field data.
> >
> > Say i have files in these folders:
> >
> > Financial
> > Financial/Annual
> > Financial/Audit
> > Organizational
> > Organizational/Offices
> > Organizational/Staff
> >
> > I would like to then provide facets using these names.
> >
> > Can someone please guide me in the right direction on how I might
> > accomplish this?
> >
> > Thank you
> >
> > Cord
> >
>
>
>
> --
> Brendan Grainger
> www.kuripai.com
>


Re: Question on implementation for schema design - parsing path information into stored field

2013-05-20 Thread Brendan Grainger
Hi Cord,

I think you'd do it like this:

1. Add this to schema.xml


<fieldType name="text_path" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
  </analyzer>
</fieldType>

<field name="folders_facet" type="text_path" indexed="true"
stored="true" multiValued="true" />

2. When you index add the 'folders' to the folders_facet field (or whatever
you want to call it).
3. Your query would look something like:

http://localhost:8982/solr/
/select?facet=on&facet.field=folders_facet&facet.mincount=1&

There is a good explanation here:
http://wiki.apache.org/solr/HierarchicalFaceting#PathHierarchyTokenizerFactory


Hope that helps.
Brendan






On Mon, May 20, 2013 at 4:18 PM, Cord Thomas  wrote:

> Hello,
>
> I am submitting rich documents to a SOLR index via Solr Cell.   This is all
> working well.
>
> The documents are organized in meaningful folders.  I would like to capture
> the folder names in my index so that I can use the folder names to provide
> facets.
>
> I can pass the path data into the indexing process and would like to
> convert 2 paths deep into indexed and stored data - or copy field data.
>
> Say i have files in these folders:
>
> Financial
> Financial/Annual
> Financial/Audit
> Organizational
> Organizational/Offices
> Organizational/Staff
>
> I would like to then provide facets using these names.
>
> Can someone please guide me in the right direction on how I might
> accomplish this?
>
> Thank you
>
> Cord
>



-- 
Brendan Grainger
www.kuripai.com


Re: seeing lots of "autowarming" messages in log during DIH indexing

2013-05-20 Thread shreejay
geeky2 wrote
> you mean i would add this switch to my script that kicks of the
> dataimport?
> 
> exmaple:
> 
> 
> OUTPUT=$(curl -v
> http://${SERVER}.intra.searshc.com:${PORT}/solrpartscat/${CORE}/dataimport
> -F command=full-import -F clean=${CLEAN} -F commit=${COMMIT} -F
> optimize=${OPTIMIZE} -F openSearcher=false)

Yes, that's correct.



geeky2 wrote
> what needs to be done _AFTER_ the DIH finishes (if anything)?
> 
> eg, does this need to be turned back on after the DIH has finished?

Yes. You need to open a new searcher to be able to see the newly indexed
documents. Just run another commit with openSearcher=true once your indexing
process finishes.
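For example, the final commit could look like this (the host, port, and core name below are placeholders; adjust to your setup):

```shell
# After DIH finishes, open a new searcher so the indexed docs become visible
curl "http://localhost:8983/solr/mycore/update?commit=true&openSearcher=true"
```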





--
View this message in context: 
http://lucene.472066.n3.nabble.com/seeing-lots-of-autowarming-messages-in-log-during-DIH-indexing-tp4064649p4064768.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Store complex (i.e. label + id) meta data in SOLR document

2013-05-20 Thread Achim Domma
Sorry, I think my reference to restriction by country was more confusing than 
helpful. Let's say, that the author of the document is one dimension I would 
like to use facets for. "author" would be one field in my document schema. Now 
let's take "Schmidt, M." as author name, which is quite common in Germany. 
There are multiple authors with that name which all have unique ids in our 
source data. When searching and calculating facets, I would not like to get 
back only one "Schmidt, M." but multiple ones, each having the unique id from 
our source data.

I think so far it would be easy to store "123:Schmidt, M.", "234:Schmidt, M.", 
... But I would also like to be able to search for "Schmidt" or to get 
autocomplete for "Schm...". Hence my idea to have a custom field type 
which stores "123:Schmidt", ... as the value but processes the string in a way 
(i.e. split at ':' and strip the first part) such that only "Schmidt" gets stored 
in the search index.

If I do it like this, I would expect text search and autocomplete to work 
just with "Schmidt, M.", but I should get back "123:Schmidt, M.", ... as facets. 
I think that would solve my problem.

My first question would be: Does this make sense at all or do I understand 
something wrong? And the second question would be: Is there a better, simpler 
solution?

Does this make it clearer what I want to do?

kind regards,
Achim




Am 20.05.2013 um 23:27 schrieb Jack Krupansky:

> Tell us a little more, with examples, of how you really want to search and 
> facet this information.
> 
> One technique is to store the same information in multiple ways, for 
> different uses, combining the name in different ways, such as "Berlin", 
> "Berlin:DE", "Berlin, NJ", "Berlin:Germany", "Berlin GERMANY", etc.
> 
> Ultimately, the idea for facets is not that they uniquely identify an entity, 
> but that a combination of facet selections let you drill down into the data, 
> such that each facet selection narrows one dimension.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Achim Domma
> Sent: Monday, May 20, 2013 5:12 PM
> To: solr-user@lucene.apache.org
> Subject: Store complex (i.e. label + id) meta data in SOLR document
> 
> I store documents having some meta data that is composed out of multiple 
> values. Usually an id with a label. A simple example would be the name of a 
> city and the unique id of that city. The id is needed, because different 
> cities can have the same name like Berlin in Germany and Berlin in the US. 
> The name is obviously needed, because I want to search for that string.
> 
> If I use facets, I would like to get back two facets having the label 
> "Berlin". If I restrict my search (using some other meta data field) to 
> documents from Germany, I would expect to get only one facet for the German 
> Berlin. Obviously this does not work if I store id and label in two 
> separate SOLR fields.
> 
> I would assume that this is not an uncommon requirement, but I was not able 
> to find any useful information. My current approaches are:
> 
> * Implement a complete custom field type in Java: Hard to estimate for me, 
> because I'm currently just a SOLR user, not a SOLR developer.
> 
> * Put id and label in a single string (like "123:Berlin" and "456:Berlin") 
> and define custom field types in schema.xml using a custom analyzer which 
> splits the value. Sound reasonable to me, but I'm not 100% sure if it will 
> work with faceting.
> 
> * I found some references to subfields, but only on older pages and I was not 
> able to find useful documentation.
> 
> Is there some well known way to solve this in SOLR?
> 
> kind regards,
> Achim



Re: Replica shards not updating their index when update is sent to them

2013-05-20 Thread Sebastián Ramírez
Yes, It's happening with the latest version, 4.2.1

Yes, it's easy to reproduce.
It happened using 3 Virtual Machines and also happened using 3 physical
nodes.


Here are the details:

I installed Hortonworks (a Hadoop distribution) in the 3 nodes. That
installs Zookeeper.

I used the "example" directory and copied it to the 3 nodes.

I start Zookeeper in the 3 nodes.

The first time, I run this command on each node, to start Solr:  java -jar
-Dbootstrap_conf=true -DzkHost='node1,node2,node3'  start.jar

As I understand, the "-Dbootstrap_conf=true" uploads the configuration to
Zookeeper, so I don't need to do that the following times that I start each
SolrCore.

So, the following times, I run this on each node: java -jar
-DzkHost='node0,node1,node2' start.jar

Because I ran that command on node0 first, that node became the leader
shard.

I send an update to the leader shard, (in this case node0):
I run: curl 'http://node0:8983/solr/update?commit=true' -H 'Content-type:
text/xml' --data-binary '<add><doc><field name="id">asdf</field><field name="name">buggy</field></doc></add>'

When I query any shard I get the correct result:
I run curl 'http://node0:8983/solr/select?q=id:asdf'
or curl 'http://node1:8983/solr/select?q=id:asdf'
or curl 'http://node2:8983/solr/select?q=id:asdf'
(i.e. I send the query to each node), and then I get the expected response ...
asdf buggy 
... ...

But when I send an update to a replica shard (node2) it is updated only in
the leader shard (node0) and in the other replica (node1), not in the shard
that received the update (node2):
I send an update to the replica node2,
I run: curl 'http://node2:8983/solr/update?commit=true' -H 'Content-type:
text/xml' --data-binary '<add><doc><field name="id">asdf</field><field name="name">big moth</field></doc></add>'

Then I query each node and I receive the updated results only from the
leader shard (node0) and the other replica shard (node1).

I run (leader, node0):
curl 'http://node0:8983/solr/select?q=id:asdf'
And I get:
... asdf big moth
 ...  ...

I run (other replica, node1):
curl 'http://node1:8983/solr/select?q=id:asdf'
And I get:
... asdf big moth
 ...  ...

I run (first replica, the one that received the update, node2):
curl 'http://node2:8983/solr/select?q=id:asdf'
And I get (old result):
... asdf buggy
 ...  ...

Thanks for your interest,

Sebastián Ramírez


On Mon, May 20, 2013 at 3:30 PM, Yonik Seeley  wrote:

> On Mon, May 20, 2013 at 4:21 PM, Sebastián Ramírez
>  wrote:
> > When I send an update to a non-leader (replica) shard (B), the updated
> > results are reflected in the leader shard (A) and in the other replica
> > shard (C), but not in the shard that received the update (B).
>
> I've never seen that before.  The replica that received the update
> isn't treated as special in any way by the code, so it's not clear how
> this could happen.
>
> What version of Solr is this (and does it happen with the latest
> version)?  How easy is this to reproduce for you?
>
> -Yonik
> http://lucidworks.com
>

-- 
**
*This e-mail transmission, including any attachments, is intended only for 
the named recipient(s) and may contain information that is privileged, 
confidential and/or exempt from disclosure under applicable law. If you 
have received this transmission in error, or are not the named 
recipient(s), please notify Senseta immediately by return e-mail and 
permanently delete this transmission, including any attachments.*


Re: Store complex (i.e. label + id) meta data in SOLR document

2013-05-20 Thread Jack Krupansky
Tell us a little more, with examples, of how you really want to search and 
facet this information.


One technique is to store the same information in multiple ways, for 
different uses, combining the name in different ways, such as "Berlin", 
"Berlin:DE", "Berlin, NJ", "Berlin:Germany", "Berlin GERMANY", etc.


Ultimately, the idea for facets is not that they uniquely identify an 
entity, but that a combination of facet selections let you drill down into 
the data, such that each facet selection narrows one dimension.


-- Jack Krupansky

-Original Message- 
From: Achim Domma

Sent: Monday, May 20, 2013 5:12 PM
To: solr-user@lucene.apache.org
Subject: Store complex (i.e. label + id) meta data in SOLR document

I store documents having some meta data that is composed out of multiple 
values. Usually an id with a label. A simple example would be the name of a 
city and the unique id of that city. The id is needed, because different 
cities can have the same name like Berlin in Germany and Berlin in the US. 
The name is obviously needed, because I want to search for that string.


If I use facets, I would like to get back two facets having the label 
"Berlin". If I restrict my search (using some other meta data field) to 
documents from Germany, I would expect to get back only one facet for the German 
Berlin. Obviously this does not work if I store id and label in two 
separate SOLR fields.


I would assume that this is not an uncommon requirement, but I was not able 
to find any useful information. My current approaches are:


* Implement a complete custom field type in Java: Hard to estimate for me, 
because I'm currently just a SOLR user, not a SOLR developer.


* Put id and label in a single string (like "123:Berlin" and "456:Berlin") 
and define custom field types in schema.xml using a custom analyzer which 
splits the value. Sound reasonable to me, but I'm not 100% sure if it will 
work with faceting.


* I found some references to subfields, but only on older pages and I was 
not able to find useful documentation.


Is there some well known way to solve this in SOLR?

kind regards,
Achim



Store complex (i.e. label + id) meta data in SOLR document

2013-05-20 Thread Achim Domma
I store documents having some meta data that is composed out of multiple 
values. Usually an id with a label. A simple example would be the name of a 
city and the unique id of that city. The id is needed, because different cities 
can have the same name like Berlin in Germany and Berlin in the US. The name is 
obviously needed, because I want to search for that string.

If I use facets, I would like to get back two facets having the label "Berlin". 
If I restrict my search (using some other meta data field) to documents from 
Germany, I would expect to get only one facet for the German Berlin. Obviously 
this does not work if I store id and label in two separate SOLR fields.

I would assume that this is not an uncommon requirement, but I was not able to 
find any useful information. My current approaches are:

 * Implement a complete custom field type in Java: Hard to estimate for me, 
because I'm currently just a SOLR user, not a SOLR developer.

 * Put id and label in a single string (like "123:Berlin" and "456:Berlin") and 
define custom field types in schema.xml using a custom analyzer which splits 
the value. Sounds reasonable to me, but I'm not 100% sure if it will work with 
faceting.

 * I found some references to subfields, but only on older pages and I was not 
able to find useful documentation.
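The second approach (a custom analyzer that splits at ':') could be sketched like this; the field type name and pattern below are made up for illustration:

```xml
<fieldType name="id_label" class="solr.TextField">
  <analyzer>
    <!-- strip a leading "123:" style id so only the label is indexed -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^[0-9]+:" replacement=""/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that faceting is computed from the indexed terms, so with this analyzer the facet values would be the stripped labels rather than the full "id:label" strings; keeping the full value in a separate string field is one way around that.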

Is there some well known way to solve this in SOLR?

kind regards,
Achim

Re: Replica shards not updating their index when update is sent to them

2013-05-20 Thread Yonik Seeley
On Mon, May 20, 2013 at 4:21 PM, Sebastián Ramírez
 wrote:
> When I send an update to a non-leader (replica) shard (B), the updated
> results are reflected in the leader shard (A) and in the other replica
> shard (C), but not in the shard that received the update (B).

I've never seen that before.  The replica that received the update
isn't treated as special in any way by the code, so it's not clear how
this could happen.

What version of Solr is this (and does it happen with the latest
version)?  How easy is this to reproduce for you?

-Yonik
http://lucidworks.com


Replica shards not updating their index when update is sent to them

2013-05-20 Thread Sebastián Ramírez
Hello,

I'm having a little problem with a test SolrCloud cluster.

I've set up 3 nodes (SolrCores) to use an external Zookeeper. I use 1 shard
and the other 2 SolrCores are being auto-assigned as replicas.

Let's say I have these 3 nodes: the leader shard A, the replica shard B,
and the (other) replica shard C.

I can send queries to any node (A, B or C) and I get the results.

I can send updates to the leader shard (A) and get correct (updated)
results in any of the 3 shards (A, B, or C).

* Here is the problem:
When I send an update to a non-leader (replica) shard (B), the updated
results are reflected in the leader shard (A) and in the other replica
shard (C), but not in the shard that received the update (B). I can do this
same process, send the update to the other non-leader shard (C), and the
same happens, I get the results in the leader (A) and in the other replica
shard (B), but not in the shard that received the update (C).

Any suggestion?

Thanks!

Sebastián Ramírez



Question on implementation for schema design - parsing path information into stored field

2013-05-20 Thread Cord Thomas
Hello,

I am submitting rich documents to a SOLR index via Solr Cell.   This is all
working well.

The documents are organized in meaningful folders.  I would like to capture
the folder names in my index so that I can use the folder names to provide
facets.

I can pass the path data into the indexing process and would like to
convert 2 paths deep into indexed and stored data - or copy field data.

Say i have files in these folders:

Financial
Financial/Annual
Financial/Audit
Organizational
Organizational/Offices
Organizational/Staff

I would like to then provide facets using these names.

Can someone please guide me in the right direction on how I might
accomplish this?

Thank you

Cord
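[Editor's note: one common approach — not discussed in this thread — is to emit hierarchical facet tokens, similar to what Solr's PathHierarchyTokenizerFactory produces, either via that tokenizer in the schema or client-side before indexing. A minimal client-side Python sketch, assuming POSIX-style paths and a two-level depth:]

```python
import posixpath

def path_facets(path, depth=2):
    """Expand a file path into hierarchical facet values, e.g.
    'Financial/Annual/report.pdf' -> ['Financial', 'Financial/Annual']."""
    parts = [p for p in posixpath.dirname(path).split('/') if p]
    return ['/'.join(parts[:i + 1]) for i in range(min(depth, len(parts)))]

print(path_facets('Financial/Annual/2012-report.pdf'))
# -> ['Financial', 'Financial/Annual']
```

Each returned value would go into a multiValued string field, so faceting on it yields counts at both folder levels.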


Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-20 Thread Rishi Easwaran
No, we just upgraded to 4.2.1.
Given the size of our complex and the effort required to apply our patches and 
roll them out, our upgrades are not that frequent.


 

 

-Original Message-
From: Noureddine Bouhlel 
To: solr-user 
Sent: Mon, May 20, 2013 3:36 pm
Subject: Re: Upgrading from SOLR 3.5 to 4.2.1 Results.


Hi Rishi,

Have you done any tests with Solr 4.3 ?

Regards,


Cordialement,

BOUHLEL Noureddine



On 17 May 2013 21:29, Rishi Easwaran  wrote:

>
>
> Hi All,
>
> Its Friday 3:00pm, warm & sunny outside and it was a good week. Figured
> I'd share some good news.
> I work for AOL mail team and we use SOLR for our mail search backend.
> We have been using it since pre-SOLR 1.4 and strong supporters of SOLR
> community.
> We deal with millions indexes and billions of requests a day across our
> complex.
> We finished full rollout of SOLR 4.2.1 into our production last week.
>
> Some key highlights:
> - ~75% Reduction in Search response times
> - ~50% Reduction in SOLR Disk busy , which in turn helped with ~90%
> Reduction in errors
> - Garbage collection total stop reduction by over 50% moving application
> throughput into the 99.8% - 99.9% range
> - ~15% reduction in CPU usage
>
> We did not tune our application moving from 3.5 to 4.2.1 nor update java.
> For the most part it was a binary upgrade, with patches for our special
> use case.
>
> Now going forward we are looking at prototyping SOLR Cloud for our search
> system, upgrade java and tomcat, tune our application further. Lots of fun
> stuff :)
>
> Have a great weekend everyone.
> Thanks,
>
> Rishi.
>
>
>
>
>

 


Re: Inaccurate wiki documentation?

2013-05-20 Thread Shawn Heisey

On 5/20/2013 1:28 PM, Shane Perry wrote:

Using the 4.3 war available for download, I attempted to set up my core
using the solr.properties file (in anticipation of moving to 5.0).  When I
start the context, logging shows that the process is falling back to the
default solr.xml file (essentially the second bullet does not occur).
  After digging through the 4_3 branch it looks like solr.properties is not
yet part of the library.  Am I missing something (I'm able to get the
context started using a solr.xml file with "" as the contents)?


I think you're right about the wiki being inaccurate.  Although I 
probably have enough access to rename the wiki page from 4.3 to 4.4, I 
don't know how to do it.  I have updated the existing page with a note 
summarizing some of what's below:


The core discovery code is fundamentally broken in 4.3.  The code that 
will be released as part of Solr 4.4 (branch_4x) is VERY different. 
There will soon be a 4.3.1 release, but due to the massive code changes, 
the known bugs in core discovery will not be fixed in that release, and 
there may be bugs that we don't even know about yet.


If you want to experiment with core discovery now, I would recommend 
that you download the source code or a nightly build of branch_4x, the 
current stable release branch.  This branch is usually free of major 
showstopper bugs.  Not always of course, because it is under active 
development, but we do try.


If you want something more stable, then use the old-style solr.xml with 
4.2.1 or 4.3.x.


The 5.0 release (currently called trunk) is still a long way off.  The 
trunk and branch_4x codebases are currently similar enough that most 
patches will work on both versions without any manual work at all.  This 
means that almost every new feature gets added to both 4.x and 5.0. 
Eventually the two branches will diverge to the point where backporting 
features to 4.x will be extremely hard.  That is likely the point where 
branch_5x will be created and there will be a major effort to release 5.0.


Thanks,
Shawn



Re: Existing Project using Hibernate, Spring and Lucene and Looking to Add Solr

2013-05-20 Thread Shawn Heisey

On 5/20/2013 1:02 PM, Todd Hunt wrote:


It seems like Solr forces one to expose access to its "Cores" (indexes) via its 
own WAR file.  I don't want that.  I just want to be able to utilize the Solr Java API to 
integrate with our current web services and Hibernate framework to index text based 
documents.  Then allow our users to perform open text searching and utilize Solr's 
advanced features like highlighting, MLT, spell checking, suggester and faceting.  But I 
just don't see how to integrate what Solr has to offer with our existing web application. 
 I get the feeling that I have to create a new Solr based web application and then have 
the current application delegate indexing and searching to the Solr application, which is 
not what I really want to do, if possible.


I've removed most of your email and just quoted the one paragraph above. 
 You have pretty much described the right way to use Solr.  Solr is 
awesome for new projects, because the amount of user code required to 
interface with Solr is usually very small.  Most of the heavy lifting is 
done server-side, in the configuration.


People like yourself that are highly experienced with custom Lucene 
applications often find Solr too restrictive.  Solr does provide extra 
functionality on top of Lucene, but it does NOT expose all of Lucene's 
capability, especially in the newest versions.


Migrating from Lucene to Solr isn't for everyone.  If you have a deep 
understanding of Lucene and your existing application is intricately 
tied to it, you should probably stick with Lucene and just upgrade to 
the newest stable release, because chances are that the way Solr uses 
Lucene is not completely compatible with your existing methods.  From 
what I've been told, the upgrade from Lucene 3.x to 4.x does require a 
lot of refactoring work on user code.


If you do decide to implement Solr, the recommendation is to use the 
.war and make connections from client code with HttpSolrServer or 
CloudSolrServer.  Although you CAN use EmbeddedSolrServer to embed the 
entire Solr application in your program and avoid HTTP, this is not 
recommended, and it doesn't do anything to change the fact that your 
Lucene code may be fundamentally different than Solr.  To completely 
duplicate your Lucene application you might have to write custom Solr 
components ... and if you start doing that, you might as well simply 
maintain your existing code through version upgrades.  Lucene is not 
going away, and a given version of Lucene will likely always have 
functionality beyond the same version of Solr.


Thanks,
Shawn



Inaccurate wiki documentation?

2013-05-20 Thread Shane Perry
I am in the process of setting up a core using Solr 4.3.  On the Core
Discovery
wiki
page it states:

As of SOLR-4196, there's a new way of defining cores. Essentially, it is no
longer necessary to define cores in solr.xml. In fact, solr.xml is no
longer necessary at all and will be obsoleted in Solr 5.x. As of Solr 4.3
the process is as follows:


   - If a solr.xml file is found in , then it is expected to be
  the old-style solr.xml that defines cores etc.
  - If there is no solr.xml but there is a solr.properties file, then
  exploration-based core enumeration is assumed.
  - If neither a solr.xml nor an solr.properties file is found, a
  default solr.xml file is assumed. NOTE: as of 5.0, this will not be true
  and an error will be thrown if no solr.properties file is found.

Using the 4.3 war available for download, I attempted to set up my core
using the solr.properties file (in anticipation of moving to 5.0).  When I
start the context, logging shows that the process is falling back to the
default solr.xml file (essentially the second bullet does not occur).
 After digging through the 4_3 branch it looks like solr.properties is not
yet part of the library.  Am I missing something (I'm able to get the
context started using a solr.xml file with "" as the contents)?

I'm going with a basic solr.xml for now, but any insight would be
appreciated.

Thanks in advance.


Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-20 Thread Rishi Easwaran
We use commodity H/W which we procured over the years as our complex grew.
Running on jdk6 with tomcat 5. (Planning to upgrade to jdk7 and tomcat7 soon).
We run them with about 4GB heap. Using CMS GC. 


 

 

 

-Original Message-
From: adityab 
To: solr-user 
Sent: Sat, May 18, 2013 10:37 am
Subject: Re: Upgrading from SOLR 3.5 to 4.2.1 Results.


These numbers are really great. Would you mind sharing your h/w configuration
and JVM params

thanks 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrading-from-SOLR-3-5-to-4-2-1-Results-tp4064266p4064370.html
Sent from the Solr - User mailing list archive at Nabble.com.

 


Existing Project using Hibernate, Spring and Lucene and Looking to Add Solr

2013-05-20 Thread Todd Hunt
Hi,

We have an existing Java based enterprise application that is bundled as a WAR 
file and runs on Tomcat and uses Spring 3.0.5, Hibernate 3.6.2, and Lucene 
3.0.3.  We are using annotations in Hibernate that nicely couple it to Lucene to 
index objects (documents, images, PDFs, etc.) based on key-value pairs.  We use 
Hibernate Search to retrieve the results we are looking for.

We want to extend our indexing capability to use Tika to extract text and 
metadata out of documents that are uploaded to the server and index that 
content.

When I initially read about Solr I saw that it would provide extra 
functionality on top of Lucene.  I was eager to get it integrated with our 
application.  But now that I have fully read "Apache Solr 3 Enterprise Search 
Server" I feel that my initial impressions of Solr were wrong.

I saw where Solr talked about using web services to upload files for indexing 
and also to perform searching and download content.  I thought that was just a 
nice feature that was available.  But I was not interested in that due to the 
fact that our application already has a web service interface that is used by 
our own home grown client application that communicates with the enterprise 
application above.

I've read about SolrJ / Solr Cell, EmbeddedSolrServer, BackendQueueProcessor, 
and DIH and researched them on the web.  But none of them have provided me with 
the information to take a Hibernate managed object, inside of a transaction, 
persist the binary data in the database (which we are already doing), extract the 
text / contents from the binary file via Tika (which is a separate issue for a 
separate thread), and index that text with either Java API code or Java 
Annotations.

It seems like Solr forces one to expose access to its "Cores" (indexes) via its 
own WAR file.  I don't want that.  I just want to be able to utilize the Solr 
Java API to integrate with our current web services and Hibernate framework to 
index text based documents.  Then allow our users to perform open text 
searching and utilize Solr's advanced features like highlighting, MLT, spell 
checking, suggester and faceting.  But I just don't see how to integrate what 
Solr has to offer with our existing web application.  I get the feeling that I 
have to create a new Solr based web application and then have the current 
application delegate indexing and searching to the Solr application, which is 
not what I really want to do, if possible.

I've looked through the Solr Java Docs and I haven't found anything substantial 
that would allow for me to just use Java code instead of creating HTTP 
connections to index and search for data. Can someone let me know whether what I 
am looking for is outside the scope of Solr's functionality? If there is a way, 
please provide an example of how I can accomplish it.

Thank you,

Todd



Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-20 Thread Rishi Easwaran
Sure Shalin, hopefully soon.
 

 

 

-Original Message-
From: Shalin Shekhar Mangar 
To: solr-user 
Sent: Sat, May 18, 2013 11:35 pm
Subject: Re: Upgrading from SOLR 3.5 to 4.2.1 Results.


Awesome news Rishi! Looking forward to your SolrCloud updates.


On Sat, May 18, 2013 at 12:59 AM, Rishi Easwaran wrote:

>
>
> Hi All,
>
> Its Friday 3:00pm, warm & sunny outside and it was a good week. Figured
> I'd share some good news.
> I work for AOL mail team and we use SOLR for our mail search backend.
> We have been using it since pre-SOLR 1.4 and strong supporters of SOLR
> community.
> We deal with millions indexes and billions of requests a day across our
> complex.
> We finished full rollout of SOLR 4.2.1 into our production last week.
>
> Some key highlights:
> - ~75% Reduction in Search response times
> - ~50% Reduction in SOLR Disk busy , which in turn helped with ~90%
> Reduction in errors
> - Garbage collection total stop reduction by over 50% moving application
> throughput into the 99.8% - 99.9% range
> - ~15% reduction in CPU usage
>
> We did not tune our application moving from 3.5 to 4.2.1 nor update java.
> For the most part it was a binary upgrade, with patches for our special
> use case.
>
> Now going forward we are looking at prototyping SOLR Cloud for our search
> system, upgrade java and tomcat, tune our application further. Lots of fun
> stuff :)
>
> Have a great weekend everyone.
> Thanks,
>
> Rishi.
>
>
>
>
>


-- 
Regards,
Shalin Shekhar Mangar.

 


Re: solr.xml or its successor in the wiki

2013-05-20 Thread Benson Margulies
I suppose you saw my JIRA suggesting that solr.xml might have
the same repertoire of 'lib' elements as solrconfig.xml, instead of
just a single 'str'.

On Mon, May 20, 2013 at 11:16 AM, Erick Erickson
 wrote:
> What's supposed to happen (not guaranteeing it is completely correct,
> mind you) is that the presence of a  tag defines which checks
> are performed. Errors are thrown on old-style constructs when no
>  tag is present and vice-versa.
>
> Best
> Erick
>
>
> On Sun, May 19, 2013 at 7:20 PM, Benson Margulies  
> wrote:
>> One point of confusion: Is the compatibility code I hit trying to
>> prohibit the 'str' form when it sees old-fangled cores? Or when the
>> current running version pre-5.0? I hope it's the former.
>>
>> On Sun, May 19, 2013 at 6:47 PM, Shawn Heisey  wrote:
>>> On 5/19/2013 4:38 PM, Benson Margulies wrote:
 Shawn, thanks. need any more jiras on this?
>>>
>>> I don't think so, but if you grab the 4.3 branch or branch_4x and find
>>> any bugs, let us know.
>>>
>>> Thanks,
>>> Shawn
>>>


Re: Not able to search Spanish word with ascent in solr

2013-05-20 Thread Jack Krupansky
Okay. I should have realized from the original email. The input is 
XML-encoded HTML. That's fine for a stored field that will be retrieved and 
then displayed in a browser, but is NOT searchable.


What you will have to do is maintain two copies of that data: one stored as 
HTML (the one you provided) for display only, not for query, and a copy that is 
stripped of HTML, which should also convert the entity codes to proper 
Unicode accented characters.


One approach:

1. Put the original text (HTML with entities for accented characters) in a 
field named "features_html". This would be a stored="true" indexed="false" 
field.

2. Add a copyField from "features_html" to "features".
3. Add an HTML strip char filter to the index analyzer for "features".



See:
http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/charfilter/HTMLStripCharFilterFactory.html

4. Make features stored="false" indexed="true".

Or, your input could contain both features_html and features and your 
indexing client would strip the HTML tags and expand the entities for the 
accented characters. And then you can return features for clean text with 
accents.
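[Editor's note: the client-side variant Jack describes — strip the tags and expand the entities before sending the text to Solr — can be sketched with Python's stdlib. The `strip_html` helper is hypothetical, and the tag-removal regex is deliberately crude compared to Solr's HTMLStripCharFilter:]

```python
import html
import re

def strip_html(markup):
    """Drop tags, then expand HTML entities (&eacute; -> é) so the
    indexed text carries real accented characters."""
    text = re.sub(r'<[^>]+>', ' ', markup)  # crude tag removal
    text = html.unescape(text)              # &eacute; / &#233; -> é
    return ' '.join(text.split())           # collapse whitespace

print(strip_html('<p>Caf&eacute; <b>azul</b></p>'))  # Café azul
```

The original markup would go to the stored-only `features_html` field, and this cleaned text to the indexed `features` field.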


Do you really want the HTML in Solr at all? For rich display it is 
reasonable, but is that your requirement?


-- Jack Krupansky

-Original Message- 
From: Shawn Heisey

Sent: Monday, May 20, 2013 1:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Not able to search Spanish word with ascent in solr

On 5/20/2013 11:24 AM, jignesh wrote:

01name="a">716509name="aacute">384
name="as">26017695name="azul">6761name="and">60
name="acute">53


Solr is indexing the encoded XML - so you are getting amp, acute,
aacute, and similar terms in your index.

Looking at the XML that you are indexing, it doesn't contain XML encoded
accented characters.  It contains XML encoding of HTML encoding.  As a
specific example, your XML file contains this:

&amp;eacute;

The correct way to encode this would be the following:

&eacute;

There is a problem with this, however.  This is HTML encoding, not XML
encoding.  This fails when you try to index it in Solr:

Caused by: com.ctc.wstx.exc.WstxParsingException: Undeclared general
entity "eacute"

If I put the accented character right in the XML without the XML or HTML
encoding, it works correctly.

Thanks,
Shawn 



Re: Not able to search Spanish word with ascent in solr

2013-05-20 Thread Shawn Heisey

On 5/20/2013 11:24 AM, jignesh wrote:

017165093842601769567616053


Solr is indexing the encoded XML - so you are getting amp, acute, 
aacute, and similar terms in your index.


Looking at the XML that you are indexing, it doesn't contain XML encoded 
accented characters.  It contains XML encoding of HTML encoding.  As a 
specific example, your XML file contains this:


&amp;eacute;

The correct way to encode this would be the following:

&eacute;

There is a problem with this, however.  This is HTML encoding, not XML 
encoding.  This fails when you try to index it in Solr:


Caused by: com.ctc.wstx.exc.WstxParsingException: Undeclared general 
entity "eacute"


If I put the accented character right in the XML without the XML or HTML 
encoding, it works correctly.


Thanks,
Shawn
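[Editor's note: Shawn's point reproduces with any strict XML parser, not just Solr's. A quick illustration using Python's stdlib parser (the exact error text differs from Solr's Woodstox exception): XML predefines only five entities, so the HTML entity fails, while a numeric character reference or the raw UTF-8 character parses fine.]

```python
import xml.etree.ElementTree as ET

# XML predefines only &amp; &lt; &gt; &apos; &quot;;
# HTML entities such as &eacute; are undeclared and fail to parse.
try:
    ET.fromstring('<field>caf&eacute;</field>')
    print('parsed')
except ET.ParseError as e:
    print('parse error:', e)

# Numeric character reference -- or the raw UTF-8 character -- works.
print(ET.fromstring('<field>caf&#233;</field>').text)  # café
print(ET.fromstring('<field>café</field>').text)       # café
```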



Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-20 Thread Noureddine Bouhlel
Hi Rishi,

Have you done any tests with Solr 4.3 ?

Regards,


Cordialement,

BOUHLEL Noureddine



On 17 May 2013 21:29, Rishi Easwaran  wrote:

>
>
> Hi All,
>
> Its Friday 3:00pm, warm & sunny outside and it was a good week. Figured
> I'd share some good news.
> I work for AOL mail team and we use SOLR for our mail search backend.
> We have been using it since pre-SOLR 1.4 and strong supporters of SOLR
> community.
> We deal with millions indexes and billions of requests a day across our
> complex.
> We finished full rollout of SOLR 4.2.1 into our production last week.
>
> Some key highlights:
> - ~75% Reduction in Search response times
> - ~50% Reduction in SOLR Disk busy , which in turn helped with ~90%
> Reduction in errors
> - Garbage collection total stop reduction by over 50% moving application
> throughput into the 99.8% - 99.9% range
> - ~15% reduction in CPU usage
>
> We did not tune our application moving from 3.5 to 4.2.1 nor update java.
> For the most part it was a binary upgrade, with patches for our special
> use case.
>
> Now going forward we are looking at prototyping SOLR Cloud for our search
> system, upgrade java and tomcat, tune our application further. Lots of fun
> stuff :)
>
> Have a great weekend everyone.
> Thanks,
>
> Rishi.
>
>
>
>
>


Re: Not able to search Spanish word with ascent in solr

2013-05-20 Thread Jack Krupansky
We can conclude that the accents did not get indexed and we know from the 
other experiment that the field type analyzer is not at fault.


1. How are you indexing the data? Verify what character encoding it is 
using.
2. Try manually indexing some accented data, like with a curl command, and 
see if the accents are missing.


curl http://localhost:8983/solr/update?commit=true -H 'Content-type:application/xml' -d '
<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="name">Hola Mañana en le Café, habla el Académie française!</field>
  </doc>
</add>'

The above worked for me with the standard Solr 4.3 example schema.

-- Jack Krupansky

-Original Message- 
From: jignesh

Sent: Monday, May 20, 2013 1:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Not able to search Spanish word with ascent in solr

Here is the output using

/solr/terms?terms.fl=name&terms.prefix=a

--
017165093842601769567616053


What should I conclude from above?

Thanks
Waiting for reply



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-search-Spanish-word-with-ascent-in-solr-tp4064404p4064703.html
Sent from the Solr - User mailing list archive at Nabble.com. 



RE: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-05-20 Thread Bryan Loofbourrow
My guess is that the problem is those 200M documents.
FastVectorHighlighter is fast at deciding whether a match, especially a
phrase, appears in a document, but it still starts out by walking the
entire list of term vectors, and ends by breaking the document into
candidate-snippet fragments, both processes that are proportional to the
length of the document.

It's hard to do much about the first, but for the second you could choose
to expose FastVectorHighlighter's FieldPhraseList representation, and
return offsets to the caller rather than fragments, building up your own
snippets from a separate store of indexed files. This would also permit
you to set stored="false", improving your memory/core size ratio, which
I'm guessing could use some improving. It would require some work, and it
would require you to store a representation of what was indexed outside
the Solr core, in some constant-bytes-to-character representation that you
can use offsets with (e.g. UTF-16, or ASCII+entity references).
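[Editor's note: the offsets-based idea can be sketched in a few lines — given match offsets into an externally stored copy of the text, build fixed-width fragments yourself. The `snippet` helper is hypothetical, not FastVectorHighlighter's actual API:]

```python
def snippet(text, start, end, width=40, tag=('<em>', '</em>')):
    """Build one highlight fragment around the match at [start, end)."""
    lo = max(0, start - width // 2)
    hi = min(len(text), end + width // 2)
    return text[lo:start] + tag[0] + text[start:end] + tag[1] + text[end:hi]

doc = 'the quick brown fox jumps over the lazy dog'
print(snippet(doc, 16, 19, width=10))  # rown <em>fox</em> jump
```

The cost here is proportional to the number of matches you render, not to the length of the document, which is the point of returning offsets instead of fragments.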

However, you may not need to do this -- it may be that you just need more
memory for your search machine. Not JVM memory, but memory that the O/S
can use as a file cache. What do you have now? That is, how much memory do
you have that is not used by the JVM or other apps, and how big is your
Solr core?

One way to start getting a handle on where time is being spent is to set
up VisualVM. Turn on CPU sampling, send in a bunch of the slow highlight
queries, and look at where the time is being spent. If it's mostly in
methods that are just reading from disk, buy more memory. If you're on
Linux, look at what top is telling you. If the CPU usage is low and the
"wa" number is above 1% more often than not, buy more memory (I don't know
why that wa number makes sense, I just know that it has been a good rule
of thumb for us).

-- Bryan

> -Original Message-
> From: Andy Brown [mailto:andy_br...@rhoworld.com]
> Sent: Monday, May 20, 2013 9:53 AM
> To: solr-user@lucene.apache.org
> Subject: Slow Highlighter Performance Even Using FastVectorHighlighter
>
> I'm providing a search feature in a web app that searches for documents
> that range in size from 1KB to 200MB of varying MIME types (PDF, DOC,
> etc). Currently there are about 3000 documents and this will continue to
> grow. I'm providing full word search and partial word search. For each
> document, there are three source fields that I'm interested in searching
> and highlighting on: name, description, and content. Since I'm providing
> both full and partial word search, I've created additional fields that
> get tokenized differently: name_par, description_par, and content_par.
> Those are indexed and stored as well for querying and highlighting. As
> suggested in the Solr wiki, I've got two catch all fields text and
> text_par for faster querying.
>
> An average search results page displays 25 results and I provide paging.
> I'm just returning the doc ID in my Solr search results and response
> times have been quite good (1 to 10 ms). The problem in performance
> occurs when I turn on highlighting. I'm already using the
> FastVectorHighlighter and depending on the query, it has taken as long
> as 15 seconds to get the highlight snippets. However, this isn't always
> the case. Certain query terms result in 1 sec or less response time. In
> any case, 15 seconds is way too long.
>
> I'm fairly new to Solr but I've spent days coming up with what I've got
> so far. Feel free to correct any misconceptions I have. Can anyone
> advise me on what I'm doing wrong or offer a better way to setup my core
> to improve highlighting performance?
>
> A typical query would look like:
> /select?q=foo&start=0&rows=25&fl=id&hl=true
>
> I'm using Solr 4.1. Below the relevant core schema and config details:
>
> 
> 
>  required="true" multiValued="false"/>
>
>
> 
>  multiValued="true" termPositions="true" termVectors="true"
> termOffsets="true"/>
>  stored="true" multiValued="true" termPositions="true" termVectors="true"
> termOffsets="true"/>
>  multiValued="true" termPositions="true" termVectors="true"
> termOffsets="true"/>
>  multiValued="true"/>
>
> 
>  stored="true" multiValued="true" termPositions="true" termVectors="true"
> termOffsets="true"/>
>  stored="true" multiValued="true" termPositions="true" termVectors="true"
> termOffsets="true"/>
>  stored="true" multiValued="true" termPositions="true" termVectors="true"
> termOffsets="true"/>
>  stored="false" multiValued="true"/>
>
>
> 
> 
> 
> 
>
> 
> 
> 
> 
>
> 
> 
> 
> 
>
> 
>  positionIncrementGap="100">
>   
> 
>  words="stopwords.txt" enablePositionIncrements="true" />
> 
>   
>   
> 
>  words="stopwords.txt" enablePositionIncrements="true" />
>  ignoreCase="true" expand="true"/>
> 
>
>  
>
> 
>  positionIncrementGap="100">
>   
> 
>  words="stopwords.txt" enablePositionIncrements="true" />
> 
>maxGramSize="7"/>
>   
>   
> 
>  words="stopwords.txt" enablePositionIncr

Re: Not able to search Spanish word with ascent in solr

2013-05-20 Thread Jack Krupansky
We can conclude that the field type analyzer is NOT the problem. Good 
experiment to eliminate one culprit.


-- Jack Krupansky

-Original Message- 
From: jignesh

Sent: Monday, May 20, 2013 1:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Not able to search Spanish word with ascent in solr

Hello

Here is the output of Solr Admin UI Analysis page

http://awesomescreenshot.com/0ff1ao7347

What should I conclude from this?

Thanks,
Waiting for reply.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-search-Spanish-word-with-ascent-in-solr-tp4064404p4064700.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: multiple cache for same field

2013-05-20 Thread Jason Hellman
Most definitely not the number of unique elements in each segment.  My 32 
document sample index (built from the default example docs data) has the 
following:

entry#0:
'StandardDirectoryReader(segments_b:29 _8(4.2.1):C32)'=>'manu_exact',class 
org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#1778857102

There is no chance for there to be 1.8 billion unique elements in that index.

On May 20, 2013, at 1:20 PM, Erick Erickson  wrote:

> Not sure, never had to worry about what they are..
> 
> On Mon, May 20, 2013 at 12:28 PM, J Mohamed Zahoor  wrote:
>> 
>> What is the number at the end?
>> is it the no of unique elements in each segment?
>> 
>> ./zahoor
>> 
>> 
>> On 20-May-2013, at 7:37 PM, Erick Erickson  wrote:
>> 
>>> Because the same field is split amongst a number of segments. If you
>>> look in the index directory, you should see files like _3fgm.* and
>>> _3ffm.*. Each such group represents one segment. The number of
>>> segments changes with merging etc.
>>> 
>>> Best
>>> Erick
>>> 
>>> On Mon, May 20, 2013 at 6:43 AM, J Mohamed Zahoor  wrote:
 Hi
 
 Why is that lucene field cache has multiple entries for the same field 
 S_24.
 It is a dynamic field.
 
 
 'SegmentCoreReader(owner=_3fgm(4.2.1):C7681)'=>'S_24',double,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_DOUBLE_PARSER=>org.apache.lucene.search.FieldCacheImpl$DoublesFromArray#1174240382
 
 'SegmentCoreReader(owner=_3ffm(4.2.1):C1596758)'=>'S_24',double,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_DOUBLE_PARSER=>org.apache.lucene.search.FieldCacheImpl$DoublesFromArray#83384344
 
 'SegmentCoreReader(owner=_3fgh(4.2.1):C2301)'=>'S_24',double,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_DOUBLE_PARSER=>org.apache.lucene.search.FieldCacheImpl$DoublesFromArray#1281331764
 
 
 Also, the number at the end.. does it specified the no of entries in that 
 cache bucket?
 
 ./zahoor
>> 



Re: Not able to search Spanish word with ascent in solr

2013-05-20 Thread jignesh
Here is the output using 

/solr/terms?terms.fl=name&terms.prefix=a 

--
017165093842601769567616053


What should I conclude from above?

Thanks
Waiting for reply



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-search-Spanish-word-with-ascent-in-solr-tp4064404p4064703.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Not able to search Spanish word with ascent in solr

2013-05-20 Thread jignesh
Hello

Here is the output of Solr Admin UI Analysis page 

http://awesomescreenshot.com/0ff1ao7347

What should I conclude from this?

Thanks, 
Waiting for reply.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-search-Spanish-word-with-ascent-in-solr-tp4064404p4064700.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: multiple cache for same field

2013-05-20 Thread Erick Erickson
Not sure, never had to worry about what they are..

On Mon, May 20, 2013 at 12:28 PM, J Mohamed Zahoor  wrote:
>
> What is the number at the end?
> is it the no of unique elements in each segment?
>
> ./zahoor
>
>
> On 20-May-2013, at 7:37 PM, Erick Erickson  wrote:
>
>> Because the same field is split amongst a number of segments. If you
>> look in the index directory, you should see files like _3fgm.* and
>> _3ffm.*. Each such group represents one segment. The number of
>> segments changes with merging etc.
>>
>> Best
>> Erick
>>
>> On Mon, May 20, 2013 at 6:43 AM, J Mohamed Zahoor  wrote:
>>> Hi
>>>
>>> Why is that lucene field cache has multiple entries for the same field S_24.
>>> It is a dynamic field.
>>>
>>>
>>> 'SegmentCoreReader(owner=_3fgm(4.2.1):C7681)'=>'S_24',double,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_DOUBLE_PARSER=>org.apache.lucene.search.FieldCacheImpl$DoublesFromArray#1174240382
>>>
>>> 'SegmentCoreReader(owner=_3ffm(4.2.1):C1596758)'=>'S_24',double,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_DOUBLE_PARSER=>org.apache.lucene.search.FieldCacheImpl$DoublesFromArray#83384344
>>>
>>> 'SegmentCoreReader(owner=_3fgh(4.2.1):C2301)'=>'S_24',double,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_DOUBLE_PARSER=>org.apache.lucene.search.FieldCacheImpl$DoublesFromArray#1281331764
>>>
>>>
>>> Also, the number at the end.. does it specified the no of entries in that 
>>> cache bucket?
>>>
>>> ./zahoor
>


Re: solr UI logging when using logback?

2013-05-20 Thread Shawn Heisey

On 5/20/2013 10:44 AM, Boogie Shafer wrote:

BUT i havent figured out what i need to do to get the logging events to
display in the SOLR admin ui

e.g. at http://solr-hostname:8983/solr/#/~logging


The logging page in the UI is populated by log watcher classes specific 
to the logging implementation.  Prior to 4.3, the only watcher available 
in released Solr versions was the one for java.util.logging.  The log4j 
watcher was incorporated in the 4.3.0 release.  I have been using log4j 
since 4.1-SNAPSHOT, but I don't yet have any 4.3 servers in production, 
so I can't get logs in my UI.


To get log events in the UI with logback, you would need to implement a 
watcher specifically for logback.  I don't think this is a high priority 
item for the project at the moment, but patches are welcome.



AND
I'm wondering if it's possible to get the jetty start log managed under
logback


On my setup using the jetty included with Solr and the slf4j/log4j jars 
in lib/ext, all jetty log entries are logged to the same file as my Solr 
logs, according to my log4j.properties file.


If you have any logging config for jetty itself, then that will be used. 
 The easiest way to proceed is to simply comment or remove that logging 
config.  That will cause jetty to find slf4j in the classpath and use 
it, which you have already configured to use logback.  The example jetty 
config does not have any logging configured.


Thanks,
Shawn



Re: Deleting an entry from a collection when they key has ":" in it

2013-05-20 Thread Chris Hostetter

: Technically, core Solr does not require a unique key. A lot of features in

nothing in this thread referred to the uniqueKey field, or the lack of a 
uniqueKey field in the user's schema, at all until you brought it up.

 * the user has a field named "key"
 * the user had a question about deleting by query where the 
   queries involved the field named "key"
 * the user stated that the field named "key" is not indexed...

: If in my schema, I have the "key" field set to indexed=false, then is that
: maybe the issue?  I'm going to try to set that to true and rebuild the
: repository and see if that does it.


...cut and dried; no need to confuse the issue further.


-Hoss


Re: Not able to search Spanish word with ascent in solr

2013-05-20 Thread Jason Hellman
And use the /terms request handler to view what is present in the field:

/solr/terms?terms.fl=text_es&terms.prefix=a

You're looking to ensure the index does, in fact, have the accented characters 
present.  It's just a sanity check, but could possibly save you a little 
(sanity, that is).
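A related sanity check on the request itself: accent problems are often really URL-encoding problems, because the accented prefix must reach Solr as UTF-8 percent-escapes. A minimal Python sketch (the host, `/terms` path, and `text_es` field are the ones mentioned in this thread, not verified against your setup):

```python
from urllib.parse import quote, urlencode

# Build the /terms request from this thread; values are illustrative.
base = "http://localhost:8983/solr/terms"
url = base + "?" + urlencode({"terms.fl": "text_es", "terms.prefix": "é"})
print(url)  # terms.prefix arrives as %C3%A9 (UTF-8)

# The same character encoded as Latin-1 is a different byte on the wire,
# and a common cause of "accented searches find nothing":
print(quote("étnico"))                      # %C3%A9tnico  (correct for Solr)
print(quote("étnico", encoding="latin-1"))  # %E9tnico     (wrong)
```

If the servlet container decodes GET parameters as Latin-1 (Tomcat historically did by default via URIEncoding), the query term never matches what the analyzer indexed, even though the index is fine.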

Jason

On May 20, 2013, at 12:51 PM, "Jack Krupansky"  wrote:

> Try the Solr Admin UI Analysis page - enter text for both index and query for 
> your field and see whether the final terms still have their accents.
> 
> -- Jack Krupansky
> 
> -Original Message- From: jignesh
> Sent: Monday, May 20, 2013 10:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Not able to search Spanish word with ascent in solr
> 
> Thanks for the reply..
> 
> I am sending the below type of XML to Solr:
> 
> 
> 15
> 15
> Mis nuevos colgantes de PRIMARK
> ¿Alguna vez os habéis pasado
> por la zona de bisutería de PRIMARK? Cada vez que me doy una
> vuelta y paso por delante no puedo evitar echar un vistazo a ver si
> encuentro algún detallito mono. Colgantes, pendientes, pulseras,
> diademastienen de todo y siempre está bien de precio.
> Hoy quería enseñaros mis dos últimas
> compras: dos colgantes, uno con forma de búho y otro con un robot
> fashion. Y lo mejor es que sólo me he gastado 5 euros.
> ¿Qué os parecen?
> ¿Habéis comprado alguna vez en esta tienda?
> 
> 
> 
> 
> 
> I am giving below url
> 
> http://localhost:8983/solr/select/?q=étnico&indent=on&qf=name&qf=features&defType=edismax&start=0&rows=50&wt=json
> 
> waiting for reply
> 
> Thanks
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Not-able-to-search-Spanish-word-with-ascent-in-solr-tp4064404p4064651.html
> Sent from the Solr - User mailing list archive at Nabble.com. 



Re: seeing lots of "autowarming" messages in log during DIH indexing

2013-05-20 Thread geeky2
you mean I would add this switch to my script that kicks off the dataimport?

example:


OUTPUT=$(curl -v
http://${SERVER}.intra.searshc.com:${PORT}/solrpartscat/${CORE}/dataimport
-F command=full-import -F clean=${CLEAN} -F commit=${COMMIT} -F
optimize=${OPTIMIZE} -F openSearcher=false)


what needs to be done _AFTER_ the DIH finishes (if anything)?

eg, does this need to be turned back on after the DIH has finished?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/seeing-lots-of-autowarming-messages-in-log-during-DIH-indexing-tp4064649p4064695.html
Sent from the Solr - User mailing list archive at Nabble.com.


Slow Highlighter Performance Even Using FastVectorHighlighter

2013-05-20 Thread Andy Brown
I'm providing a search feature in a web app that searches for documents
that range in size from 1KB to 200MB of varying MIME types (PDF, DOC,
etc). Currently there are about 3000 documents and this will continue to
grow. I'm providing full word search and partial word search. For each
document, there are three source fields that I'm interested in searching
and highlighting on: name, description, and content. Since I'm providing
both full and partial word search, I've created additional fields that
get tokenized differently: name_par, description_par, and content_par.
Those are indexed and stored as well for querying and highlighting. As
suggested in the Solr wiki, I've got two catch all fields text and
text_par for faster querying. 
 
An average search results page displays 25 results and I provide paging.
I'm just returning the doc ID in my Solr search results and response
times have been quite good (1 to 10 ms). The problem in performance
occurs when I turn on highlighting. I'm already using the
FastVectorHighlighter and depending on the query, it has taken as long
as 15 seconds to get the highlight snippets. However, this isn't always
the case. Certain query terms result in 1 sec or less response time. In
any case, 15 seconds is way too long. 
 
I'm fairly new to Solr but I've spent days coming up with what I've got
so far. Feel free to correct any misconceptions I have. Can anyone
advise me on what I'm doing wrong or offer a better way to setup my core
to improve highlighting performance? 
 
A typical query would look like:
/select?q=foo&start=0&rows=25&fl=id&hl=true 
 
I'm using Solr 4.1. Below are the relevant core schema and config details: 

[The schema field/fieldType definitions and most of the solrconfig XML in this
message were stripped by the mail archive; only the request-handler default
values below survived.]

   explicit 
   10 
   text 
   edismax 
   text^2 text_par^1
   true 
   true 
   true 
   true 
   breakIterator 
   2 
   name name_par description description_par
content content_par 
   162 
   simple 
   default 



   
  
  


Cheers!

- Andy



Re: Not able to search Spanish word with ascent in solr

2013-05-20 Thread Jack Krupansky
Try the Solr Admin UI Analysis page - enter text for both index and query 
for your field and see whether the final terms still have their accents.


-- Jack Krupansky

-Original Message- 
From: jignesh

Sent: Monday, May 20, 2013 10:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Not able to search Spanish word with ascent in solr

Thanks for the reply..

I am sending the below type of XML to Solr:


15
15
Mis nuevos colgantes de PRIMARK
¿Alguna vez os habéis pasado
por la zona de bisutería de PRIMARK? Cada vez que me doy una
vuelta y paso por delante no puedo evitar echar un vistazo a ver si
encuentro algún detallito mono. Colgantes, pendientes, pulseras,
diademastienen de todo y siempre está bien de precio.
Hoy quería enseñaros mis dos últimas
compras: dos colgantes, uno con forma de búho y otro con un robot
fashion. Y lo mejor es que sólo me he gastado 5 euros.
¿Qué os parecen?
¿Habéis comprado alguna vez en esta tienda?





I am giving below url

http://localhost:8983/solr/select/?q=étnico&indent=on&qf=name&qf=features&defType=edismax&start=0&rows=50&wt=json

waiting for reply

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-search-Spanish-word-with-ascent-in-solr-tp4064404p4064651.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: clusterstate stores IP address instead of hostname now?

2013-05-20 Thread Shawn Heisey

On 5/20/2013 9:25 AM, Daniel Collins wrote:

Just done an upgrade from Solr (cloud) 4.0 to 4.3 and noticed that
clusterstate.json now contains the IP address instead of the hostname for
each shard descriptor.

Was this a conscious change?  It caused us some pain when migrating and
breaks our own admin tools, so just checking if this is likely to change
again or is stable now using IPs?


That changed in 4.1.  If you want real hostnames, include the host 
parameter on each Solr instance on your startup commandline 
(-Dhost=server1) or in solr.xml.  I think solr.xml is better, but do 
what works for you.
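A sketch of the legacy solr.xml form of that setting, per the SolrCloud wiki page linked below (the attribute values here are placeholders; verify the attribute names against your Solr version):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores" host="server1" hostPort="8983" hostContext="solr">
    <core name="collection1" instanceDir="collection1"/>
  </cores>
</solr>
```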


http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params

Thanks,
Shawn



solr UI logging when using logback?

2013-05-20 Thread Boogie Shafer
I have logging working for the most part with logback 1.0.13 and slf4j
1.7.5 under Solr 4.3.0 (or previously under Solr 4.2.1).

With two exceptions, I'm very happy with the setup, as I can get all the
jetty request logs and various Solr service events logged out with
rotation, etc.

BUT I haven't figured out what I need to do to get the logging events to
display in the SOLR admin ui

e.g. at http://solr-hostname:8983/solr/#/~logging


AND
I'm wondering if it's possible to get the jetty start log managed under
logback


anybody have any pointers on these topics?

---

the configuration details of my setup are summarized in the rpm building
process here
https://github.com/boogieshafer/jetty-solr-rpm


Re: seeing lots of "autowarming" messages in log during DIH indexing

2013-05-20 Thread Shreejay
Every time a commit is done, a new searcher is opened. In the solr config file, 
caches are defined with a parameter called autowarm. Autowarming basically tries 
to copy the cache values from the previous searcher into the current one. If you 
are doing a bulk update and do not care about searching until your indexing is 
over, then you can specify openSearcher=false while doing a commit. That should 
speed up your indexing a lot.  
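As a concrete illustration of that commit, a small sketch that builds the explicit commit URL (the host and core name are hypothetical, mirroring the curl-driven DIH scripts elsewhere in this thread):

```python
from urllib.parse import urlencode

# Hypothetical host/core; substitute your own.
base = "http://localhost:8983/solr/mycore/update"

def commit_url(open_searcher):
    # openSearcher=false commits the data but defers opening a new
    # searcher (and its cache autowarming) until a later commit.
    return base + "?" + urlencode({
        "commit": "true",
        "openSearcher": "true" if open_searcher else "false",
    })

print(commit_url(False))
# http://localhost:8983/solr/mycore/update?commit=true&openSearcher=false
```

After the bulk load finishes, issue one ordinary commit (openSearcher defaults to true) so a searcher finally opens over the complete index; nothing else needs to be "turned back on".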

-- 
Shreejay Nair
Sent from my mobile device. Please excuse brevity and typos.


On Monday, May 20, 2013 at 7:16, geeky2 wrote:

> hello,
> 
> we are tracking down some performance issues with our DIH process.
> 
> not sure if this is related - but i am seeing tons of the messages below in
> the logs during re-indexing of the core.
> 
> what do these messages mean?
> 
> 
> 2013-05-18 19:37:30,623 INFO [org.apache.solr.update.UpdateHandler]
> (pool-11-thread-1) end_commit_flush
> 2013-05-18 19:37:30,623 INFO [org.apache.solr.search.SolrIndexSearcher]
> (pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af
> main
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 2013-05-18 19:37:30,624 INFO [org.apache.solr.search.SolrIndexSearcher]
> (pool-10-thread-1) autowarming result for Searcher@5b8d745 main
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 2013-05-18 19:37:30,624 INFO [org.apache.solr.search.SolrIndexSearcher]
> (pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af
> main
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 2013-05-18 19:37:30,625 INFO [org.apache.solr.search.SolrIndexSearcher]
> (pool-10-thread-1) autowarming result for Searcher@5b8d745 main
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 2013-05-18 19:37:30,625 INFO [org.apache.solr.search.SolrIndexSearcher]
> (pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af
> main
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=1,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 2013-05-18 19:37:30,628 INFO [org.apache.solr.search.SolrIndexSearcher]
> (pool-10-thread-1) autowarming result for Searcher@5b8d745 main
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=3,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 2013-05-18 19:37:30,628 INFO [org.apache.solr.search.SolrIndexSearcher]
> (pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af
> main
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 2013-05-18 19:37:30,628 INFO [org.apache.solr.search.SolrIndexSearcher]
> (pool-10-thread-1) autowarming result for Searcher@5b8d745 main
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 
> thx
> mark
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/seeing-lots-of-autowarming-messages-in-log-during-DIH-indexing-tp4064649.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 




Re: Not able to search Spanish word with ascent in solr

2013-05-20 Thread jignesh
Thanks for the reply..

I am sending the below type of XML to Solr:


15
15
Mis nuevos colgantes de PRIMARK
¿Alguna vez os habéis pasado
por la zona de bisutería de PRIMARK? Cada vez que me doy una
vuelta y paso por delante no puedo evitar echar un vistazo a ver si
encuentro algún detallito mono. Colgantes, pendientes, pulseras,
diademastienen de todo y siempre está bien de precio.
Hoy quería enseñaros mis dos últimas
compras: dos colgantes, uno con forma de búho y otro con un robot
fashion. Y lo mejor es que sólo me he gastado 5 euros.
¿Qué os parecen?
¿Habéis comprado alguna vez en esta tienda?





I am giving below url

http://localhost:8983/solr/select/?q=étnico&indent=on&qf=name&qf=features&defType=edismax&start=0&rows=50&wt=json

waiting for reply

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-search-Spanish-word-with-ascent-in-solr-tp4064404p4064651.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Not able to search Spanish word with ascent in solr

2013-05-20 Thread jignesh
Thanks for your reply

I am using jetty for solr search



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-search-Spanish-word-with-ascent-in-solr-tp4064404p4064652.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: multiple cache for same field

2013-05-20 Thread J Mohamed Zahoor

What is the number at the end?
Is it the number of unique elements in each segment?

./zahoor


On 20-May-2013, at 7:37 PM, Erick Erickson  wrote:

> Because the same field is split amongst a number of segments. If you
> look in the index directory, you should see files like _3fgm.* and
> _3ffm.*. Each such group represents one segment. The number of
> segments changes with merging etc.
> 
> Best
> Erick
> 
> On Mon, May 20, 2013 at 6:43 AM, J Mohamed Zahoor  wrote:
>> Hi
>> 
>> Why is it that the Lucene field cache has multiple entries for the same field S_24?
>> It is a dynamic field.
>> 
>> 
>> 'SegmentCoreReader(owner=_3fgm(4.2.1):C7681)'=>'S_24',double,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_DOUBLE_PARSER=>org.apache.lucene.search.FieldCacheImpl$DoublesFromArray#1174240382
>> 
>> 'SegmentCoreReader(owner=_3ffm(4.2.1):C1596758)'=>'S_24',double,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_DOUBLE_PARSER=>org.apache.lucene.search.FieldCacheImpl$DoublesFromArray#83384344
>> 
>> 'SegmentCoreReader(owner=_3fgh(4.2.1):C2301)'=>'S_24',double,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_DOUBLE_PARSER=>org.apache.lucene.search.FieldCacheImpl$DoublesFromArray#1281331764
>> 
>> 
>> Also, the number at the end... does it specify the number of entries in that 
>> cache bucket?
>> 
>> ./zahoor



Re: HttpClient version

2013-05-20 Thread Michael Della Bitta
We've run into this problem when deploying index jobs that run in Elastic
Mapreduce. We've gotten by with an older version of SolrJ, but some of the
fixes and enhancements with SolrCloud that came out in the 4.x series
aren't available if you go back to an earlier version.

In particular, we're running 4.2.1 and we don't have the ability to call
updateAliases on the ZkStateReader to get around this bug:
https://issues.apache.org/jira/browse/SOLR-4664

We've managed to get by so far, however.



Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

The science of influence marketing.


On Fri, May 17, 2013 at 7:23 PM, Shawn Heisey  wrote:

> On 5/17/2013 5:06 PM, Jamie Johnson wrote:
> > I am trying to use Solr inside of another framework (Storm) that
> provides a
> > version of HttpClient (4.1.x) that is incompatible with the latest
> version
> > that SolrJ requires (4.3.x).  Is there a way to use the older version of
> > HttpClient with SolrJ?  Are there any issues with using an earlier SolrJ
> > (4.0.0) that used HttpClient 4.1.x with 4.3 of the server?  I'm really
> just
> > looking for options for running Solr in Storm, so any thoughts would be
> > greatly appreciated.
>
> An older SolrJ will probably work fine.  SolrJ is pretty stable,
> overall.  If you're using SolrCloud, you'll probably want to use a later
> version just to be sure everything works right.
>
> One thing that I would try first is removing jars from Storm that
> conflict with SolrJ and provide the upgraded versions in a common lib
> directory in your classpath.  There's a reasonable chance that Storm
> will work just fine with the newer jars.
>
> For my own SolrJ app, I download the newest versions of the jars instead
> of using the older versions included with SolrJ.
>
> http://commons.apache.org/proper/commons-io/download_io.cgi
> http://hc.apache.org/downloads.cgi
> http://slf4j.org/download.html
> http://logging.apache.org/log4j/1.2/download.html
>
> Thanks,
> Shawn
>
>


Re: [Solr 4.2.1] LotsOfCores - Can't query cores with loadOnStartup="true" and transient="true"

2013-05-20 Thread Erick Erickson
Lyuba:

Could you go ahead and raise a JIRA and assign it to me to
investigate? You should definitely be able to define cores this way.

Thanks,
Erick

On Sun, May 19, 2013 at 9:27 AM, Lyuba Romanchuk
 wrote:
> Hi,
>
> It seems like in order to query transient cores they must be defined with
> loadOnStartup="false".
>
> I define one core loadOnStartup="true" and transient="false", and another
> cores to be  loadOnStartup="true" and transient="true", and
> transientCacheSize=Integer.MAX_VALUE.
>
> In this case CoreContainer.dynamicDescriptors will be empty and then
> CoreContainer.getCoreFromAnyList(String) and CoreContainer.getCore(String)
> returns null for all transient cores.
>
> I looked at the code of 4.3.0 and it doesn't seem that the flow was
> changed, the core is added only if it's not loaded on start up.
>
> Could you please assist with this issue?
>
> Best regards,
> Lyuba


Re: solr.xml or its successor in the wiki

2013-05-20 Thread Erick Erickson
What's supposed to happen (not guaranteeing it is completely correct,
mind you) is that the presence of a  tag defines which checks
are performed. Errors are thrown on old-style constructs when no
 tag is present and vice-versa.

Best
Erick


On Sun, May 19, 2013 at 7:20 PM, Benson Margulies  wrote:
> One point of confusion: Is the compatibility code I hit trying to
> prohibit the 'str' form when it sees old-fangled cores? Or when the
> current running version is pre-5.0? I hope it's the former.
>
> On Sun, May 19, 2013 at 6:47 PM, Shawn Heisey  wrote:
>> On 5/19/2013 4:38 PM, Benson Margulies wrote:
>>> Shawn, thanks. need any more jiras on this?
>>
>> I don't think so, but if you grab the 4.3 branch or branch_4x and find
>> any bugs, let us know.
>>
>> Thanks,
>> Shawn
>>


clusterstate stores IP address instead of hostname now?

2013-05-20 Thread Daniel Collins
Just done an upgrade from Solr (cloud) 4.0 to 4.3 and noticed that
clusterstate.json now contains the IP address instead of the hostname for
each shard descriptor.

Was this a conscious change?  It caused us some pain when migrating and
breaks our own admin tools, so just checking if this is likely to change
again or is stable now using IPs?

Cheers,


Re: Zookeeper Ensemble Startup Parameters For SolrCloud?

2013-05-20 Thread vsilgalis
I didn't change it and haven't seen any issues.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Zookeeper-Ensemble-Startup-Parameters-For-SolrCloud-tp4063905p4064654.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-20 Thread Erick Erickson
Rishi:

Thanks very much for taking the time to post this, we're always
looking for before/after numbers!

Erick

On Sat, May 18, 2013 at 11:34 PM, Shalin Shekhar Mangar
 wrote:
> Awesome news Rishi! Looking forward to your SolrCloud updates.
>
>
> On Sat, May 18, 2013 at 12:59 AM, Rishi Easwaran 
> wrote:
>
>>
>>
>> Hi All,
>>
>> It's Friday 3:00pm, warm & sunny outside, and it was a good week. Figured
>> I'd share some good news.
>> I work for AOL mail team and we use SOLR for our mail search backend.
>> We have been using it since pre-SOLR 1.4 and strong supporters of SOLR
>> community.
>> We deal with millions of indexes and billions of requests a day across our
>> complex.
>> We finished full rollout of SOLR 4.2.1 into our production last week.
>>
>> Some key highlights:
>> - ~75% Reduction in Search response times
>> - ~50% Reduction in SOLR disk busy, which in turn helped with ~90%
>> Reduction in errors
>> - Garbage collection total stop reduction by over 50% moving application
>> throughput into the 99.8% - 99.9% range
>> - ~15% reduction in CPU usage
>>
>> We did not tune our application moving from 3.5 to 4.2.1 nor update java.
>> For the most part it was a binary upgrade, with patches for our special
>> use case.
>>
>> Now going forward we are looking at prototyping SOLR Cloud for our search
>> system, upgrade java and tomcat, tune our application further. Lots of fun
>> stuff :)
>>
>> Have a great weekend everyone.
>> Thanks,
>>
>> Rishi.
>>
>>
>>
>>
>>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.


Re: xPath XML-Import

2013-05-20 Thread Erick Erickson
This is really just parsing the XML using any of several parsers and
putting the results into a SolrInputDocument (assuming a SolrJ
client).

Alternatively, you could perhaps do some XSLT transformations, but I'm
not great on the ins and outs of XSLT...

Best
Erick
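A minimal sketch of that parse-and-extract approach. The original message's markup was stripped by the archive, so the element names below (items, item, name, value) are assumptions, not the poster's actual schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical structure standing in for the stripped XML above.
xml_data = """
<items>
  <item><name>AA</name><value>123456789</value></item>
  <item><name>BB</name><value>987654321</value></item>
  <item><name>CC</name><value>147258369</value></item>
</items>
"""

root = ET.fromstring(xml_data)
# Pull each numeric value; these would then be set as fields on a
# SolrInputDocument (SolrJ) or emitted as <doc> elements in an XML update.
values = [item.findtext("value") for item in root.findall("item")]
print(values)  # ['123456789', '987654321', '147258369']
```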

On Sun, May 19, 2013 at 11:03 AM, Benjamin Kern  wrote:
> Hi all,
>
> I'm reading XML files into Solr. I have the following structure:
>
> 
>
> 
>
> AA
>
> 123456789
>
> 
>
> 
>
> BB
>
> 987654321
>
> 
>
> 
>
> CC
>
> 147258369
>
> 
>
> …
>
> 
>
>
>
> How can I read the name with its value? The Solr result should look like this:
>
> 123456789
>
> 987654321
>
> 147258369
>
>
>
>
>
> Any Idea?
>
>
>
> Thanks
>
>
>


Re: Adding a field in schema , storing it and use it to search

2013-05-20 Thread Erick Erickson
Whether you add it as a dynamic field or "normal" field really doesn't
matter from a Solr perspective. Dynamic fields are exactly like normal
fields, you just don't have to fully specify the name. That said, I
prefer normal fields to prevent typos from messing me up. If you had
a dynamic field like you indicate and then entered esp_55 rather than
exp_55, you'd have to track down why things were wrong, whereas if you
used a normal field, the doc would fail to index and you'd know early.
"fail early" is good.

To search, something like experience:[5 TO *] would do. Note that
there's both inclusive and exclusive syntax here, distinguished by
curly braces or brackets (i.e. {} or []) which may be mixed.
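To make the syntax concrete, a tiny illustrative helper (my own, not a Solr API) that renders the bracket combinations:

```python
def range_query(field, low, high, inc_low=True, inc_high=True):
    # [] is inclusive, {} is exclusive; the two ends may be mixed,
    # and '*' works as an open-ended bound.
    open_b = "[" if inc_low else "{"
    close_b = "]" if inc_high else "}"
    return "{}:{}{} TO {}{}".format(field, open_b, low, high, close_b)

print(range_query("experience", 5, "*"))                 # experience:[5 TO *]
print(range_query("experience", 5, 10, inc_high=False))  # experience:[5 TO 10}
```

Note that for a numeric comparison like experience >= 5, the field should use a numeric (Trie) field type; on a plain string field the range is evaluated lexicographically.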

Best
Erick

On Thu, May 16, 2013 at 8:37 AM, Kamal Palei  wrote:
> Hi All
>
> Need help in adding a new field and making use of it during search.
>
> As of today I just search some keywords and whatever document (actually
> these are resumes of individuals) is retrieved from SOLR search I take
> these as input, then search in mysql for experience, salary etc and then
> selected resumes I show as search result.
>
> Say, while searching in SOLR, I want to achieve something as below.
>
> 1. Search keywords in those users' resumes whose experience is greater than 5
> years.
>
> To achieve My understanding is
> 1. I need to define a new field in schema
> 2. During indexing, add this parameter
> 3. During search, have a condition like experience >= 5 years
>
>
> When I will be adding a field , should I add as a normal field one as shown
> below
>
> **
>
> OR as a dynamic field as shown below
>
> * multiValued="false"/>*
>
>
> And during search, how the condition should look like.
>
> Best regards
> Kamal


Re: cache disable through solrJ

2013-05-20 Thread Shawn Heisey
On 5/20/2013 5:53 AM, J Mohamed Zahoor wrote:
> How do I disable the cache (Solr FieldValueCache) for certain queries...
> using HTTP it can be done using {!cache=false}... 

If you are doing facets, Koji's reply works for those.

The localparam for caching should work just fine if you prepend it to
your query string before you add it to your query object in SolrJ.

query.addFilterQuery("{!cache=false}instock:true");

If this is what you have tried, be sure that you don't use
ClientUtils.escapeQueryChars to escape the prepended localparam, or it
will become part of your query text rather than the special cache
instruction.  You can run it on the query parts, of course.
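To see why, here is a rough Python mimic of ClientUtils.escapeQueryChars (the escape set is an approximation of the SolrJ 4.x behavior; verify against your version), showing what happens when the localparam itself gets escaped:

```python
# Approximation of SolrJ's ClientUtils.escapeQueryChars; the exact
# character set is an assumption -- check your SolrJ version.
SPECIAL = set('\\+-!():^[]"{}~*?|&;/')

def escape_query_chars(s):
    return "".join("\\" + c if (c in SPECIAL or c.isspace()) else c
                   for c in s)

fq = "{!cache=false}instock:true"
print(escape_query_chars(fq))
# \{\!cache=false\}instock\:true -- the braces are now literal text,
# so Solr never sees a {!cache=false} localparam at all.
```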

If it still doesn't work, check the Solr log for what parameters get
sent with the http query and with the solrj query.

Thanks,
Shawn



Re: Compatible collections SOLR4 / SOLRCloud?

2013-05-20 Thread Erick Erickson
The latter: the schemas must be similar enough to satisfy the query.

Best
Erick

On Thu, May 16, 2013 at 5:03 AM, Marcin  wrote:
> Hi there,
>
> I am trying to figure out what SOLR means by compatible collection in order
> to be able to run the following query:
>
> Query all shards of multiple compatible collections, explicitly specified:
>
> http://localhost:8983/solr/collection1/select?collection=collection1_NY,collection1_NJ,collection1_CT
>
> Does this mean that the schema.xml must be exactly same between those
> collections or just partially same (share same fields used to satisfy the
> query)?
>
> cheers,
> /Marcin


Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-20 Thread Shawn Heisey
On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
> And I do remember the discussion on the forum about dropping the name
> *apache* from solr jars. If that's what caused this issue, then can you
> tell me if the mirrors need updating with solr-core.jar instead of
> apache-solr-core.jar?

If it's named apache-solr-core, then it's from 4.0 or earlier.  If it's
named solr-core, then it's from 4.1 or later.  That might mean that you
are mixing versions - don't do that.  Make sure that you have jars from
the exact same version as your server.

Thanks,
Shawn



seeing lots of "autowarming" messages in log during DIH indexing

2013-05-20 Thread geeky2
hello,

we are tracking down some performance issues with our DIH process.

not sure if this is related - but i am seeing tons of the messages below in
the logs during re-indexing of the core.

what do these messages mean?


2013-05-18 19:37:30,623 INFO  [org.apache.solr.update.UpdateHandler]
(pool-11-thread-1) end_commit_flush
2013-05-18 19:37:30,623 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af
main
   
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,624 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming result for Searcher@5b8d745 main
   
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,624 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af
main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,625 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming result for Searcher@5b8d745 main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,625 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af
main
   
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=1,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,628 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming result for Searcher@5b8d745 main
   
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=3,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,628 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af
main
   
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,628 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming result for Searcher@5b8d745 main
   
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

thx
mark




--
View this message in context: 
http://lucene.472066.n3.nabble.com/seeing-lots-of-autowarming-messages-in-log-during-DIH-indexing-tp4064649.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlight only when all keywords match

2013-05-20 Thread Upayavira
I can't give you much advice on the topic. I have reviewed the
HighlightComponent and found it complex and hard to follow, so expect it
to be challenging.

Upayavira

On Mon, May 20, 2013, at 01:28 PM, Sandeep Mestry wrote:
> Thanks Upayavira for that valuable suggestion.
> 
> I believe overriding highlight component should be the way forward.
> Could you tell me if there is any existing example or which methods I
> should particularly override?
> 
> Thanks,
> Sandeep
> 
> 
> On 20 May 2013 12:47, Upayavira  wrote:
> 
> > If you are saying that you want to change highlighting behaviour, not
> > query behaviour, then I suspect you are going to have to interact with
> > the java HighlightComponent. If you can work out how to update that
> > component to behave as you wish, you could either subclass it, or create
> > your own implementation that you can include in your Solr setup. Or, if
> > you make it generic enough, offer it back as a contribution that can be
> > included in future Solr releases.
> >
> > Upayavira
> >
> > On Mon, May 20, 2013, at 12:14 PM, Sandeep Mestry wrote:
> > > I doubt if that will be the correct approach as it will be hard to
> > > generate
> > > the query grammar considering we have support for phrase, operator,
> > > wildcard and group queries.
> > > That's why I have kept it simple and only passing the query text with
> > > minimal parsing (escaping lucene special characters) to configured
> > > edismax.
> > > The number of fields I have mentioned above are a lot lesser than the
> > > actual number of fields - around 50 in number :-). So forming such a long
> > > query will both be time and resource consuming. Further, it's not going
> > > to
> > > fulfill my requirement anyway because I do not want to change my search
> > > results, the requirement is only to provide a highlight if a field is
> > > matched for all the query terms.
> > >
> > > Thanks,
> > > Sandeep
> > >
> > >
> > > On 20 May 2013 12:02, Jaideep Dhok  wrote:
> > >
> > > > If you know all fields that need to be queried, you can rewrite it as -
> > > > (assuming, f1, f2 are the fields that you have to search)
> > > > (f1:kw1 AND f1:kw2 ... f1:kwn) OR (f2:kw1 AND f2:kw2 ... f2:kwn)
> > > >
> > > > -
> > > > Jaideep
> > > >
> > > >
> > > > On Mon, May 20, 2013 at 4:22 PM, Sandeep Mestry 
> > > > wrote:
> > > >
> > > > > Hi Jaideep,
> > > > >
> > > > > The edismax config I posted mentions that the default operator
> > > > > is AND. I am sorry if I was not clear in my previous mail; what I
> > > > > really need is to highlight a field only when all the search query
> > > > > terms are present. The current highlighter fires when *any* of the
> > > > > terms match, not only when *all* terms match.
> > > > >
> > > > > Thanks,
> > > > > Sandeep
> > > > >
> > > > >
> > > > > On 20 May 2013 11:40, Jaideep Dhok  wrote:
> > > > >
> > > > > > Sandeep,
> > > > > > If you AND all keywords, that should be OK?
> > > > > >
> > > > > > Thanks
> > > > > > Jaideep
> > > > > >
> > > > > >
> > > > > > On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry <
> > sanmes...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Dear All,
> > > > > > >
> > > > > > > I have a requirement to highlight a field only when all keywords
> > > > > entered
> > > > > > > match. This also needs to support phrase, operator or wildcard
> > > > queries.
> > > > > > > I'm using Solr 4.0 with edismax because the search needs to be
> > > > carried
> > > > > > out
> > > > > > > on multiple fields.
> > > > > > > I know with highlighting feature I can configure a field to
> > indicate
> > > > a
> > > > > > > match, however I do not find a setting to highlight only if all
> > > > > keywords
> > > > > > > match. That makes me think is that the right approach to take?
> > Can
> > > > you
> > > > > > > please guide me in right direction?
> > > > > > >
> > > > > > > The edismax config looks like below:
> > > > > > >
> > > > > > > 
> > > > > > > 
> > > > > > > edismax
> > > > > > > explicit
> > > > > > > 0.01
> > > > > > > title^10 description^5 annotations^3 notes^2
> > > > > > > categories
> > > > > > > title
> > > > > > > 0
> > > > > > > *:*
> > > > > > > *,score
> > > > > > > 100%
> > > > > > > AND
> > > > > > > score desc
> > > > > > > true
> > > > > > > -1
> > > > > > > 1
> > > > > > > uniq_subtype_id
> > > > > > > component_type
> > > > > > > genre_type
> > > > > > > 
> > > > > > > 
> > > > > > > collection:assets
> > > > > > > 
> > > > > > > 
> > > > > > >
> > > > > > > If I search for 'countryside number 10' as the keyword then
> > highlight
> > > > > > only
> > > > > > > if the 'annotations' contain all these entered search terms. Any
> > > > > document
> > > > > > > containing just one or two terms is not a match.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Sandeep
> > > > > > > (p.s: I haven't enabled the highlighting feature yet on this
> > config
> > > > and
> > > > > > > will be doing so only if that will fulfil the requirement I have
> > > > > > mentioned above.)

Re: multiple cache for same field

2013-05-20 Thread Erick Erickson
Because the same field is split amongst a number of segments. If you
look in the index directory, you should see files like _3fgm.* and
_3ffm.*. Each such group represents one segment. The number of
segments changes with merging etc.

Best
Erick

On Mon, May 20, 2013 at 6:43 AM, J Mohamed Zahoor  wrote:
> Hi
>
> Why is that lucene field cache has multiple entries for the same field S_24.
> It is a dynamic field.
>
>
> 'SegmentCoreReader(owner=_3fgm(4.2.1):C7681)'=>'S_24',double,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_DOUBLE_PARSER=>org.apache.lucene.search.FieldCacheImpl$DoublesFromArray#1174240382
>
> 'SegmentCoreReader(owner=_3ffm(4.2.1):C1596758)'=>'S_24',double,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_DOUBLE_PARSER=>org.apache.lucene.search.FieldCacheImpl$DoublesFromArray#83384344
>
> 'SegmentCoreReader(owner=_3fgh(4.2.1):C2301)'=>'S_24',double,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_DOUBLE_PARSER=>org.apache.lucene.search.FieldCacheImpl$DoublesFromArray#1281331764
>
>
> Also, the number at the end... does it specify the number of entries in
> that cache bucket?
>
> ./zahoor


Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-20 Thread Sandeep Mestry
Hi All,

I want to override a component from solr-core, and for that I need the
solr-core jar.

I am using the solr.war that comes from the Apache mirror, and if I open the
war, I see the solr-core jar is actually named apache-solr-core.jar.
This is also true of the solrj jar.

If I now declare a dependency in my module on apache-solr-core.jar, it is
not found in the repository. And if I use solr-core.jar, I get a strange
ClassCastException during Solr startup for MorfologikFilterFactory.

(I'm not using this factory at all in my project.)

at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.lang.ClassCastException: class
org.apache.lucene.analysis.morfologik.MorfologikFilterFactory
at java.lang.Class.asSubclass(Unknown Source)
at org.apache.lucene.util.SPIClassIterator.next(SPIClassIterator.java:126)
at
org.apache.lucene.analysis.util.AnalysisSPILoader.reload(AnalysisSPILoader.java:73)
at
org.apache.lucene.analysis.util.AnalysisSPILoader.(AnalysisSPILoader.java:55)

I tried manually removing the apache-solr-core.jar from the solr
distribution war and then providing the dependency and everything worked
fine.

And I do remember the discussion on the forum about dropping the name
*apache* from solr jars. If that's what caused this issue, then can you
tell me if the mirrors need updating with solr-core.jar instead of
apache-solr-core.jar?
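For what it's worth, the Solr artifacts on Maven Central are published without the apache- prefix; a dependency sketch, assuming Maven and that 4.0.0 is the release you deploy (adjust the version to match your distribution):

```xml
<!-- Illustrative Maven coordinates for the renamed artifact; the version
     here is an example and should match the Solr release you use. -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>4.0.0</version>
</dependency>
```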

Many Thanks,
Sandeep


Re: Not able to search Spanish words with accents in Solr

2013-05-20 Thread Jack Krupansky
Tomcat is notorious for not defaulting to UTF-8 encoding for URLs, which is
how the query is passed; UTF-8 is needed to preserve all these accented
characters.


In Tomcat's server.xml, it should have something like:



The URIEncoding="UTF-8" attribute is essential.
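The Connector element itself was stripped from the mail above; a typical one looks like this (the port and timeout values are the stock Tomcat defaults; only URIEncoding is the essential part):

```xml
<!-- Illustrative HTTP Connector for Tomcat's server.xml. -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```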

-- Jack Krupansky

-Original Message- 
From: jignesh

Sent: Saturday, May 18, 2013 1:53 PM
To: solr-user@lucene.apache.org
Subject: Not able to search Spanish word with ascent in solr

I have installed Solr 3.5.
I would like to search for (Spanish) words like

-> enseñé
-> étnico
-> castaño
-> después

with accents like ñ, é, etc.

But Solr does not find such words in the index.
I have used
-

   
 
   
   
   
   

 
   
- 
like :




But I am still not able to search Spanish words with accents.

Please let me know if I am missing anything.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-search-Spanish-word-with-ascent-in-solr-tp4064404.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: [custom data structure] aligned dynamic fields

2013-05-20 Thread Jack Krupansky
Before you dive off the deep end and "go crazy" with dynamic fields, try a 
clean, simple, Solr-oriented static design. Yes, you CAN do an 
over-complicated design with dynamic fields, but that doesn't mean you 
should.


In a single phrase, denormalize and flatten your design. Sure, that will 
lead to a lot of rows, but Solr and Lucene are designed to do well in that 
scenario.


If you are still thinking in terms of "C struct", go for a long walk or do
SOMETHING else until you can get that idea out of your head. It is a 
sub-optimal approach for exploiting the power of Lucene and Solr.


Stay with a static schema design until you hit... just stay with a static 
schema, period.


Dynamic fields and multi-valued fields do have value, but only when used in 
moderation - small numbers. If you start down a design path and find that 
you are heavily dependent on dynamic fields and/or multi-valued fields with 
large numbers of values per document, that is feedback that your design 
needs to be denormalized and flattened further.
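As a concrete sketch of "denormalize and flatten" for the paired-fields case in the quoted question below: one Solr document per struct instance, rather than aligned dynamic-field pairs on a single parent document (all field names here are illustrative assumptions, not Solr APIs):

```python
# Hypothetical flattening: each (field1, field2) pair becomes its own Solr
# document, so a plain query like field1:a AND field2:b respects the pairing.
def flatten(parent_id, pairs):
    """Turn [(v1, v2), ...] into one denormalized doc dict per pair."""
    return [
        {
            "id": "%s-%d" % (parent_id, i),  # synthetic unique key
            "parent_id": parent_id,          # lets you group back to the parent
            "field1": v1,
            "field2": v2,
        }
        for i, (v1, v2) in enumerate(pairs)
    ]

docs = flatten("doc1", [("a", "b"), ("c", "d")])
```

A query like field1:a AND field2:b then matches only documents where the two values came from the same pair, which is exactly the alignment the dynamic-field scheme was trying to encode.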


-- Jack Krupansky

-Original Message- 
From: Dmitry Kan

Sent: Monday, May 20, 2013 7:06 AM
To: solr-user@lucene.apache.org
Subject: [custom data structure] aligned dynamic fields

Hi all,

Our current project requirement suggests that we should start storing
custom data structures in solr index. The custom data structure would be an
equivalent of C struct.

The task is as follows.

Suppose we have two types of fields, one is FieldName1 and the other
FieldName2.

Suppose also that we can have multiple pairs of these two fields on a
document in Solr.

That is, in notation of dynamic fields:

doc1
FieldName1_id1
FieldName2_id1

FieldName1_id2
FieldName2_id2

doc2
FieldName1_id3
FieldName2_id3

FieldName1_id4
FieldName2_id4

FieldName1_id5
FieldName2_id5

etc

What we would like to have is a value for the Field1_(some_unique_id) and a
value for Field2_(some_unique_id) as input for search. That is we wouldn't
care about the some_unique_id in some search scenarios. And the search
would automatically iterate the pairs of dynamic fields and respect the
pairings.

I know it used to be so, that with dynamic fields a client must provide the
dynamically generated field names coupled with their values up front when
searching.

What data structure / solution could be used as an alternative approach to
help such a "structured search"?

Thanks,

Dmitry 



Re: Not able to search Spanish words with accents in Solr

2013-05-20 Thread Gora Mohanty
On 18 May 2013 23:23, jignesh  wrote:
> I have installed Solr 3.5.
> I would like to search for (Spanish) words like
>
> -> enseñé
> -> étnico
> -> castaño
> -> después
>
> with accents like ñ, é, etc.
>
> But Solr does not find such words in the index.
[...]

Are you able to set up Solr, and search for English input?
What encoding are you using for documents that you are
sending to Solr? Are you properly encoding the query text?
What container are you using to run Solr? With UTF-8 input
for indexing and querying, and the built-in Jetty, searching for
characters with Spanish accents should just work.

Regards,
Gora


Re: cache disable through solrJ

2013-05-20 Thread Koji Sekiguchi

(13/05/20 20:53), J Mohamed Zahoor wrote:

Hi

How do I disable the cache (Solr FieldValueCache) for certain queries?
Using HTTP it can be done with {!cache=false}.

How can I do it from SolrJ?

./zahoor



How about using facet.method=enum?

koji
--
http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
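On the original question: {!cache=false} is a local param embedded in the q string itself, so any client that lets you set q verbatim (including SolrJ via SolrQuery.setQuery) can send it. A minimal sketch of the equivalent raw request, with an illustrative endpoint and parameter set:

```python
from urllib.parse import urlencode

# Sketch: prepend the local param to the query string when caching should
# be disabled; facet.method=enum is added the same way as any other param.
def build_select(q, cache=True, extra=None):
    params = {"q": q if cache else "{!cache=false}" + q}
    if extra:
        params.update(extra)
    return "/solr/core/select?" + urlencode(params)

url = build_select("field:value", cache=False,
                   extra={"facet": "true", "facet.method": "enum"})
```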


Not able to search Spanish words with accents in Solr

2013-05-20 Thread jignesh
I have installed Solr 3.5.
I would like to search for (Spanish) words like

-> enseñé
-> étnico
-> castaño
-> después

with accents like ñ, é, etc.

But Solr does not find such words in the index.
I have used
-
 

   





  

-   
like : 



But I am still not able to search Spanish words with accents.

Please let me know if I am missing anything.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-search-Spanish-word-with-ascent-in-solr-tp4064404.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlight only when all keywords match

2013-05-20 Thread Sandeep Mestry
Thanks Upayavira for that valuable suggestion.

I believe overriding the highlight component should be the way forward.
Could you tell me if there is any existing example or which methods I
should particularly override?

Thanks,
Sandeep


On 20 May 2013 12:47, Upayavira  wrote:

> If you are saying that you want to change highlighting behaviour, not
> query behaviour, then I suspect you are going to have to interact with
> the java HighlightComponent. If you can work out how to update that
> component to behave as you wish, you could either subclass it, or create
> your own implementation that you can include in your Solr setup. Or, if
> you make it generic enough, offer it back as a contribution that can be
> included in future Solr releases.
>
> Upayavira
>
> On Mon, May 20, 2013, at 12:14 PM, Sandeep Mestry wrote:
> > I doubt if that will be the correct approach as it will be hard to
> > generate
> > the query grammar considering we have support for phrase, operator,
> > wildcard and group queries.
> > That's why I have kept it simple and only passing the query text with
> > minimal parsing (escaping lucene special characters) to configured
> > edismax.
> > The number of fields I have mentioned above are a lot lesser than the
> > actual number of fields - around 50 in number :-). So forming such a long
> > query will both be time and resource consuming. Further, it's not going
> > to
> > fulfill my requirement anyway because I do not want to change my search
> > results, the requirement is only to provide a highlight if a field is
> > matched for all the query terms.
> >
> > Thanks,
> > Sandeep
> >
> >
> > On 20 May 2013 12:02, Jaideep Dhok  wrote:
> >
> > > If you know all fields that need to be queried, you can rewrite it as -
> > > (assuming, f1, f2 are the fields that you have to search)
> > > (f1:kw1 AND f1:kw2 ... f1:kwn) OR (f2:kw1 AND f2:kw2 ... f2:kwn)
> > >
> > > -
> > > Jaideep
> > >
> > >
> > > On Mon, May 20, 2013 at 4:22 PM, Sandeep Mestry 
> > > wrote:
> > >
> > > > Hi Jaideep,
> > > >
> > > > The edismax config I posted mentions that the default operator is
> > > > AND. I am sorry if I was not clear in my previous mail; what I really
> > > > need is to highlight a field only when all the search query terms are
> > > > present. The current highlighter fires when *any* of the terms match,
> > > > not only when *all* terms match.
> > > >
> > > > Thanks,
> > > > Sandeep
> > > >
> > > >
> > > > On 20 May 2013 11:40, Jaideep Dhok  wrote:
> > > >
> > > > > Sandeep,
> > > > > If you AND all keywords, that should be OK?
> > > > >
> > > > > Thanks
> > > > > Jaideep
> > > > >
> > > > >
> > > > > On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry <
> sanmes...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Dear All,
> > > > > >
> > > > > > I have a requirement to highlight a field only when all keywords
> > > > entered
> > > > > > match. This also needs to support phrase, operator or wildcard
> > > queries.
> > > > > > I'm using Solr 4.0 with edismax because the search needs to be
> > > carried
> > > > > out
> > > > > > on multiple fields.
> > > > > > I know with highlighting feature I can configure a field to
> indicate
> > > a
> > > > > > match, however I do not find a setting to highlight only if all
> > > > keywords
> > > > > > match. That makes me think is that the right approach to take?
> Can
> > > you
> > > > > > please guide me in right direction?
> > > > > >
> > > > > > The edismax config looks like below:
> > > > > >
> > > > > > 
> > > > > > 
> > > > > > edismax
> > > > > > explicit
> > > > > > 0.01
> > > > > > title^10 description^5 annotations^3 notes^2
> > > > > > categories
> > > > > > title
> > > > > > 0
> > > > > > *:*
> > > > > > *,score
> > > > > > 100%
> > > > > > AND
> > > > > > score desc
> > > > > > true
> > > > > > -1
> > > > > > 1
> > > > > > uniq_subtype_id
> > > > > > component_type
> > > > > > genre_type
> > > > > > 
> > > > > > 
> > > > > > collection:assets
> > > > > > 
> > > > > > 
> > > > > >
> > > > > > If I search for 'countryside number 10' as the keyword then
> highlight
> > > > > only
> > > > > > if the 'annotations' contain all these entered search terms. Any
> > > > document
> > > > > > containing just one or two terms is not a match.
> > > > > >
> > > > > > Thanks,
> > > > > > Sandeep
> > > > > > (p.s: I haven't enabled the highlighting feature yet on this
> config
> > > and
> > > > > > will be doing so only if that will fulfil the requirement I have
> > > > > mentioned
> > > > > > above.)
> > > > > >
> > > > >
> > > > > --
> > > > > _
> > > > > The information contained in this communication is intended solely
> for
> > > > the
> > > > > use of the individual or entity to whom it is addressed and others
> > > > > authorized to receive it. It may contain confidential or legally
> > > > privileged
> > > > > information. If you are not the intended recipient you are hereby
> > > > notified
> > > > > that any di

cache disable through solrJ

2013-05-20 Thread J Mohamed Zahoor
Hi

How do I disable the cache (Solr FieldValueCache) for certain queries?
Using HTTP it can be done with {!cache=false}.

How can I do it from SolrJ?

./zahoor


Re: Highlight only when all keywords match

2013-05-20 Thread Upayavira
If you are saying that you want to change highlighting behaviour, not
query behaviour, then I suspect you are going to have to interact with
the java HighlightComponent. If you can work out how to update that
component to behave as you wish, you could either subclass it, or create
your own implementation that you can include in your Solr setup. Or, if
you make it generic enough, offer it back as a contribution that can be
included in future Solr releases.

Upayavira
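The subclassing route described above would be wired up in solrconfig.xml roughly like this. The class name com.example.AllTermsHighlightComponent is hypothetical and stands in for your own subclass of org.apache.solr.handler.component.HighlightComponent; registering a searchComponent under the name "highlight" replaces the default one:

```xml
<!-- Hypothetical wiring for a custom highlight component; the class name
     is an assumption, not an existing implementation. -->
<searchComponent name="highlight"
                 class="com.example.AllTermsHighlightComponent"/>
```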

On Mon, May 20, 2013, at 12:14 PM, Sandeep Mestry wrote:
> I doubt if that will be the correct approach as it will be hard to
> generate
> the query grammar considering we have support for phrase, operator,
> wildcard and group queries.
> That's why I have kept it simple and only passing the query text with
> minimal parsing (escaping lucene special characters) to configured
> edismax.
> The number of fields I have mentioned above are a lot lesser than the
> actual number of fields - around 50 in number :-). So forming such a long
> query will both be time and resource consuming. Further, it's not going
> to
> fulfill my requirement anyway because I do not want to change my search
> results, the requirement is only to provide a highlight if a field is
> matched for all the query terms.
> 
> Thanks,
> Sandeep
> 
> 
> On 20 May 2013 12:02, Jaideep Dhok  wrote:
> 
> > If you know all fields that need to be queried, you can rewrite it as -
> > (assuming, f1, f2 are the fields that you have to search)
> > (f1:kw1 AND f1:kw2 ... f1:kwn) OR (f2:kw1 AND f2:kw2 ... f2:kwn)
> >
> > -
> > Jaideep
> >
> >
> > On Mon, May 20, 2013 at 4:22 PM, Sandeep Mestry 
> > wrote:
> >
> > > Hi Jaideep,
> > >
> > > The edismax config I posted mentions that the default operator is
> > > AND. I am sorry if I was not clear in my previous mail; what I really
> > > need is to highlight a field only when all the search query terms are
> > > present. The current highlighter fires when *any* of the terms match,
> > > not only when *all* terms match.
> > >
> > > Thanks,
> > > Sandeep
> > >
> > >
> > > On 20 May 2013 11:40, Jaideep Dhok  wrote:
> > >
> > > > Sandeep,
> > > > If you AND all keywords, that should be OK?
> > > >
> > > > Thanks
> > > > Jaideep
> > > >
> > > >
> > > > On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry 
> > > > wrote:
> > > >
> > > > > Dear All,
> > > > >
> > > > > I have a requirement to highlight a field only when all keywords
> > > entered
> > > > > match. This also needs to support phrase, operator or wildcard
> > queries.
> > > > > I'm using Solr 4.0 with edismax because the search needs to be
> > carried
> > > > out
> > > > > on multiple fields.
> > > > > I know with highlighting feature I can configure a field to indicate
> > a
> > > > > match, however I do not find a setting to highlight only if all
> > > keywords
> > > > > match. That makes me think is that the right approach to take? Can
> > you
> > > > > please guide me in right direction?
> > > > >
> > > > > The edismax config looks like below:
> > > > >
> > > > > 
> > > > > 
> > > > > edismax
> > > > > explicit
> > > > > 0.01
> > > > > title^10 description^5 annotations^3 notes^2
> > > > > categories
> > > > > title
> > > > > 0
> > > > > *:*
> > > > > *,score
> > > > > 100%
> > > > > AND
> > > > > score desc
> > > > > true
> > > > > -1
> > > > > 1
> > > > > uniq_subtype_id
> > > > > component_type
> > > > > genre_type
> > > > > 
> > > > > 
> > > > > collection:assets
> > > > > 
> > > > > 
> > > > >
> > > > > If I search for 'countryside number 10' as the keyword then highlight
> > > > only
> > > > > if the 'annotations' contain all these entered search terms. Any
> > > document
> > > > > containing just one or two terms is not a match.
> > > > >
> > > > > Thanks,
> > > > > Sandeep
> > > > > (p.s: I haven't enabled the highlighting feature yet on this config
> > and
> > > > > will be doing so only if that will fulfil the requirement I have
> > > > mentioned
> > > > > above.)
> > > > >
> > > >
> > > > --
> > > > _
> > > > The information contained in this communication is intended solely for
> > > the
> > > > use of the individual or entity to whom it is addressed and others
> > > > authorized to receive it. It may contain confidential or legally
> > > privileged
> > > > information. If you are not the intended recipient you are hereby
> > > notified
> > > > that any disclosure, copying, distribution or taking any action in
> > > reliance
> > > > on the contents of this information is strictly prohibited and may be
> > > > unlawful. If you have received this communication in error, please
> > notify
> > > > us immediately by responding to this email and then delete it from your
> > > > system. The firm is neither liable for the proper and complete
> > > transmission
> > > > of the information contained in this communication nor for any delay in
> > > its
> > > > receipt.
> > > >
> > >
> >
> > --
> > _

After Delta Indexing, Updated indexes not getting reflected in UI

2013-05-20 Thread mechravi25
Hi ,


I'm using Solr 3.6.1 and I'm trying to implement delta indexing. I'm using
the following configuration in my dataimport handler file:


 





   
   

When I hit the following URL to perform delta indexing, 


http://localhost:8988/solr/core/dataimport?command=delta-import&clean=false&commit=true

I get the following response


- 
- 
  0 
  2 
  
- 
- 
  dataimport.xml 
  
  
  delta-import 
  idle 
   
- 
  2 
  1 
  0 
  2013-05-20 03:45:43 
  2013-05-20 03:45:43 
  2013-05-20 03:45:44 
  2013-05-20 03:45:44 
  1 
  0 
  0:0:0.658 
  
  This response format is experimental. It is likely to
change in the future. 
  
  
 
Here, it shows that there is one changed document, but when I refresh the
Solr UI to check the changes, I see that no changes have been made. However,
I can see that the indexed file size has increased.

Also, I noticed one more thing: the Committed tag itself is not coming back
even when commit is passed in the query string.
Does delta indexing require any other configuration changes? If so, can you
guide me on the same? Or can you tell me if I'm missing anything?
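For reference, since the dataimport config was stripped from the mail above, a typical delta-capable DIH entity looks roughly like this (table and column names are illustrative, not recovered from the original config):

```xml
<!-- Illustrative delta-capable DataImportHandler entity:
     deltaQuery finds the primary keys changed since the last run, and
     deltaImportQuery re-fetches each changed row via ${dih.delta.id}. -->
<entity name="item" pk="id"
        query="SELECT id, name FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, name FROM item
                          WHERE id = '${dih.delta.id}'"/>
```

With such an entity, command=delta-import&amp;clean=false&amp;commit=true re-indexes only the changed rows.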


Thanks in Advance



--
View this message in context: 
http://lucene.472066.n3.nabble.com/After-Delta-Indexing-Updated-indexes-not-getting-reflected-in-UI-tp4064622.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlight only when all keywords match

2013-05-20 Thread Sandeep Mestry
I doubt that will be the correct approach, as it will be hard to generate
the query grammar considering we support phrase, operator, wildcard and
group queries.
That's why I have kept it simple, only passing the query text with minimal
parsing (escaping Lucene special characters) to the configured edismax.
The fields I have mentioned above are far fewer than the actual number of
fields - around 50 in total :-). So forming such a long query would be both
time and resource consuming. Further, it will not fulfill my requirement
anyway, because I do not want to change my search results; the requirement
is only to provide a highlight if a field matches all the query terms.

Thanks,
Sandeep


On 20 May 2013 12:02, Jaideep Dhok  wrote:

> If you know all fields that need to be queried, you can rewrite it as -
> (assuming, f1, f2 are the fields that you have to search)
> (f1:kw1 AND f1:kw2 ... f1:kwn) OR (f2:kw1 AND f2:kw2 ... f2:kwn)
>
> -
> Jaideep
>
>
> On Mon, May 20, 2013 at 4:22 PM, Sandeep Mestry 
> wrote:
>
> > Hi Jaideep,
> >
> > The edismax config I posted mentions that the default operator is
> > AND. I am sorry if I was not clear in my previous mail; what I really
> > need is to highlight a field only when all the search query terms are
> > present. The current highlighter fires when *any* of the terms match,
> > not only when *all* terms match.
> >
> > Thanks,
> > Sandeep
> >
> >
> > On 20 May 2013 11:40, Jaideep Dhok  wrote:
> >
> > > Sandeep,
> > > If you AND all keywords, that should be OK?
> > >
> > > Thanks
> > > Jaideep
> > >
> > >
> > > On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry 
> > > wrote:
> > >
> > > > Dear All,
> > > >
> > > > I have a requirement to highlight a field only when all keywords
> > entered
> > > > match. This also needs to support phrase, operator or wildcard
> queries.
> > > > I'm using Solr 4.0 with edismax because the search needs to be
> carried
> > > out
> > > > on multiple fields.
> > > > I know with highlighting feature I can configure a field to indicate
> a
> > > > match, however I do not find a setting to highlight only if all
> > keywords
> > > > match. That makes me think is that the right approach to take? Can
> you
> > > > please guide me in right direction?
> > > >
> > > > The edismax config looks like below:
> > > >
> > > > 
> > > > 
> > > > edismax
> > > > explicit
> > > > 0.01
> > > > title^10 description^5 annotations^3 notes^2
> > > > categories
> > > > title
> > > > 0
> > > > *:*
> > > > *,score
> > > > 100%
> > > > AND
> > > > score desc
> > > > true
> > > > -1
> > > > 1
> > > > uniq_subtype_id
> > > > component_type
> > > > genre_type
> > > > 
> > > > 
> > > > collection:assets
> > > > 
> > > > 
> > > >
> > > > If I search for 'countryside number 10' as the keyword then highlight
> > > only
> > > > if the 'annotations' contain all these entered search terms. Any
> > document
> > > > containing just one or two terms is not a match.
> > > >
> > > > Thanks,
> > > > Sandeep
> > > > (p.s: I haven't enabled the highlighting feature yet on this config
> and
> > > > will be doing so only if that will fulfil the requirement I have
> > > mentioned
> > > > above.)
> > > >
> > >
> > > --
> > > _
> > > The information contained in this communication is intended solely for
> > the
> > > use of the individual or entity to whom it is addressed and others
> > > authorized to receive it. It may contain confidential or legally
> > privileged
> > > information. If you are not the intended recipient you are hereby
> > notified
> > > that any disclosure, copying, distribution or taking any action in
> > reliance
> > > on the contents of this information is strictly prohibited and may be
> > > unlawful. If you have received this communication in error, please
> notify
> > > us immediately by responding to this email and then delete it from your
> > > system. The firm is neither liable for the proper and complete
> > transmission
> > > of the information contained in this communication nor for any delay in
> > its
> > > receipt.
> > >
> >
>
> --
> _
> The information contained in this communication is intended solely for the
> use of the individual or entity to whom it is addressed and others
> authorized to receive it. It may contain confidential or legally privileged
> information. If you are not the intended recipient you are hereby notified
> that any disclosure, copying, distribution or taking any action in reliance
> on the contents of this information is strictly prohibited and may be
> unlawful. If you have received this communication in error, please notify
> us immediately by responding to this email and then delete it from your
> system. The firm is neither liable for the proper and complete transmission
> of the information contained in this communication nor for any delay in its
> receipt.
>


[custom data structure] aligned dynamic fields

2013-05-20 Thread Dmitry Kan
Hi all,

Our current project requirement suggests that we should start storing
custom data structures in solr index. The custom data structure would be an
equivalent of C struct.

The task is as follows.

Suppose we have two types of fields, one is FieldName1 and the other
FieldName2.

Suppose also that we can have multiple pairs of these two fields on a
document in Solr.

That is, in notation of dynamic fields:

doc1
FieldName1_id1
FieldName2_id1

FieldName1_id2
FieldName2_id2

doc2
FieldName1_id3
FieldName2_id3

FieldName1_id4
FieldName2_id4

FieldName1_id5
FieldName2_id5

etc

What we would like to have is a value for the Field1_(some_unique_id) and a
value for Field2_(some_unique_id) as input for search. That is we wouldn't
care about the some_unique_id in some search scenarios. And the search
would automatically iterate the pairs of dynamic fields and respect the
pairings.

I know it used to be so, that with dynamic fields a client must provide the
dynamically generated field names coupled with their values up front when
searching.

What data structure / solution could be used as an alternative approach to
help such a "structured search"?

Thanks,

Dmitry


Re: Highlight only when all keywords match

2013-05-20 Thread Jaideep Dhok
If you know all fields that need to be queried, you can rewrite it as -
(assuming, f1, f2 are the fields that you have to search)
(f1:kw1 AND f1:kw2 ... f1:kwn) OR (f2:kw1 AND f2:kw2 ... f2:kwn)

-
Jaideep
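The rewrite above can be generated mechanically for any list of fields and keywords; a sketch, with illustrative field and keyword names (note that real keywords would still need escaping or phrase quoting, which was the objection raised earlier in the thread):

```python
# Require every keyword to match within at least one single field:
# (f1:kw1 AND ... AND f1:kwn) OR (f2:kw1 AND ... AND f2:kwn) OR ...
def all_terms_per_field(fields, keywords):
    per_field = ["(" + " AND ".join("%s:%s" % (f, kw) for kw in keywords) + ")"
                 for f in fields]
    return " OR ".join(per_field)

q = all_terms_per_field(["f1", "f2"], ["kw1", "kw2"])
# -> (f1:kw1 AND f1:kw2) OR (f2:kw1 AND f2:kw2)
```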


On Mon, May 20, 2013 at 4:22 PM, Sandeep Mestry  wrote:

> Hi Jaideep,
>
> The edismax config I posted mentions that the default operator is
> AND. I am sorry if I was not clear in my previous mail; what I really
> need is to highlight a field only when all the search query terms are
> present. The current highlighter fires when *any* of the terms match,
> not only when *all* terms match.
>
> Thanks,
> Sandeep
>
>
> On 20 May 2013 11:40, Jaideep Dhok  wrote:
>
> > Sandeep,
> > If you AND all keywords, that should be OK?
> >
> > Thanks
> > Jaideep
> >
> >
> > On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry 
> > wrote:
> >
> > > Dear All,
> > >
> > > I have a requirement to highlight a field only when all keywords
> entered
> > > match. This also needs to support phrase, operator or wildcard queries.
> > > I'm using Solr 4.0 with edismax because the search needs to be carried
> > out
> > > on multiple fields.
> > > I know with highlighting feature I can configure a field to indicate a
> > > match, however I do not find a setting to highlight only if all
> keywords
> > > match. That makes me think is that the right approach to take? Can you
> > > please guide me in right direction?
> > >
> > > The edismax config looks like below:
> > >
> > > 
> > > 
> > > edismax
> > > explicit
> > > 0.01
> > > title^10 description^5 annotations^3 notes^2
> > > categories
> > > title
> > > 0
> > > *:*
> > > *,score
> > > 100%
> > > AND
> > > score desc
> > > true
> > > -1
> > > 1
> > > uniq_subtype_id
> > > component_type
> > > genre_type
> > > 
> > > 
> > > collection:assets
> > > 
> > > 
> > >
> > > If I search for 'countryside number 10' as the keyword then highlight
> > only
> > > if the 'annotations' contain all these entered search terms. Any
> document
> > > containing just one or two terms is not a match.
> > >
> > > Thanks,
> > > Sandeep
> > > (p.s: I haven't enabled the highlighting feature yet on this config and
> > > will be doing so only if that will fulfil the requirement I have
> > mentioned
> > > above.)
> > >
> >
> > --
> > _
> > The information contained in this communication is intended solely for
> the
> > use of the individual or entity to whom it is addressed and others
> > authorized to receive it. It may contain confidential or legally
> privileged
> > information. If you are not the intended recipient you are hereby
> notified
> > that any disclosure, copying, distribution or taking any action in
> reliance
> > on the contents of this information is strictly prohibited and may be
> > unlawful. If you have received this communication in error, please notify
> > us immediately by responding to this email and then delete it from your
> > system. The firm is neither liable for the proper and complete
> transmission
> > of the information contained in this communication nor for any delay in
> its
> > receipt.
> >
>



Re: Highlight only when all keywords match

2013-05-20 Thread Sandeep Mestry
Hi Jaideep,

The edismax config I have posted mentions that the default operator is
AND. I am sorry if I was not clear in my previous mail; what I really need
is to highlight a field only when all search query terms are present. The
current highlighter works when *any* of the terms match, not only when
*all* terms match.

Thanks,
Sandeep


On 20 May 2013 11:40, Jaideep Dhok  wrote:

> Sandeep,
> If you AND all keywords, that should be OK?
>
> Thanks
> Jaideep
>
>
> On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry 
> wrote:
>
> > Dear All,
> >
> > I have a requirement to highlight a field only when all keywords entered
> > match. This also needs to support phrase, operator or wildcard queries.
> > I'm using Solr 4.0 with edismax because the search needs to be carried
> out
> > on multiple fields.
> > I know with highlighting feature I can configure a field to indicate a
> > match, however I do not find a setting to highlight only if all keywords
> > match. That makes me wonder whether this is the right approach to take.
> > Can you please guide me in the right direction?
> >
> > The edismax config is shown below:
> >
> > 
> > 
> > edismax
> > explicit
> > 0.01
> > title^10 description^5 annotations^3 notes^2
> > categories
> > title
> > 0
> > *:*
> > *,score
> > 100%
> > AND
> > score desc
> > true
> > -1
> > 1
> > uniq_subtype_id
> > component_type
> > genre_type
> > 
> > 
> > collection:assets
> > 
> > 
> >
> > If I search for 'countryside number 10' as the keyword then highlight
> only
> > if the 'annotations' contain all these entered search terms. Any document
> > containing just one or two terms is not a match.
> >
> > Thanks,
> > Sandeep
> > (p.s: I haven't enabled the highlighting feature yet on this config and
> > will be doing so only if that will fulfil the requirement I have
> mentioned
> > above.)
> >
>
>


Re: Solr cloud setup

2013-05-20 Thread Furkan KAMACI
You can start reading from here: http://wiki.apache.org/solr/SolrCloud and
here: http://docs.lucidworks.com/display/solr/SolrCloud

Furkan KAMACI
--

2013/5/20 Sagar Chaturvedi 

>
> Hi,
>
> I am new to Solr. I have a question regarding SolrCloud: what is the
> difference between Solr and SolrCloud?
>
> Also, please let me know if the complete procedure to set up SolrCloud is
> documented somewhere.
>
> Regards,
> Sagar
>
>
>
> DISCLAIMER:
>
>
> ---
>
> The contents of this e-mail and any attachment(s) are confidential and
> intended
>
> for the named recipient(s) only.
>
> It shall not attach any liability on the originator or NECHCL or its
>
> affiliates. Any views or opinions presented in
>
> this email are solely those of the author and may not necessarily reflect
> the
>
> opinions of NECHCL or its affiliates.
>
> Any form of reproduction, dissemination, copying, disclosure, modification,
>
> distribution and / or publication of
>
> this message without the prior written consent of the author of this
> e-mail is
>
> strictly prohibited. If you have
>
> received this email in error please delete it and notify the sender
>
> immediately. .
>
>
> ---
>


Re: Solr cloud setup

2013-05-20 Thread Gora Mohanty
On 20 May 2013 16:16, Sagar Chaturvedi
 wrote:
>
> Hi,
>
> I am new to Solr. I have a question regarding solrCloud - What is the 
> difference between solr and solrcloud?
>
> Also, please let me know if the complete procedure to set up SolrCloud is
> documented somewhere.
[...]

Please do some basic legwork yourself: Just typing "solrcloud" into
Google search would get you started. See, for example,
http://wiki.apache.org/solr/SolrCloud

Regards,
Gora


Solr cloud setup

2013-05-20 Thread Sagar Chaturvedi

Hi,

I am new to Solr. I have a question regarding SolrCloud: what is the
difference between Solr and SolrCloud?

Also, please let me know if the complete procedure to set up SolrCloud is
documented somewhere.

Regards,
Sagar





Re: Highlight only when all keywords match

2013-05-20 Thread Jaideep Dhok
Sandeep,
If you AND all keywords, that should be OK?

Thanks
Jaideep


On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry  wrote:

> Dear All,
>
> I have a requirement to highlight a field only when all keywords entered
> match. This also needs to support phrase, operator or wildcard queries.
> I'm using Solr 4.0 with edismax because the search needs to be carried out
> on multiple fields.
> I know with highlighting feature I can configure a field to indicate a
> match, however I do not find a setting to highlight only if all keywords
> match. That makes me wonder whether this is the right approach to take.
> Can you please guide me in the right direction?
>
> The edismax config is shown below:
>
> 
> 
> edismax
> explicit
> 0.01
> title^10 description^5 annotations^3 notes^2
> categories
> title
> 0
> *:*
> *,score
> 100%
> AND
> score desc
> true
> -1
> 1
> uniq_subtype_id
> component_type
> genre_type
> 
> 
> collection:assets
> 
> 
>
> If I search for 'countryside number 10' as the keyword then highlight only
> if the 'annotations' contain all these entered search terms. Any document
> containing just one or two terms is not a match.
>
> Thanks,
> Sandeep
> (p.s: I haven't enabled the highlighting feature yet on this config and
> will be doing so only if that will fulfil the requirement I have mentioned
> above.)
>



Highlight only when all keywords match

2013-05-20 Thread Sandeep Mestry
Dear All,

I have a requirement to highlight a field only when all keywords entered
match. This also needs to support phrase, operator or wildcard queries.
I'm using Solr 4.0 with edismax because the search needs to be carried out
on multiple fields.
I know that with the highlighting feature I can configure a field to indicate
a match; however, I cannot find a setting to highlight only if all keywords
match. That makes me wonder whether this is the right approach to take. Can
you please guide me in the right direction?

The edismax config is shown below:



edismax
explicit
0.01
title^10 description^5 annotations^3 notes^2 categories
title
0
*:*
*,score
100%
AND
score desc
true
-1
1
uniq_subtype_id
component_type
genre_type


collection:assets



If I search for 'countryside number 10' as the keyword, then the 'annotations'
field should be highlighted only if it contains all the entered search terms.
Any document containing just one or two of the terms is not a match.

Thanks,
Sandeep
(p.s: I haven't enabled the highlighting feature yet on this config and
will be doing so only if that will fulfil the requirement I have mentioned
above.)


Re: How To Make Index Backup at SolrCloud?

2013-05-20 Thread Upayavira
From looking at the ReplicationHandler code, it looks like if you hit it
with a 'details' request, it'll show you the details of the most recent
backup, including file count, status and completion time.
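
The ReplicationHandler is driven by plain HTTP commands, so the 'backup' and
'details' requests are just URLs against a core. A small sketch of building
them (host, port, core name, and the numberToKeep value are assumptions, not
taken from this thread):

```python
import urllib.parse

def replication_url(core_url, command, **params):
    """Build a ReplicationHandler request URL, e.g. command=backup
    to trigger a backup or command=details to check its status."""
    query = urllib.parse.urlencode(dict(command=command, wt="json", **params))
    return core_url.rstrip("/") + "/replication?" + query

# Trigger a backup (keeping the two most recent snapshots), then poll the
# 'details' response until its backup section reports a completion time.
backup_url = replication_url("http://localhost:8983/solr/core1", "backup",
                             numberToKeep=2)
details_url = replication_url("http://localhost:8983/solr/core1", "details")
```

Fetching details_url after the backup finishes should return the file count,
status, and completion timestamp that Upayavira describes above.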

Upayavira

On Mon, May 20, 2013, at 08:46 AM, Furkan KAMACI wrote:
> Oops, you didn't say that; it is in Timothy's answer.
> 
> 2013/5/20 Otis Gospodnetic 
> 
> > Hm, did I really say that?  What was the context?  Because I don't see
> > that in my response below
> >
> > Otis
> > --
> > Search Analytics - http://sematext.com/search-analytics/index.html
> > SOLR Performance Monitoring - http://sematext.com/spm/index.html
> >
> >
> >
> >
> >
> > On Sun, May 19, 2013 at 10:16 AM, Furkan KAMACI 
> > wrote:
> > > Hi Otis;
> > >
> > > You said:
> > >
> > > "which will return a completed on date when your backup is done"
> > >
> > > which field is that?
> > >
> > > 2013/4/26 Otis Gospodnetic 
> > >
> > >> You can use the index backup command that's part of index replication,
> > >> check the Wiki.
> > >>
> > >> Otis
> > >> Solr & ElasticSearch Support
> > >> http://sematext.com/
> > >> On Apr 25, 2013 5:23 PM, "Furkan KAMACI" 
> > wrote:
> > >>
> > >> > I use SolrCloud. Let's assume that I want to move all indexes from one
> > >> > place to another. There may be two reasons for that:
> > >> >
> > >> > First one is that: I will close all my system and I will use new
> > machines
> > >> > with previous indexes (if it is a must they may have same network
> > >> topology)
> > >> > at anywhere else after some time later.
> > >> > Second one is that: I know that SolrCloud handles failures but I will
> > >> back
> > >> > up my indexes for a disaster event.
> > >> >
> > >> > How can I back up my indexes? I know that I can start up new nodes
> > and I
> > >> > can close the old ones so I can move my indexes to other machines.
> > >> However
> > >> > how can I do such kind of backup (should I just copy data folder of
> > Solr
> > >> > nodes and put them to new Solr nodes after I change Zookeeper
> > >> > configuration)?
> > >> >
> > >> > What folks do?
> > >> >
> > >>
> >


Re: How To Make Index Backup at SolrCloud?

2013-05-20 Thread Furkan KAMACI
OK, I found it, no problem.

2013/5/20 Furkan KAMACI 

> Oops, you didn't say that; it is in Timothy's answer.
>
>
> 2013/5/20 Otis Gospodnetic 
>
>> Hm, did I really say that?  What was the context?  Because I don't see
>> that in my response below
>>
>> Otis
>> --
>> Search Analytics - http://sematext.com/search-analytics/index.html
>> SOLR Performance Monitoring - http://sematext.com/spm/index.html
>>
>>
>>
>>
>>
>> On Sun, May 19, 2013 at 10:16 AM, Furkan KAMACI 
>> wrote:
>> > Hi Otis;
>> >
>> > You said:
>> >
>> > "which will return a completed on date when your backup is done"
>> >
>> > which field is that?
>> >
>> > 2013/4/26 Otis Gospodnetic 
>> >
>> >> You can use the index backup command that's part of index replication,
>> >> check the Wiki.
>> >>
>> >> Otis
>> >> Solr & ElasticSearch Support
>> >> http://sematext.com/
>> >> On Apr 25, 2013 5:23 PM, "Furkan KAMACI" 
>> wrote:
>> >>
>> >> > I use SolrCloud. Let's assume that I want to move all indexes from
>> one
> >> >> > place to another. There may be two reasons for that:
>> >> >
>> >> > First one is that: I will close all my system and I will use new
>> machines
>> >> > with previous indexes (if it is a must they may have same network
>> >> topology)
>> >> > at anywhere else after some time later.
>> >> > Second one is that: I know that SolrCloud handles failures but I will
>> >> back
>> >> > up my indexes for a disaster event.
>> >> >
>> >> > How can I back up my indexes? I know that I can start up new nodes
>> and I
>> >> > can close the old ones so I can move my indexes to other machines.
>> >> However
>> >> > how can I do such kind of backup (should I just copy data folder of
>> Solr
>> >> > nodes and put them to new Solr nodes after I change Zookeeper
>> >> > configuration)?
>> >> >
>> >> > What folks do?
>> >> >
>> >>
>>
>
>


Re: How To Make Index Backup at SolrCloud?

2013-05-20 Thread Furkan KAMACI
Oops, you didn't say that; it is in Timothy's answer.

2013/5/20 Otis Gospodnetic 

> Hm, did I really say that?  What was the context?  Because I don't see
> that in my response below
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> SOLR Performance Monitoring - http://sematext.com/spm/index.html
>
>
>
>
>
> On Sun, May 19, 2013 at 10:16 AM, Furkan KAMACI 
> wrote:
> > Hi Otis;
> >
> > You said:
> >
> > "which will return a completed on date when your backup is done"
> >
> > which field is that?
> >
> > 2013/4/26 Otis Gospodnetic 
> >
> >> You can use the index backup command that's part of index replication,
> >> check the Wiki.
> >>
> >> Otis
> >> Solr & ElasticSearch Support
> >> http://sematext.com/
> >> On Apr 25, 2013 5:23 PM, "Furkan KAMACI" 
> wrote:
> >>
> >> > I use SolrCloud. Let's assume that I want to move all indexes from one
> >> > place to another. There may be two reasons for that:
> >> >
> >> > First one is that: I will close all my system and I will use new
> machines
> >> > with previous indexes (if it is a must they may have same network
> >> topology)
> >> > at anywhere else after some time later.
> >> > Second one is that: I know that SolrCloud handles failures but I will
> >> back
> >> > up my indexes for a disaster event.
> >> >
> >> > How can I back up my indexes? I know that I can start up new nodes
> and I
> >> > can close the old ones so I can move my indexes to other machines.
> >> However
> >> > how can I do such kind of backup (should I just copy data folder of
> Solr
> >> > nodes and put them to new Solr nodes after I change Zookeeper
> >> > configuration)?
> >> >
> >> > What folks do?
> >> >
> >>
>


Re: Adding filed in Schema.xml

2013-05-20 Thread Raymond Wiker
On May 20, 2013, at 05:05 , Kamal Palei  wrote:
> I have put in the code to add these fields to the document object and index
> it. I have not deleted the whole indexed data and reindexed it, but I expect
> that for whatever new documents are added, these two fields, salary and
> experience, should be indexed. Eventually I will have to delete the whole
> index and re-index it, but I will do that after all these things work.
> 
> Now the question is, what do I need to do so that these fields are shown as
> indexed fields?

Are you using the "fl" query parameter? If so, you may need to add the new 
fields to the list of fields that you want Solr to return...
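
As an illustration of that suggestion, the 'fl' parameter lists the stored
fields Solr should return for each hit. A small sketch of building such a
request (the core name and field names are assumptions based on the thread):

```python
import urllib.parse

def select_url(core_url, q, fl):
    """Build a /select request asking Solr to return only the fields in fl."""
    params = urllib.parse.urlencode({"q": q, "fl": ",".join(fl), "wt": "json"})
    return core_url.rstrip("/") + "/select?" + params

# Include the newly added fields explicitly so they show up in the response.
print(select_url("http://localhost:8983/solr/resumes", "*:*",
                 ["id", "salary", "experience"]))
```

If the new fields still do not appear, remember that only documents indexed
after the schema change will carry them until a full re-index is done.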