Re: Concat 2 fields in another field

2013-08-27 Thread Alok Bhandari
Hi all,

thanks for your replies. I have managed to do this by writing a custom
update processor and configuring it as below:


(the processor configuration was stripped by the list archive; the surviving
values show source fields firstName and lastName, destination field fullName,
and delimiter "_").
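A minimal sketch of an equivalent chain built from stock Solr 4.x processors
instead of a custom one (untested; field names are taken from the message,
the chain name is invented for illustration):

  <updateRequestProcessorChain name="concat-fullname">
    <!-- copy both source fields into fullName -->
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <arr name="source">
        <str>firstName</str>
        <str>lastName</str>
      </arr>
      <str name="dest">fullName</str>
    </processor>
    <!-- join the copied values with "_" -->
    <processor class="solr.ConcatFieldUpdateProcessorFactory">
      <str name="fieldName">fullName</str>
      <str name="delimiter">_</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

Note that, as described below, a chain like this re-runs on every update, so
a custom processor that only fills an empty fullName is one way to avoid
appending the value repeatedly.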

Federico Chiacchiaretta, I have tried the option you mentioned, but on
frequent updates of the document it keeps appending the value multiple times,
which I don't want. In my custom component I check for an existing value, and
only if it is empty do I set it to fN_lN.

Thanks a lot for the quick replies.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-tp4086786p4086934.html
Sent from the Solr - User mailing list archive at Nabble.com.


Adding weight to location of the string found

2013-08-27 Thread zseml
In Solr syntax, is there a way to add weight to a result based on the
location at which the matched string is found?

For instance, if I'm searching these strings for "Hello":

"Hello World"
"World Hello"

...I'd like the first string to rank first in my search results.

Additionally, is there a way to add weight based on the number of
occurrences of a string that are found?  For instance, if I'm searching
these strings for "Hello":

"Hello World Hello"
"Hello World"

...again, I'd like the first string to rank first.
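For illustration (not from the thread): the second case largely works out of
the box, since Lucene's default scoring already rewards term frequency, so
"Hello World Hello" outscores "Hello World" for q=Hello, all else being
equal. There is no built-in boost for term position, but one hedged sketch is
to copy the first token into a separate field via LimitTokenCountFilterFactory
and boost it; the type and field names here are invented:

  <fieldType name="first_token" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- keep only the first token of the field -->
      <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

and then query with something like q=Hello&defType=edismax&qf=title titleFirst^5,
where titleFirst is a copyField of title using the type above.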



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-weight-to-location-of-the-string-found-tp4086932.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Filter cache pollution during sharded edismax queries

2013-08-27 Thread Otis Gospodnetic
Hi Ken,

JIRA is kind of stuffed.  I'd imagine showing more proof on the ML may
be more effective.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Aug 27, 2013 at 4:32 AM, Ken Krugler wrote:
> Hi Otis,
>
> Sorry I missed your reply, and thanks for trying to find a similar report.
>
> Wondering if I should file a Jira issue? That might get more attention :)
>
> -- Ken
>
> On Jul 5, 2013, at 1:05pm, Otis Gospodnetic wrote:
>
>> Hi Ken,
>>
>> Uh, I left this email until now hoping I could find you a reference to
>> similar reports, but I can't find them now.  I am quite sure I saw
>> somebody with a similar report within the last month.  Plus, several
>> people have reported issues with performance dropping when they went
>> from 3.x to 4.x and maybe this is why.
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
>>
>>
>>
>> On Tue, Jul 2, 2013 at 3:01 PM, Ken Krugler wrote:
>>> Hi all,
>>>
>>> After upgrading from Solr 3.5 to 4.2.1, I noticed our filterCache hit ratio 
>>> had dropped significantly.
>>>
>>> Previously it was at 95+%, but now it's < 50%.
>>>
>>> I enabled recording 100 entries for debugging, and in looking at them it 
>>> seems that edismax (and faceting) is creating entries for me.
>>>
>>> This is in a sharded setup, so it's a distributed search.
>>>
>>> If I do a search for the string "bogus text" using edismax on two fields, I 
>>> get an entry in each of the shard's filter caches that looks like:
>>>
>>> item_+(((field1:bogus | field2:bogu) (field1:text | field2:text))~2):
>>>
>>> Is this expected?
>>>
>>> I have a similar situation happening during faceted search, even though my 
>>> fields are single-value/untokenized strings, and I'm not using the enum 
>>> facet method.
>>>
>>> But I'll get many, many entries in the filterCache for facet values, and 
>>> they all look like "item_::"
>>>
>>> The net result of the above is that even with a very big filterCache size 
>>> of 2K, the hit ratio is still only 60%.
>>>
>>> Thanks for any insights,
>>>
>>> -- Ken
>
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
>
>
>


Re: SOLR 4.2.1 - High Resident Memory Usage

2013-08-27 Thread vsilgalis
 

That doesn't seem to be a problem.

Markus, are you saying that I should plan on resident memory being at least
double my heap size?  I haven't run into issues around this before but then
again I don't know everything.

Is this a rule of thumb, or is there documentation I can look at?

Thanks again.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-4-2-1-High-Resident-Memory-Usage-tp4086866p4086923.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr-user@lucene.apache.org

2013-08-27 Thread Utkarsh Sengar
> Use a different tokenizer, possibly one of the regex ones.
> fake it with phrase queries.
> Take a really good look at the various filter combinations. It's
   possible that WhitespaceTokenizer and WordDelimiterFilterFactory
   might be able to do good things.
Will try to play with these two options.

> Clearly define whether this is a capability that you really need.
Yes, this is a needed feature. Some of our queries are at&t, h&m, m&m.
Returning an empty response is not a good experience.

I also tried (the filter element itself was stripped by the list archive):

With wdfftypes.txt:
& => ALPHA
\u0026 => ALPHA
$ => DIGIT
% => DIGIT
. => DIGIT
\u002C => DIGIT


But it didn't work.
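The stripped filter was presumably along these lines (a hedged
reconstruction; the surrounding analyzer is not shown in the archive):

  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" catenateWords="1"
          preserveOriginal="1" types="wdfftypes.txt"/>

One likely reason it didn't work: StandardTokenizer discards "&" while
tokenizing, so a downstream WordDelimiterFilterFactory types file never even
sees the character. The types mapping only helps when the tokenizer itself
keeps "&" in the token, e.g. WhitespaceTokenizer.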

Thanks,
-Utkarsh




On Tue, Aug 27, 2013 at 3:07 PM, Erick Erickson wrote:

> bq: Is there a way I can make "m&m" index as one string AND also keep
> StandardTokenizerFactory since I need it for other searches.
>
> In a word, no. You get one and only one tokenizer per field. But there
> are lots of options:
> > Use a different tokenizer, possibly one of the regex ones.
> > fake it with phrase queries.
> > Take a really good look at the various filter combinations. It's
>possible that WhitespaceTokenizer and WordDelimiterFilterFactory
>might be able to do good things.
> > Clearly define whether this is a capability that you really need.
>
> This last is my recurring plea to ensure that the effort is of real benefit
> to the user and not just something someone noticed that's actually
> only useful 0.001% of the time.
>
> Best
> Erick
>
>
> On Tue, Aug 27, 2013 at 5:00 PM, Utkarsh Sengar wrote:
>
> > Yup, the query "o'reilly" worked after adding WDF to the index analyzer.
> >
> >
> > Although "m&m" or "m\&m" doesn't work.
> > Field analysis for "m&m" says:
> > ST m, m
> > WDF m, m
> >
> > ST m, m
> > WDF m, m
> >
> > So essentially & is ignored during indexing or querying. My guess is the
> > standard tokenizer is the problem. As the documentation says:
> >
> >
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory
> > Example: "I.B.M. 8.5 can't!!!" ==> ALPHANUM: "I.B.M.", NUM:"8.5",
> > ALPHANUM:"can't"
> >
> > The char "&" will be ignored I guess.
> >
> > *So, my question is:*
> > Is there a way I can make "m&m" index as one string AND also keep
> > StandardTokenizerFactory since I need it for other searches.
> >
> > Thanks,
> > -Utkarsh
> >
> >
> > On Tue, Aug 27, 2013 at 11:44 AM, Utkarsh Sengar wrote:
> >
> > > Thanks for the info.
> > >
> > > 1.
> > >
> >
> > > http://SERVER/solr/prodinfo/select?q=o%27reilly&wt=json&indent=true&debugQuery=true
> > > returns:
> > >
> > > {
> > >   "responseHeader":{
> > > "status":0,
> > > "QTime":16,
> > > "params":{
> > >   "debugQuery":"true",
> > >   "indent":"true",
> > >   "q":"o'reilly",
> > >   "wt":"json"}},
> > >   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
> > >   },
> > >   "debug":{
> > > "rawquerystring":"o'reilly",
> > > "querystring":"o'reilly",
> > > "parsedquery":"MultiPhraseQuery(allText:\"o'reilly (reilly
> > oreilly)\")",
> > > "parsedquery_toString":"allText:\"o'reilly (reilly oreilly)\"",
> > > "QParser":"LuceneQParser",
> > > "explain":{}
> > >}
> > > }
> > >
> > >
> > >
> > > 2. Analysis gives this: http://i.imgur.com/IPEiiEQ.png I assume this
> > > means tokens are same for "o'reilly"
> > > 3. I tried escaping ', it doesn’t help:
> > > http://SERVER/solr/prodinfo/select?q=o\%27reilly&wt=json&indent=true<
> > http://SERVER/solr/prodinfo/select?q=o%5C%27reilly&wt=json&indent=true>
> > >
> > > I will add WordDelimiterFilterFactory for index and see if it fixes the
> > > problem.
> > >
> > > Thanks,
> > > -Utkarsh
> > >
> > >
> > >
> > > On Mon, Aug 26, 2013 at 3:15 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > >> First thing to do is attach &debug=query to your queries and look at
> > >> the parsed output.
> > >>
> > >> Second thing to do is look at the admin/analysis page and see what
> > happens
> > >> at index and query time to things like o'reilly. You have
> > >> WordDelimiterFilterFactory
> > >> configured in your query but not index analysis chain. My bet on that
> is
> > >> that
> > >> you're getting different tokens at query and index time...
> > >>
> > >> Third thing is that you need to escape the & character. It's probably
> > >> being
> > >> interpreted as a delimiter on the URL and Solr ignores params it
> doesn't
> > >> understand.
> > >>
> > >> Best
> > >> Erick
> > >>
> > >>
> > >> On Mon, Aug 26, 2013 at 5:08 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
> > >>
> > >> > Some of the queries (not all) with special chars return no
> documents.
> > >> >
> > >> > Example: queries returning no documents
> > >> > q=m&m (this can be explained, when I search for "m m", no documents
> > are
> > >> > returned)
> > >> > q=o'reilly (when I search for "o reilly", I get documents back)
> > >> >
> > >> >
> > >> > Queries returning documents:

Re: Can a data import handler grab all pages of an RSS feed?

2013-08-27 Thread Alexandre Rafalovitch
Have you tried using $hasMore and $nextUrl? You can inject them with a custom
transformer. It is not documented very well, but it is mentioned on the Wiki.
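For illustration (not from the thread), a minimal data-config sketch of such
a transformer; the feed URL, the nextLink column, and its xpath are
assumptions about the feed's shape:

  <dataConfig>
    <dataSource type="URLDataSource"/>
    <script><![CDATA[
      // hypothetical: keep paging while the feed advertises a next page
      function paginate(row) {
        var next = row.get('nextLink');   // assumed to hold the next-page URL
        if (next != null) {
          row.put('$hasMore', 'true');
          row.put('$nextUrl', next);
        }
        return row;
      }
    ]]></script>
    <document>
      <entity name="feed"
              processor="XPathEntityProcessor"
              url="http://portal.example.com/feed?page=1"
              forEach="/feed/entry"
              transformer="script:paginate">
        <field column="nextLink" xpath="/feed/link[@rel='next']/@href"/>
      </entity>
    </document>
  </dataConfig>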

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Mon, Aug 26, 2013 at 9:49 PM, eShard  wrote:

> Good morning,
> I have an IBM Portal atom feed that spans multiple pages.
> Is there a way to instruct the DIH to grab all available pages?
> I can put a huge range in but that can be extremely slow with large amounts
> of XML data.
> I'm currently using Solr 4.0 final.
>
> Thanks,
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-a-data-import-handler-grab-all-pages-of-an-RSS-feed-tp4086635.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: ICUTokenizer class not found with Solr 4.4

2013-08-27 Thread Shawn Heisey

On 8/27/2013 5:11 PM, Naomi Dushay wrote:

Perhaps you are missing the following from your solrconfig
(the quoted <lib> directives were stripped by the list archive)

I ran into this issue (I'm the one that filed SOLR-4852) and I am not 
using Blacklight.  I am only using what can be found in a Solr download, 
plus the MySQL JDBC driver for dataimport.


I prefer not to load jars via solrconfig.xml.  I have a lot of cores and 
every core needs to use the same jars.  Rather than have the same jars 
loaded 18 times (once by each of the 18 solrconfig.xml files), I would 
rather have Solr load them once and make the libraries available to all 
cores.  Using ${solr.solr.home}/lib accomplishes this goal.


Thanks,
Shawn



Re: ICUTokenizer class not found with Solr 4.4

2013-08-27 Thread Naomi Dushay
Hi Tom,

Sorry - I was meeting with the East-Asia librarians …

Perhaps you are missing the following from your solrconfig -- the <lib>
directives that load the ICU jars (the lines themselves were stripped by the
list archive).

(this is the top of my solrconfig.xml; the XML tags were stripped by the
archive and are reconstructed here around the surviving values:

<config>

  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>

  <luceneMatchVersion>4.4</luceneMatchVersion>

  <lib dir="/data/solr/cjk-icu" />

…


and here is my solr.xml, if it matters:

note the "sharedLib" value (the file contents were stripped by the list
archive)



On Aug 27, 2013, at 3:29 PM, Tom Burton-West wrote:

> Hello all,
> 
> According to the README.txt in solr-4.4.0/solr/example/solr/collection1, all 
> we have to do is create a collection1/lib directory and put whatever jars we 
> want in there. 
> 
> ".. /lib.   
>If it exists, Solr will load any Jars
>found in this directory and use them to resolve any "plugins"
> specified in your solrconfig.xml or schema.xml "
> 
> 
>   I did so  (see below).  However, I keep getting a class not found error 
> (see below).
> 
> Has the default changed from what is documented in the README.txt file?
> Is there something I have to change in solrconfig.xml or solr.xml to make 
> this work?
> 
> I looked at SOLR-4852, but don't understand.   It sounds like maybe there is 
> a problem if the collection1/lib directory is also specified in 
> solrconfig.xml.  But I didn't do that. (i.e. out of the box solrconfig.xml)
>  Does this mean that by following what it says in the README.txt, I am making 
> some kind of a configuration error?  I also don't understand the workaround 
> in SOLR-4852.
> 
> Is this an ICU issue?  A Java 7 issue?  A Solr 4.4 issue, or did I simply 
> not understand the README.txt?
> 
> 
> 
> Tom
> 
> --
> 
> 
> org.apache.solr.common.SolrException; null:java.lang.NoClassDefFoundError: 
> org/apache/lucene/analysis/icu/segmentation/ICUTokenizer
> 
>  ls collection1/lib
> icu4j-49.1.jar  
> lucene-analyzers-icu-4.4-SNAPSHOT.jar  
> solr-analysis-extras-4.4-SNAPSHOT.jar
> 
> https://issues.apache.org/jira/browse/SOLR-4852
> 
> Collection1/README.txt excerpt:
> 
>  lib/
> This directory is optional.  If it exists, Solr will load any Jars
> found in this directory and use them to resolve any "plugins"
> specified in your solrconfig.xml or schema.xml (ie: Analyzers,
> Request Handlers, etc...).  Alternatively you can use the <lib> 
> syntax in conf/solrconfig.xml to direct Solr to your plugins.  See 
> the example conf/solrconfig.xml file for details.
> 



RE: SOLR 4.2.1 - High Resident Memory Usage

2013-08-27 Thread Markus Jelsma
Hi


-Original message-
> From:Shawn Heisey 
> Sent: Wednesday 28th August 2013 0:50
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR 4.2.1 - High Resident Memory Usage
> 
> On 8/27/2013 4:17 PM, Erick Erickson wrote:
> > Ok, this whole topic usually gives me heartburn. So I'll just point out
> > an interesting blog on this from Mike McCandless:
> > http://blog.mikemccandless.com/2011/04/just-say-no-to-swapping.html
> >
> > At least tuning swappiness to 0 will tell you whether it's real or phantom.
> > Of course I'd be trying it on a test machine, not prod!
> 
> Due to horror stories about operating systems behaving strangely when 
> swap is nonexistent, I don't like to completely disable swap.  I know 
> that if it ever actually starts needing swap that performance will be 
> terrible, but if there is an actual out of memory event and there's no 
> swap at all, then it will start killing off processes, and it might make 
> the wrong choice.

Why? First of all, the OOM killer in Linux is almost always accurate; it will 
kill your servlet container - although once it also killed my syslogd, but that 
was a very experimental set up. If your OS really runs OOM, your daemons and/or 
heap allocation is incorrect, or you have too little RAM anyway.

You need to account for additional resident memory plus the heap size you 
allocated, so it's not an exact science. But if you know that a 512MB heap never 
results in more than 1024MB RES, then you will never run OOM on your OS, and 
therefore never need the swap.

If I can choose, I prefer a smaller heap size (living on the edge), having it 
killed by an OOM script and restarted, rather than relying on the OS's OOM 
killer or swap. In any case, make sure your servlet container's RES plus the 
rest of the daemons never go beyond your RAM capacity.

Once it swaps, it usually keeps swapping occasionally, or worse :) So keep 
everything in RES; dealing with GC is bad enough already!
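For reference (not from the message), one hedged way to wire up such an OOM
script is HotSpot's OnOutOfMemoryError hook; the script path is hypothetical:

  # run a restart script when the heap is exhausted (%p expands to the pid)
  java -Xmx512m -XX:OnOutOfMemoryError="/usr/local/bin/restart-solr.sh %p" \
       -jar start.jar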


> 
> I do set swappiness to 0 or 1 on all my machines.  I would rather have 
> less commonly used (and very small, memory-wise) processes on special 
> purpose servers (sshd, postfix, etc) remain resident and respond quickly.
> 
> Thanks,
> Shawn
> 
> 


Re: SOLR 4.2.1 - High Resident Memory Usage

2013-08-27 Thread Shawn Heisey

On 8/27/2013 4:48 PM, vsilgalis wrote:

dash: (screenshot link stripped)

JVM section: (screenshot link stripped)

ps output: (screenshot link stripped)

Erick, that may be one of the ways I approach this; I just want to make sure
my setup is ideal and I didn't miss anything.  As of right now performance
doesn't seem to be affected *knocks on wood*, but I need peace of mind that
I won't have to babysit these boxes over the coming months and, as load
continues to grow, that I won't run into issues in the future.


Your server is behaving very strangely.  Java is reporting more memory 
in use than it should, but the shared memory value isn't large like it 
is in my case.  Even if the 22GB figure is completely accurate and not 
misleading like my number is, it should still leave 2GB of memory free, 
but clearly something on the machine is wanting more memory.


I've got something kinda weird to check that I have only just recently 
learned about.  What do you get from the following command?


cat /proc/meminfo | grep -i huge
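(Illustrative only: the huge-page fields in /proc/meminfo look like this;
the numbers are made up.)

  AnonHugePages:      2048 kB
  HugePages_Total:       0
  HugePages_Free:        0
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:       2048 kB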

Thanks,
Shawn



Re: SOLR 4.2.1 - High Resident Memory Usage

2013-08-27 Thread Shawn Heisey

On 8/27/2013 4:17 PM, Erick Erickson wrote:

Ok, this whole topic usually gives me heartburn. So I'll just point out
an interesting blog on this from Mike McCandless:
http://blog.mikemccandless.com/2011/04/just-say-no-to-swapping.html

At least tuning swappiness to 0 will tell you whether it's real or phantom.
Of course I'd be trying it on a test machine, not prod!


Due to horror stories about operating systems behaving strangely when 
swap is nonexistent, I don't like to completely disable swap.  I know 
that if it ever actually starts needing swap that performance will be 
terrible, but if there is an actual out of memory event and there's no 
swap at all, then it will start killing off processes, and it might make 
the wrong choice.


I do set swappiness to 0 or 1 on all my machines.  I would rather have 
less commonly used (and very small, memory-wise) processes on special 
purpose servers (sshd, postfix, etc) remain resident and respond quickly.


Thanks,
Shawn



Re: ICUTokenizer class not found with Solr 4.4

2013-08-27 Thread Shawn Heisey

On 8/27/2013 4:29 PM, Tom Burton-West wrote:

According to the README.txt in solr-4.4.0/solr/example/solr/collection1,
all we have to do is create a collection1/lib directory and put whatever
jars we want in there.

".. /lib.
If it exists, Solr will load any Jars
found in this directory and use them to resolve any "plugins"
 specified in your solrconfig.xml or schema.xml "


   I did so  (see below).  However, I keep getting a class not found error
(see below).

Has the default changed from what is documented in the README.txt file?
Is there something I have to change in solrconfig.xml or solr.xml to make
this work?

I looked at SOLR-4852, but don't understand.   It sounds like maybe there
is a problem if the collection1/lib directory is also specified in
solrconfig.xml.  But I didn't do that. (i.e. out of the box solrconfig.xml)
  Does this mean that by following what it says in the README.txt, I am 
making some kind of a configuration error?  I also don't understand the 
workaround in SOLR-4852.


That's my bug! :)  If you have sharedLib set to "lib" (or explicitly the 
lib directory under solr.solr.home) in solr.xml, then ICUTokenizer 
cannot be found despite the fact that all the correct jars are there.


The workaround is to remove sharedLib from solr.xml, or set it to some 
other directory that either doesn't exist or has no jars in it.  The 
${solr.solr.home}/lib directory is automatically added to the classpath 
regardless of config, there seems to be some kind of classloading bug 
when the sharedLib adds the same directory again.  This all worked fine 
in 3.x, and early 4.x releases, but due to classloader changes, it seems 
to have broken.  I think (based on the issue description) that it 
started being a problem with 4.3-SNAPSHOT.
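A sketch of the workaround under those assumptions: in solr.xml, either drop 
the sharedLib attribute entirely or point it somewhere unused, and rely on 
${solr.solr.home}/lib being picked up automatically; or, per core, load the 
jars explicitly in solrconfig.xml (path illustrative):

  <lib dir="/path/to/shared/jars" />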


The same thing happens if you set sharedLib to "foo" and put some of 
your jars in lib and some in foo.  It's quite mystifying.


Thanks,
Shawn



Re: SOLR 4.2.1 - High Resident Memory Usage

2013-08-27 Thread vsilgalis
dash: (screenshot link stripped)

JVM section: (screenshot link stripped)

ps output: (screenshot link stripped)

Erick, that may be one of the ways I approach this; I just want to make sure
my setup is ideal and I didn't miss anything.  As of right now performance
doesn't seem to be affected *knocks on wood*, but I need peace of mind that
I won't have to babysit these boxes over the coming months and, as load
continues to grow, that I won't run into issues in the future.

Thanks for all the help.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-4-2-1-High-Resident-Memory-Usage-tp4086866p4086902.html
Sent from the Solr - User mailing list archive at Nabble.com.


ICUTokenizer class not found with Solr 4.4

2013-08-27 Thread Tom Burton-West
Hello all,

According to the README.txt in solr-4.4.0/solr/example/solr/collection1,
all we have to do is create a collection1/lib directory and put whatever
jars we want in there.

".. /lib.
   If it exists, Solr will load any Jars
   found in this directory and use them to resolve any "plugins"
specified in your solrconfig.xml or schema.xml "


  I did so  (see below).  However, I keep getting a class not found error
(see below).

Has the default changed from what is documented in the README.txt file?
Is there something I have to change in solrconfig.xml or solr.xml to make
this work?

I looked at SOLR-4852, but don't understand.   It sounds like maybe there
is a problem if the collection1/lib directory is also specified in
solrconfig.xml.  But I didn't do that. (i.e. out of the box solrconfig.xml)
 Does this mean that by following what it says in the README.txt, I am
making some kind of a configuration error?  I also don't understand the
workaround in SOLR-4852.

Is this an ICU issue?  A Java 7 issue?  A Solr 4.4 issue, or did I simply
not understand the README.txt?



Tom

--


org.apache.solr.common.SolrException; null:java.lang.NoClassDefFoundError:
org/apache/lucene/analysis/icu/segmentation/ICUTokenizer

 ls collection1/lib
icu4j-49.1.jar
lucene-analyzers-icu-4.4-SNAPSHOT.jar
solr-analysis-extras-4.4-SNAPSHOT.jar

https://issues.apache.org/jira/browse/SOLR-4852

Collection1/README.txt excerpt:

 lib/
This directory is optional.  If it exists, Solr will load any Jars
found in this directory and use them to resolve any "plugins"
specified in your solrconfig.xml or schema.xml (ie: Analyzers,
Request Handlers, etc...).  Alternatively you can use the <lib>
syntax in conf/solrconfig.xml to direct Solr to your plugins.  See
the example conf/solrconfig.xml file for details.


Re: SOLR 4.2.1 - High Resident Memory Usage

2013-08-27 Thread Erick Erickson
Ok, this whole topic usually gives me heartburn. So I'll just point out
an interesting blog on this from Mike McCandless:
http://blog.mikemccandless.com/2011/04/just-say-no-to-swapping.html

At least tuning swappiness to 0 will tell you whether it's real or phantom.
Of course I'd be trying it on a test machine, not prod!

FWIW,
Erick


On Tue, Aug 27, 2013 at 5:50 PM, Shawn Heisey  wrote:

> On 8/27/2013 3:32 PM, vsilgalis wrote:
>
>> thanks for the quick reply.
>>
>> I tried to rule out what I could around how Linux is handling this stuff.
>> Yes I'm using the default swappiness setting of 60, but at this point it
>> looks like the machine is swapping now because of low memory.
>>
>> Here is the vmstat and free -m results:
>> (screenshot link stripped: vmstat_free_output.png)
>>
>> Here is my top sorted by mem:
>> (screenshot link stripped)
>>
>> Again, I feel like I might be missing something but not sure what.
>>
>
> You are right, it is definitely swapping, and it's more than a little bit.
>  I don't see any indication on the top output that anything other than Java
> is using a lot of memory, so I don't know what's using the swap.  Just to
> be sure we have the right info, can you give me the output of the following
> command?
>
> ps aux --sort -rss | cut -c1-80 | head -n 10
>
> This looks like your Java (or maybe something else on the box) may be
> misbehaving very badly.  Can you go to your Solr UI dashboard and get me a
> screenshot?  I need all the info from the JVM-Memory and JVM sections,
> including the whole JVM section.  If you have a lot of jvm args, you may
> need to scroll down to see them all at once.
>
> Thanks,
> Shawn
>
>


solr-user@lucene.apache.org

2013-08-27 Thread Erick Erickson
bq: Is there a way I can make "m&m" index as one string AND also keep
StandardTokenizerFactory since I need it for other searches.

In a word, no. You get one and only one tokenizer per field. But there
are lots of options:
> Use a different tokenizer, possibly one of the regex ones.
> fake it with phrase queries.
> Take a really good look at the various filter combinations. It's
   possible that WhitespaceTokenizer and WordDelimiterFilterFactory
   might be able to do good things.
> Clearly define whether this is a capability that you really need.

This last is my recurring plea to ensure that the effort is of real benefit
to the user and not just something someone noticed that's actually
only useful 0.001% of the time.

Best
Erick
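For illustration (not from the thread; the field type and file names are
invented), a hedged sketch of the WhitespaceTokenizer +
WordDelimiterFilterFactory combination. The key point is that
WhitespaceTokenizer leaves "m&m" intact, and a types file can reclassify "&"
as ALPHA so word delimiting keeps it too:

  <fieldType name="text_keepamp" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- wdfftypes.txt contains the single mapping: & => ALPHA -->
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" catenateWords="1"
              preserveOriginal="1" types="wdfftypes.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

This only helps if the same analysis runs at both index and query time; with
StandardTokenizer the "&" is gone before any filter can act on it.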


On Tue, Aug 27, 2013 at 5:00 PM, Utkarsh Sengar wrote:

> Yup, the query "o'reilly" worked after adding WDF to the index analyzer.
>
>
> Although "m&m" or "m\&m" doesn't work.
> Field analysis for "m&m" says:
> ST m, m
> WDF m, m
>
> ST m, m
> WDF m, m
>
> So essentially & is ignored during indexing or querying. My guess is the
> standard tokenizer is the problem. As the documentation says:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory
> Example: "I.B.M. 8.5 can't!!!" ==> ALPHANUM: "I.B.M.", NUM:"8.5",
> ALPHANUM:"can't"
>
> The char "&" will be ignored I guess.
>
> *So, my question is:*
> Is there a way I can make "m&m" index as one string AND also keep
> StandardTokenizerFactory since I need it for other searches.
>
> Thanks,
> -Utkarsh
>
>
> On Tue, Aug 27, 2013 at 11:44 AM, Utkarsh Sengar wrote:
>
> > Thanks for the info.
> >
> > 1.
> >
> > http://SERVER/solr/prodinfo/select?q=o%27reilly&wt=json&indent=true&debugQuery=true
> > returns:
> >
> > {
> >   "responseHeader":{
> > "status":0,
> > "QTime":16,
> > "params":{
> >   "debugQuery":"true",
> >   "indent":"true",
> >   "q":"o'reilly",
> >   "wt":"json"}},
> >   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
> >   },
> >   "debug":{
> > "rawquerystring":"o'reilly",
> > "querystring":"o'reilly",
> > "parsedquery":"MultiPhraseQuery(allText:\"o'reilly (reilly
> oreilly)\")",
> > "parsedquery_toString":"allText:\"o'reilly (reilly oreilly)\"",
> > "QParser":"LuceneQParser",
> > "explain":{}
> >}
> > }
> >
> >
> >
> > 2. Analysis gives this: http://i.imgur.com/IPEiiEQ.png I assume this
> > means tokens are same for "o'reilly"
> > 3. I tried escaping ', it doesn’t help:
> > http://SERVER/solr/prodinfo/select?q=o\%27reilly&wt=json&indent=true<
> http://SERVER/solr/prodinfo/select?q=o%5C%27reilly&wt=json&indent=true>
> >
> > I will add WordDelimiterFilterFactory for index and see if it fixes the
> > problem.
> >
> > Thanks,
> > -Utkarsh
> >
> >
> >
> > On Mon, Aug 26, 2013 at 3:15 PM, Erick Erickson wrote:
> >
> >> First thing to do is attach &debug=query to your queries and look at the
> >> parsed output.
> >>
> >> Second thing to do is look at the admin/analysis page and see what
> happens
> >> at index and query time to things like o'reilly. You have
> >> WordDelimiterFilterFactory
> >> configured in your query but not index analysis chain. My bet on that is
> >> that
> >> you're getting different tokens at query and index time...
> >>
> >> Third thing is that you need to escape the & character. It's probably
> >> being
> >> interpreted as a delimiter on the URL and Solr ignores params it doesn't
> >> understand.
> >>
> >> Best
> >> Erick
> >>
> >>
> >> On Mon, Aug 26, 2013 at 5:08 PM, Utkarsh Sengar wrote:
> >>
> >> > Some of the queries (not all) with special chars return no documents.
> >> >
> >> > Example: queries returning no documents
> >> > q=m&m (this can be explained, when I search for "m m", no documents
> are
> >> > returned)
> >> > q=o'reilly (when I search for "o reilly", I get documents back)
> >> >
> >> >
> >> > Queries returning documents:
> >> > q=hello&world (document matched is "Hello World: A Life in Ham Radio")
> >> >
> >> >
> >> > My questions are:
> >> > 1. What's wrong with "o'reilly"? What changes do I need in my field
> >> type?
> >> > 2. How can I make the query "m&m" work?
> >> > My index has a bunch of M&M's docs like: "M & M's Milk Chocolate Candy
> >> > Coated Peanuts 19.2 oz" and "M and Ms Chocolate Candies - Peanut - 1 Bag
> >> > (42 oz)"
> >> >
> >> >
> >> > Field type: (the XML was mangled by the archive; the surviving
> >> > attributes show a TextField with positionIncrementGap="100", an index
> >> > analyzer with a StopFilter (stopwords.txt, ignoreCase,
> >> > enablePositionIncrements), an EnglishMinimalStemFilter and a
> >> > RemoveDuplicatesTokenFilter, and a query analyzer that adds a
> >> > WordDelimiterFilter with generateWordParts="1" generateNumberParts="1"
> >> > catenateWords="1" catenateNumbers="1" …)

Re: Solr 4.2 Regular expression, returning only matched substring

2013-08-27 Thread Erick Erickson
You can facet by arbitrary query, does that work? See facet.query...
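For a small, known set of patterns that could look something like this
(illustrative field name; Solr 4.x regexp query syntax):

  q=*:*&facet=true
       &facet.query=body:/[a-z0-9._]+@gmail\.com/
       &facet.query=body:/[a-z0-9._]+@yahoo\.com/

Note this buckets documents per pattern rather than extracting and faceting
on each distinct matched substring.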


Best
Erick


On Tue, Aug 27, 2013 at 2:31 PM, Jai  wrote:

> Hi,
>
> Is it possible to get only the matched substring of a text/string type
> field in the response?
> I am trying to search with a regular expression and facet on the different
> strings (substrings of the field) that match this regular expression.
>
> For example, if I write a regular expression to match emails, is there any
> way to return only the matched email from the indexed sentence, so that I
> can facet on it?
>
> Will really appreciate any help.
>
> thanks and regards
> jai
>


Re: SOLR 4.2.1 - High Resident Memory Usage

2013-08-27 Thread Shawn Heisey

On 8/27/2013 3:32 PM, vsilgalis wrote:

thanks for the quick reply.

I tried to rule out what I could around how Linux is handling this stuff.
Yes I'm using the default swappiness setting of 60, but at this point it
looks like the machine is swapping now because of low memory.

Here is the vmstat and free -m results:
(screenshot link stripped)

Here is my top sorted by mem:
(screenshot link stripped)
Again, I feel like I might be missing something but not sure what.


You are right, it is definitely swapping, and it's more than a little 
bit.  I don't see any indication on the top output that anything other 
than Java is using a lot of memory, so I don't know what's using the 
swap.  Just to be sure we have the right info, can you give me the 
output of the following command?


ps aux --sort -rss | cut -c1-80 | head -n 10

This looks like your Java (or maybe something else on the box) may be 
misbehaving very badly.  Can you go to your Solr UI dashboard and get me 
a screenshot?  I need all the info from the JVM-Memory and JVM sections, 
including the whole JVM section.  If you have a lot of jvm args, you may 
need to scroll down to see them all at once.


Thanks,
Shawn



Re: SOLR 4.2.1 - High Resident Memory Usage

2013-08-27 Thread vsilgalis
thanks for the quick reply.

I tried to rule out what I could around how Linux is handling this stuff.
Yes I'm using the default swappiness setting of 60, but at this point it
looks like the machine is swapping now because of low memory.

Here is the vmstat and free -m results:
(screenshot link stripped)

Here is my top sorted by mem:
(screenshot link stripped)
Again, I feel like I might be missing something but not sure what.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-4-2-1-High-Resident-Memory-Usage-tp4086866p4086882.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr-user@lucene.apache.org

2013-08-27 Thread Utkarsh Sengar
Yup, the query "o'reilly" worked after adding WDF to the index analyzer.


Although "m&m" or "m\&m" doesn't work.
Field analysis for "m&m" says:
ST m, m
WDF m, m

ST m, m
WDF m, m

So essentially & is ignored during indexing or querying. My guess is the
standard tokenizer is the problem. As the documentation says:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory
Example: "I.B.M. 8.5 can't!!!" ==> ALPHANUM: "I.B.M.", NUM:"8.5",
ALPHANUM:"can't"

The char "&" will be ignored I guess.

*So, my question is:*
Is there a way I can make "m&m" index as one string AND also keep
StandardTokenizerFactory since I need it for other searches.

Thanks,
-Utkarsh


On Tue, Aug 27, 2013 at 11:44 AM, Utkarsh Sengar wrote:

> Thanks for the info.
>
> 1.
> http://SERVER/solr/prodinfo/select?q=o%27reilly&wt=json&indent=true&debugQuery=true returns:
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":16,
> "params":{
>   "debugQuery":"true",
>   "indent":"true",
>   "q":"o'reilly",
>   "wt":"json"}},
>   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
>   },
>   "debug":{
> "rawquerystring":"o'reilly",
> "querystring":"o'reilly",
> "parsedquery":"MultiPhraseQuery(allText:\"o'reilly (reilly oreilly)\")",
> "parsedquery_toString":"allText:\"o'reilly (reilly oreilly)\"",
> "QParser":"LuceneQParser",
> "explain":{}
>}
> }
>
>
>
> 2. Analysis gives this: http://i.imgur.com/IPEiiEQ.png I assume this
> means tokens are same for "o'reilly"
> 3. I tried escaping ', it doesn’t help:
> http://SERVER/solr/prodinfo/select?q=o\%27reilly&wt=json&indent=true
>
> I will add WordDelimiterFilterFactory for index and see if it fixes the
> problem.
>
> Thanks,
> -Utkarsh
>
>
>
> On Mon, Aug 26, 2013 at 3:15 PM, Erick Erickson wrote:
>
>> First thing to do is attach &debug=query to your queries and look at the
>> parsed output.
>>
>> Second thing to do is look at the admin/analysis page and see what happens
>> at index and query time to things like o'reilly. You have
>> WordDelimiterFilterFactory
>> configured in your query but not index analysis chain. My bet on that is
>> that
>> you're getting different tokens at query and index time...
>>
>> Third thing is that you need to escape the & character. It's probably
>> being
>> interpreted as a delimiter on the URL and Solr ignores params it doesn't
>> understand.
>>
>> Best
>> Erick
>>
>>
>> On Mon, Aug 26, 2013 at 5:08 PM, Utkarsh Sengar wrote:
>>
>> > Some of the queries (not all) with special chars return no documents.
>> >
>> > Example: queries returning no documents
>> > q=m&m (this can be explained, when I search for "m m", no documents are
>> > returned)
>> > q=o'reilly (when I search for "o reilly", I get documents back)
>> >
>> >
>> > Queries returning documents:
>> > q=hello&world (document matched is "Hello World: A Life in Ham Radio")
>> >
>> >
>> > My questions are:
>> > 1. What's wrong with "o'reilly"? What changes do I need in my field
>> type?
>> > 2. How can I make the query "m&m" work?
>> > My index has a bunch of M&M's docs like: "M & M's Milk Chocolate Candy
>> > Coated Peanuts 19.2 oz" and "M and Ms Chocolate Candies - Peanut - 1 Bag
>> > (42 oz)"
>> >
>> >
>> > Field type: (the XML was mangled by the archive; the surviving
>> > attributes show a TextField with positionIncrementGap="100", an index
>> > analyzer with a StopFilter (stopwords.txt, ignoreCase,
>> > enablePositionIncrements), an EnglishMinimalStemFilter and a
>> > RemoveDuplicatesTokenFilter, and a query analyzer that adds a
>> > WordDelimiterFilter with generateWordParts="1" generateNumberParts="1"
>> > catenateWords="1" catenateNumbers="1" catenateAll="0"
>> > preserveOriginal="1" ahead of the same filters)
>> >
>> >
>> > --
>> > Thanks,
>> > -Utkarsh
>> >
>>
>
>
>
> --
> Thanks,
> -Utkarsh
>



-- 
Thanks,
-Utkarsh


Re: Transaction log "on-disk" guarantees

2013-08-27 Thread Jack Krupansky

And here I was just about to give Mark credit for updating the wiki!

-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Tuesday, August 27, 2013 4:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Transaction log "on-disk" guarantees

Well, when you originally googled, it wasn't there; I just put it in after
reading your post and realizing that it wasn't documented.

Erick


On Tue, Aug 27, 2013 at 2:13 PM, SandroZbinden  wrote:


Hey Jack

Thanks a lot. I just googled for fsync and syncLevel instead of searching
in
the solr wiki. Won't happen again.

Here is the link to the solr wiki page that describes to set the syncLevel

http://wiki.apache.org/solr/SolrCloud?highlight=%28fsync%29



--
View this message in context:
http://lucene.472066.n3.nabble.com/Transaction-log-on-disk-guarantees-tp4086829p4086867.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Transaction log "on-disk" guarantees

2013-08-27 Thread Erick Erickson
Well, when you originally googled, it wasn't there; I just put it in after
reading your post and realizing that it wasn't documented.

Erick


On Tue, Aug 27, 2013 at 2:13 PM, SandroZbinden  wrote:

> Hey Jack
>
> Thanks a lot. I just googled for fsync and syncLevel instead of searching
> in
> the solr wiki. Won't happen again.
>
> Here is the link to the solr wiki page that describes to set the syncLevel
>
> http://wiki.apache.org/solr/SolrCloud?highlight=%28fsync%29
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Transaction-log-on-disk-guarantees-tp4086829p4086867.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SOLR 4.2.1 - High Resident Memory Usage

2013-08-27 Thread Shawn Heisey

On 8/27/2013 11:56 AM, vsilgalis wrote:

We have a 2 shard SOLRCloud implementation with 6 servers in production.  We
have allocated 24GB to each server and are using JVM max memory settings of
-Xmx14336m on each of the servers.  We are using the same embedded jetty that
SOLR comes with.  The JVM side of things looks like what I'd expect from
java, climbs until ~13GB and then GC occurs and we are back down to ~4GB
used.

However, on the original leaders of the cluster we currently are seeing
~22GB resident memory usage for Jetty/SOLR and the machines have begun using
swap, which is something I am concerned with. So the question is: should I
expect jetty/SOLR to clean up after itself in terms of memory usage (outside
of the JVM)? Is there something I'm missing to make things more efficient?
Anything else I should be looking at?


I see something interesting in my own production setup.  Here are a 
couple of screenshots:


https://www.dropbox.com/s/zacp4n3gu8wb9ab/idxb1-top-sorted-mem.png
https://www.dropbox.com/s/70e12m9iunqunzz/idxb1-solr-dashboard.png

The first is a 'top' output sorted by memory usage (pressing shift-M). 
I have a 6GB heap, which you can see in the other screenshot, but top 
shows that Solr's resident memory usage is 16GB.  Of that, it says that 
11GB is shared ... but I can't seem to figure out what it's shared *with*.


I think that there's some fibbing going on in the OS.  If you add up the 
"cached" number and the "free" numbers, you get 54302836k.  Since the 
machine only has 64GB of RAM, and if we trust the numbers we added, it's 
not possible for the java process to actually have the 16GB resident 
memory that it claims to have.  Something around 6.5 to 7GB is much more 
realistic, and that's pretty close to the resident minus shared value. 
You can also see that the java process has 54GB of virtual memory, and 
this is approximately equal to the total index size (on this machine) 
plus the java heap size, plus a little extra for Java itself to operate. 
 It's my opinion that when Solr is using an MMAP-based directory (the 
default), the java process shows some incorrect information about how 
much memory it's using.


How much swap is being used on your system, and do you see the OS 
actively using it - swapping pages in and out on a regular basis?  I 
suspect that the amount of swap used is not very much, and that vmstat 
would show that there is very little active swapping occurring.  If I'm 
wrong, then you might be having a real problem.


The default sysctl setting for vm.swappiness on Linux is 60, which means 
it is very likely to swap out a few unused memory pages even when there 
is no actual memory pressure.  On the system that produced the 
screenshots, I have set vm.swappiness to 1 in sysctl.conf, and my swap 
usage never goes above 0k.  My dev server even stays at 0k swap used, 
and it has a 7GB java heap, only 16GB of total RAM, and more than twice 
as much total index size.
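For reference (not from the message), the setting is applied via sysctl:

  # /etc/sysctl.conf -- survives reboots
  vm.swappiness = 1

  # apply immediately without a reboot
  sysctl -w vm.swappiness=1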


Thanks,
Shawn



solr-user@lucene.apache.org

2013-08-27 Thread Utkarsh Sengar
Thanks for the info.

1.
http://SERVER/solr/prodinfo/select?q=o%27reilly&wt=json&indent=true&debugQuery=true returns:

{
  "responseHeader":{
"status":0,
"QTime":16,
"params":{
  "debugQuery":"true",
  "indent":"true",
  "q":"o'reilly",
  "wt":"json"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "debug":{
"rawquerystring":"o'reilly",
"querystring":"o'reilly",
"parsedquery":"MultiPhraseQuery(allText:\"o'reilly (reilly oreilly)\")",
"parsedquery_toString":"allText:\"o'reilly (reilly oreilly)\"",
"QParser":"LuceneQParser",
"explain":{}
   }
}



2. Analysis gives this: http://i.imgur.com/IPEiiEQ.png I assume this means
tokens are same for "o'reilly"
3. I tried escaping ', it doesn’t help:
http://SERVER/solr/prodinfo/select?q=o\%27reilly&wt=json&indent=true

I will add WordDelimiterFilterFactory for index and see if it fixes the
problem.

Thanks,
-Utkarsh



On Mon, Aug 26, 2013 at 3:15 PM, Erick Erickson wrote:

> First thing to do is attach &debug=query to your queries and look at the
> parsed output.
>
> Second thing to do is look at the admin/analysis page and see what happens
> at index and query time to things like o'reilly. You have
> WordDelimiterFilterFactory
> configured in your query but not index analysis chain. My bet on that is
> that
> you're getting different tokens at query and index time...
>
> Third thing is that you need to escape the & character. It's probably being
> interpreted as a delimiter on the URL and Solr ignores params it doesn't
> understand.
>
> Best
> Erick
>
>
> On Mon, Aug 26, 2013 at 5:08 PM, Utkarsh Sengar wrote:
>
> > Some of the queries (not all) with special chars return no documents.
> >
> > Example: queries returning no documents
> > q=m&m (this can be explained, when I search for "m m", no documents are
> > returned)
> > q=o'reilly (when I search for "o reilly", I get documents back)
> >
> >
> > Queries returning documents:
> > q=hello&world (document matched is "Hello World: A Life in Ham Radio")
> >
> >
> > My questions are:
> > 1. What's wrong with "o'reilly"? What changes do I need in my field type?
> > 2. How can I make the query "m&m" work?
> > My index has a bunch of M&M's docs like: "M & M's Milk Chocolate Candy
> > Coated Peanuts 19.2 oz" and "M and Ms Chocolate Candies - Peanut - 1 Bag
> > (42 oz)"
> >
> >
> > Field type: (the XML was mangled by the archive; the surviving
> > attributes show a TextField with positionIncrementGap="100", an index
> > analyzer with a StopFilter (stopwords.txt, ignoreCase,
> > enablePositionIncrements), an EnglishMinimalStemFilter and a
> > RemoveDuplicatesTokenFilter, and a query analyzer that adds a
> > WordDelimiterFilter with generateWordParts="1" generateNumberParts="1"
> > catenateWords="1" catenateNumbers="1" catenateAll="0"
> > preserveOriginal="1" ahead of the same filters)
> >
> >
> > --
> > Thanks,
> > -Utkarsh
> >
>



-- 
Thanks,
-Utkarsh


Solr 4.2 Regular expression, returning only matched substring

2013-08-27 Thread Jai
Hi,

Is it possible to get only the matched substring of a text/string type
field in the response?
I am trying to search with a regular expression and facet on the different
strings (substrings of the field) that match this regular expression.

For example, if I write a regular expression to match emails, is there any
way to return only the matched email from the indexed sentence, so that I
can facet on it?

Will really appreciate any help.

thanks and regards
jai


Re: Transaction log "on-disk" guarantees

2013-08-27 Thread SandroZbinden
Hey Jack 

Thanks a lot. I just googled for fsync and syncLevel instead of searching in
the solr wiki. Won't happen again.

Here is the link to the solr wiki page that describes to set the syncLevel 

http://wiki.apache.org/solr/SolrCloud?highlight=%28fsync%29



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Transaction-log-on-disk-guarantees-tp4086829p4086867.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR 4.2.1 - High Resident Memory Usage

2013-08-27 Thread vsilgalis
We have a 2 shard SOLRCloud implementation with 6 servers in production.  We
have allocated 24GB to each server and are using JVM max memory settings of
-Xmx14336m on each of the servers.  We are using the same embedded jetty that
SOLR comes with.  The JVM side of things looks like what I'd expect from
java, climbs until ~13GB and then GC occurs and we are back down to ~4GB
used. 

However, on the original leaders of the cluster we currently are seeing
~22GB resident memory usage for Jetty/SOLR and the machines have begun using
swap, which is something I am concerned with. So the question is: should I
expect jetty/SOLR to clean up after itself in terms of memory usage (outside
of the JVM)? Is there something I'm missing to make things more efficient? 
Anything else I should be looking at? 

Thanks





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-4-2-1-High-Resident-Memory-Usage-tp4086866.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Transaction log "on-disk" guarantees

2013-08-27 Thread Jack Krupansky

You missed the wiki update that went by a short while ago (XML restored;
the tags were stripped by the list archive):

  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
+   <str name="syncLevel">fsync</str>
  </updateLog>

-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Tuesday, August 27, 2013 11:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Transaction log "on-disk" guarantees

Here's a blog I wrote up a bit ago:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Hmmm, unfortunately it doesn't say anything about how to set the fsync
option, but do you really care? Soft commits flush to the op system, so
a JVM crash/termination shouldn't affect it anyway. Turning on the fsync
would just be a little bit of extra protection




On Tue, Aug 27, 2013 at 11:43 AM, Sandro Zbinden  wrote:


Hey Mark

Thank you very much for the quick answer. We have a single node
environment.

I tried to find the fsync option but was not successful. Ended up in the
UpdateLog class :-)

How do I enable fsync in the solrconfig.xml?


Besides that:

If Solr's soft commit feature has an on-disk guarantee via the transaction log,
why don't we use soft commit as the default commit option?


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Tuesday, 27 August 2013 17:12
To: solr-user@lucene.apache.org
Subject: Re: Transaction log "on-disk" guarantees


On Aug 27, 2013, at 11:08 AM, Sandro Zbinden  wrote:

> Can we activate the transaction log to have on disk guarantees and then
use the solr soft commit feature?

Yes you can. If you only have a single node (no replication), you probably
want to turn on fsync via the config.

- Mark






Shard splitting error: cannot uncache file="_1.nvm"

2013-08-27 Thread Greg Preston
I haven't been able to successfully split a shard with Solr 4.4.0.

If I have an empty index, or all documents would go to one side of the
split, I hit SOLR-5144.  But if I avoid that case, I consistently get
this error:

290391 [qtp243983770-60] INFO
org.apache.solr.update.processor.LogUpdateProcessor  –
[marin_shard1_1_replica1] webapp=/solr path=/update
params={waitSearcher=true&openSearcher=false&commit=true&wt=javabin&commit_end_point=true&version=2&softCommit=false}
{} 0 2
290392 [qtp243983770-60] ERROR org.apache.solr.core.SolrCore  –
java.io.IOException: cannot uncache file="_1.nvm": it was separately
also created in the delegate directory
at 
org.apache.lucene.store.NRTCachingDirectory.unCache(NRTCachingDirectory.java:297)
at 
org.apache.lucene.store.NRTCachingDirectory.sync(NRTCachingDirectory.java:216)
at 
org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4109)
at 
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2809)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:549)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)


I've seen LUCENE-4238, but that was closed as a test error.
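For context, a shard split in 4.4 is triggered through the Collections API
with something like the following (host and collection name inferred from the
core name in the log, so treat it as an assumption):

  http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=marin&shard=shard1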


-Greg


Re: Transaction log "on-disk" guarantees

2013-08-27 Thread Erick Erickson
Well, that blog post is, at best, an estimate based on
disk head seek times, so take it with a large grain of salt;
I probably shouldn't even have put that in the post.

But for a single node, it's probably not all that noticeable.

Erick


On Tue, Aug 27, 2013 at 12:20 PM, Sandro Zbinden  wrote:

> @Mark Do you know how I can set the syncLevel to fsync in the
> solrconfig.xml? I can't find it in the default solrconfig.xml:
>
>
> https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/solrconfig.xml
>
> The blog post at
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> says that enabling the fsync is not a big increase in the update time (a
> few milliseconds (say 10-50 ms)). So I think it is useful to turn on fsync.
>
> On Aug 27, 2013, at 11:54 AM, Erick Erickson wrote:
>
> > Soft commits flush to the op system, so a JVM crash/termination
> > shouldn't affect it anyway.
>
> A soft commit is not a hard commit, so there are no guarantees like this.
> It searches committed and non committed segments - non committed segments
> will not magically be committed after a JVM crash.
>
> > Turning on the fsync
> > would just be a little bit of extra protection..
>
> If you don't have replication, it turns on strong 'durability' promises.
> Without it, you are on your own if you have a hard machine reset. If
> durability is important to you and you don't have replication, it's
> important to use the fsync option here. Unless you have a great, long-lasting
> battery backup and/or an env such that hard resets don't concern you for
> some reason. It comes down to your requirements.
>
> Responses to Sandro inline below:
>
> On Aug 27, 2013, at 11:43 AM, Sandro Zbinden  wrote:
>
> > Hey Mark
> >
> > Thank you very much for the quick answer. We have a single node
> environment.
> >
> > I tried to find the fsync option but was not successful. Ended up in the
> > UpdateLog class :-)
> >
> > How do I enable fsync in the solrconfig.xml?
>
> In the updateLog config, it's a syncLevel=fsync param.
>
> >
> >
> > Besides that:
> >
> > If Solr's soft commit feature has an on-disk guarantee via the transaction
> > log, why don't we use soft commit as the default commit option?
>
> Yes, for visibility you should use soft commit. You should also have an
> auto hard commit with openSearcher=false - it's just about flushing the
> transaction log and freeing memory in this configuration - which is why it
> makes sense to simply turn on the auto commit for regular hard commits. You
> may or may not want to use auto soft commits.
>
> - Mark
>
> >
> >
> > -Original Message-
> > From: Mark Miller [mailto:markrmil...@gmail.com]
> > Sent: Tuesday, 27 August 2013 17:12
> > To: solr-user@lucene.apache.org
> > Subject: Re: Transaction log "on-disk" guarantees
> >
> >
> > On Aug 27, 2013, at 11:08 AM, Sandro Zbinden  wrote:
> >
> >> Can we activate the transaction log to have on disk guarantees and then
> >> use the solr soft commit feature?
> >
> > Yes you can. If you only have a single node (no replication), you
> probably want to turn on fsync via the config.
> >
> > - Mark
> >
>
>


AW: Transaction log "on-disk" guarantees

2013-08-27 Thread Sandro Zbinden
@Mark Do you know how I can set the syncLevel to fsync in the solrconfig.xml? I 
can't find it in the default solrconfig.xml:

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/solrconfig.xml

The blog post at 
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
says that enabling the fsync is not a big increase in the update time (a few 
milliseconds (say 10-50 ms)). So I think it is useful to turn on fsync. 

On Aug 27, 2013, at 11:54 AM, Erick Erickson  wrote:

> Soft commits flush to the op system, so a JVM crash/termination 
> shouldn't affect it anyway.

A soft commit is not a hard commit, so there are no guarantees like this. It 
searches committed and non committed segments - non committed segments will not 
magically be committed after a JVM crash.

> Turning on the fsync
> would just be a little bit of extra protection..

If you don't have replication, it turns on strong 'durability' promises. 
Without it, you are on your own if you have a hard machine reset. If durability 
is important to you and you don't have replication, it's important to use the 
fsync option here. Unless you have a great, long-lasting battery backup and/or an 
env such that hard resets don't concern you for some reason. It comes down to 
your requirements.

Responses to Sandro inline below:

On Aug 27, 2013, at 11:43 AM, Sandro Zbinden  wrote:

> Hey Mark
> 
> Thank you very much for the quick answer. We have a single node environment.
> 
> I tried to find the fsync option but was not successful. Ended up in the 
> UpdateLog class :-)
> 
> How do I enable fsync in the solrconfig.xml?

In the updateLog config, it's a syncLevel=fsync param.

> 
> 
> Besides that:
> 
> If Solr's soft commit feature has an on-disk guarantee via the transaction log,
> why don't we use soft commit as the default commit option?

Yes, for visibility you should use soft commit. You should also have an auto 
hard commit with openSearcher=false - it's just about flushing the transaction 
log and freeing memory in this configuration - which is why it makes sense to 
simply turn on the auto commit for regular hard commits. You may or may not 
want to use auto soft commits.

- Mark

> 
> 
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Tuesday, 27 August 2013 17:12
> To: solr-user@lucene.apache.org
> Subject: Re: Transaction log "on-disk" guarantees
> 
> 
> On Aug 27, 2013, at 11:08 AM, Sandro Zbinden  wrote:
> 
>> Can we activate the transaction log to have on disk guarantees and then use 
>> the solr soft commit feature?
> 
> Yes you can. If you only have a single node (no replication), you probably 
> want to turn on fsync via the config.
> 
> - Mark
> 



Re: Transaction log "on-disk" guarantees

2013-08-27 Thread Erick Erickson
I updated the SolrCloud page here: http://wiki.apache.org/solr/SolrCloud to
include how to change this option.

Bh, "Soft commits flush to the op system" should have read "Hard
commits flush the transaction log
to the op system". Blame it on just getting back from the dentist, that was
just completely wrong.

Best,
Erick


On Tue, Aug 27, 2013 at 12:08 PM, Mark Miller  wrote:

> On Aug 27, 2013, at 11:54 AM, Erick Erickson wrote:
>
> > Soft commits flush to the op system, so
> > a JVM crash/termination shouldn't affect it anyway.
>
> A soft commit is not a hard commit, so there are no guarantees like this.
> It searches committed and non committed segments - non committed segments
> will not magically be committed after a JVM crash.
>
> > Turning on the fsync
> > would just be a little bit of extra protection….
>
> If you don't have replication, it turns on strong 'durability' promises.
> Without it, you are on your own if you have a hard machine reset. If
> durability is important to you and you don't have replication, it's
> important to use the fsync option here. Unless you have a great, long-lasting
> battery backup and/or an env such that hard resets don't concern you for
> some reason. It comes down to your requirements.
>
> Responses to Sandro inline below:
>
> On Aug 27, 2013, at 11:43 AM, Sandro Zbinden  wrote:
>
> > Hey Mark
> >
> > Thank you very much for the quick answer. We have a single node
> environment.
> >
> > I tried to find the fsync option but was not successful. Ended up in the
> UpdateLog class :-)
> >
> > How do I enable fsync in the solrconfig.xml?
>
> In the updateLog config, it's a syncLevel=fsync param.
>
> >
> >
> > Besides that:
> >
> > If the Solr soft commit feature has an on-disk guarantee with a transaction
> > log, why don't we use Solr soft commit as the default commit option?
>
> Yes, for visibility you should use soft commit. You should also have an
> auto hard commit with openSearcher=false - it's just about flushing the
> transaction log and freeing memory in this configuration - which is why it
> makes sense to simply turn on the auto commit for regular hard commits. You
> may or may not want to use auto soft commits.
>
> - Mark
>
> >
> >
> > -Original Message-
> > From: Mark Miller [mailto:markrmil...@gmail.com]
> > Sent: Tuesday, 27 August 2013 17:12
> > To: solr-user@lucene.apache.org
> > Subject: Re: Transaction log "on-disk" guarantees
> >
> >
> > On Aug 27, 2013, at 11:08 AM, Sandro Zbinden  wrote:
> >
> >> Can we activate the transaction log to have on disk guarantees and then
> use the solr soft commit feature ?
> >
> > Yes you can. If you only have a single node (no replication), you
> probably want to turn on fsync via the config.
> >
> > - Mark
> >
>
>


Re: Transaction log "on-disk" guarantees

2013-08-27 Thread Mark Miller
On Aug 27, 2013, at 11:54 AM, Erick Erickson  wrote:

> Soft commits flush to the op system, so
> a JVM crash/termination shouldn't affect it anyway. 

A soft commit is not a hard commit, so there are no guarantees like this. It 
searches committed and non-committed segments - non-committed segments will not 
magically be committed after a JVM crash.

> Turning on the fsync
> would just be a little bit of extra protection….

If you don't have replication, it turns on strong 'durability' promises. 
Without it, you are on your own if you have a hard machine reset. If durability 
is important to you and you don't have replication, it's important to use the 
fsync option here. Unless you have a great, long-time battery backup and/or an 
env such that hard resets don't concern you for some reason. It comes down to 
your requirements.

Responses to Sandro inline below:

On Aug 27, 2013, at 11:43 AM, Sandro Zbinden  wrote:

> Hey Mark 
> 
> Thank you very much for the quick answer. We have a single node environment.
> 
> I tried to find the fsync option but was not successful. I ended up in the 
> UpdateLog class :-)
> 
> How do I enable fsync in the solrconfig.xml ?

In the updateLog config, it's a syncLevel=fsync param.

> 
> 
> Besides that:
> 
> If the Solr soft commit feature has an on-disk guarantee with a transaction log, 
> why don't we use Solr soft commit as the default commit option?

Yes, for visibility you should use soft commit. You should also have an auto 
hard commit with openSearcher=false - it's just about flushing the transaction 
log and freeing memory in this configuration - which is why it makes sense to 
simply turn on the auto commit for regular hard commits. You may or may not 
want to use auto soft commits.

- Mark

> 
> 
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Tuesday, 27 August 2013 17:12
> To: solr-user@lucene.apache.org
> Subject: Re: Transaction log "on-disk" guarantees
> 
> 
> On Aug 27, 2013, at 11:08 AM, Sandro Zbinden  wrote:
> 
>> Can we activate the transaction log to have on disk guarantees and then use 
>> the solr soft commit feature ?
> 
> Yes you can. If you only have a single node (no replication), you probably 
> want to turn on fsync via the config.
> 
> - Mark
> 



Re: Transaction log "on-disk" guarantees

2013-08-27 Thread Erick Erickson
Here's a blog I wrote up a bit ago:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Hmmm, unfortunately it doesn't say anything about how to set the fsync
option, but do you really care? Soft commits flush to the op system, so
a JVM crash/termination shouldn't affect it anyway. Turning on the fsync
would just be a little bit of extra protection




On Tue, Aug 27, 2013 at 11:43 AM, Sandro Zbinden  wrote:

> Hey Mark
>
> Thank you very much for the quick answer. We have a single node
> environment.
>
> I tried to find the fsync option but was not successful. I ended up in the
> UpdateLog class :-)
>
> How do I enable fsync in the solrconfig.xml ?
>
>
> Besides that:
>
> If the Solr soft commit feature has an on-disk guarantee with a transaction log,
> why don't we use Solr soft commit as the default commit option?
>
>
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Tuesday, 27 August 2013 17:12
> To: solr-user@lucene.apache.org
> Subject: Re: Transaction log "on-disk" guarantees
>
>
> On Aug 27, 2013, at 11:08 AM, Sandro Zbinden  wrote:
>
> > Can we activate the transaction log to have on disk guarantees and then
> use the solr soft commit feature ?
>
> Yes you can. If you only have a single node (no replication), you probably
> want to turn on fsync via the config.
>
> - Mark
>
>


AW: Transaction log "on-disk" guarantees

2013-08-27 Thread Sandro Zbinden
Hey Mark 

Thank you very much for the quick answer. We have a single node environment.

I tried to find the fsync option but was not successful. I ended up in the 
UpdateLog class :-)

How do I enable fsync in the solrconfig.xml ?


Besides that:

If the Solr soft commit feature has an on-disk guarantee with a transaction log, why 
don't we use Solr soft commit as the default commit option?


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, 27 August 2013 17:12
To: solr-user@lucene.apache.org
Subject: Re: Transaction log "on-disk" guarantees


On Aug 27, 2013, at 11:08 AM, Sandro Zbinden  wrote:

> Can we activate the transaction log to have on disk guarantees and then use 
> the solr soft commit feature ?

Yes you can. If you only have a single node (no replication), you probably want 
to turn on fsync via the config.

- Mark



Re: Multiple replicas for specific shard

2013-08-27 Thread Keith Duntz
I think you could do this by specifying the shard id of each core in
solr.xml.  Something like...
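
(A minimal sketch in the legacy solr.xml format - the core names, instance 
dirs, and collection name below are illustrative:)

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- two cores pinned to shard1, one to shard2 -->
    <core name="shard1_replica1" instanceDir="shard1_replica1" collection="collection1" shard="shard1"/>
    <core name="shard1_replica2" instanceDir="shard1_replica2" collection="collection1" shard="shard1"/>
    <core name="shard2_replica1" instanceDir="shard2_replica1" collection="collection1" shard="shard2"/>
  </cores>
</solr>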

Cheers,
Keith


On Tue, Aug 27, 2013 at 10:01 AM, maephisto  wrote:

> Hi!
>
> Imagine the following configuration: a SolrCloud cluster, with 3 shards, a
> replication factor of 2 and 6 nodes.
> Now, if I add one more node to the cluster, ZK will automatically assign
> a shard replica to it.
>
> My question is: can I influence which of the shards is replicated on the
> new node? Can I have 5 replicas for one shard and just 1 for the others?
>
> Thank you!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multiple-replicas-for-specific-shard-tp4086828.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Multiple replicas for specific shard

2013-08-27 Thread maephisto
Hi!

Imagine the following configuration: a SolrCloud cluster, with 3 shards, a
replication factor of 2 and 6 nodes. 
Now, if I add one more node to the cluster, ZK will automatically assign a
shard replica to it.

My question is: can I influence which of the shards is replicated on the
new node? Can I have 5 replicas for one shard and just 1 for the others?

Thank you!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-replicas-for-specific-shard-tp4086828.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Transaction log "on-disk" guarantees

2013-08-27 Thread Mark Miller

On Aug 27, 2013, at 11:08 AM, Sandro Zbinden  wrote:

> Can we activate the transaction log to have on disk guarantees and then use 
> the solr soft commit feature ?

Yes you can. If you only have a single node (no replication), you probably want 
to turn on fsync via the config.

- Mark



Transaction log "on-disk" guarantees

2013-08-27 Thread Sandro Zbinden
Dear solr users

We are using the Solr soft commit feature and we are worried about what happens 
after we restart the Solr server.

Can we activate the transaction log to have on-disk guarantees and then use the 
Solr soft commit feature?

Thanks and Best regards

Sandro Zbinden


Sandro Zbinden
Software Engineer




Re: Solr cloud hash range set to null after recovery from index corruption

2013-08-27 Thread Rikke Willer

Hi again,

a follow-up on this: I ended up fixing it by uploading a new version of 
clusterstate.json to Zookeeper with the missing hash ranges set (they were 
easily deducible, since they were sorted by shard name).
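
(For reference, the shard entries in clusterstate.json look roughly like this - 
the range values below are for a simple two-shard layout, purely as 
illustration; a locally edited file can be pushed back with, e.g., ZkCLI's 
putfile command:)

{"collection1": {
    "shards": {
      "shard1": {
        "range": "80000000-ffffffff",
        "state": "active",
        "replicas": { ... }},
      "shard2": {
        "range": "0-7fffffff",
        "state": "active",
        "replicas": { ... }}}}}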
I still don't know what the correct way to handle index corruption (where all 
replicas of a shard need to be repaired) while still keeping the cloud 
available for search would be?

Thanks,

Rikke

On Aug 22, 2013, at 21:27, Rikke Willer <r...@dtic.dtu.dk> wrote:


Hi,

I have a Solr cloud set up with 12 shards with 2 replicas each, divided across 6 
servers (each server hosting 4 cores). Solr version is 4.3.1.
Due to memory errors on one machine, 3 of its 4 indexes became corrupted. I 
unloaded the cores, repaired the indexes with the Lucene CheckIndex tool, and 
added the cores again.
Afterwards the Solr cloud hash range has been set to null for the shards with 
corrupt indexes.
Could anybody point me to why this has occurred, and more importantly, how to 
set the range on the shards again?
Thank you.

Best,

Rikke



Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-27 Thread Bill Bell
Index and query

<analyzer type="index"> and <analyzer type="query">
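
(I.e., define both analyzer sections on the field type - a sketch of a 
case-insensitive string type; the type name here is illustrative:)

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

KeywordTokenizerFactory keeps the whole value (spaces, '-' and \ included) as a 
single token, and LowerCaseFilterFactory on both sides makes matching 
case-insensitive.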

Bill Bell
Sent from mobile


On Aug 26, 2013, at 5:42 AM, skorrapa  wrote:

> I have also re-indexed the data and tried. I also tried with the below:
> <fieldType ... sortMissingLast="true" omitNorms="true">
>   ...
> </fieldType>
>
> This didn't work as well...
> 
> 
> 
> On Mon, Aug 26, 2013 at 4:03 PM, skorrapa [via Lucene] <
> ml-node+s472066n4086601...@n3.nabble.com> wrote:
> 
>> Hello All,
>> 
>> I am still facing the same issue. Case-insensitive search is not working on
>> Solr 4.3.
>> I am using the below configuration in schema.xml:
>> <fieldType ... sortMissingLast="true" omitNorms="true">
>>   ...
>> </fieldType>
>>
>> Basically I want my string which could have spaces or characters like '-'
>> or \ to be searched upon case insensitively.
>> Please help.
>> 
>> 
>> --
>> If you reply to this email, your message will be added to the discussion
>> below:
>> 
>> http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086601.html
>> To unsubscribe from Solr 4.2.1 update to 4.3/4.4 problem, click 
>> here
>> .
>> NAML
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086606.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Concat 2 fields in another field

2013-08-27 Thread Bill Bell
If it's just for search, copyField into a multivalued field.

Or do it on indexing using DIH or code. A rhino script works too.
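
(The copyField route is just a couple of schema.xml lines - the field and type 
names here are illustrative:)

<field name="fullName" type="string" indexed="true" stored="true" multiValued="true"/>
<copyField source="firstName" dest="fullName"/>
<copyField source="lastName" dest="fullName"/>

Note this gives you two separate values in fullName rather than one 
concatenated "firstName_lastName" value - fine for search, but not for the 
grouping use case.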

Bill Bell
Sent from mobile


On Aug 27, 2013, at 7:15 AM, "Jack Krupansky"  wrote:

> I have additional examples in the two most recent early access releases of my 
> book - variations on using the existing update processors.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Federico Chiacchiaretta
> Sent: Tuesday, August 27, 2013 8:39 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Concat 2 fields in another field
> 
> Hi,
> we do the same thing using an update request processor chain, this is the
> snippet from solrconfig.xml
> 
> 
> <updateRequestProcessorChain>
>   <processor class="solr.CloneFieldUpdateProcessorFactory">
>     <str name="source">firstname</str>
>     <str name="dest">concatfield</str>
>   </processor>
>   <processor class="solr.CloneFieldUpdateProcessorFactory">
>     <str name="source">lastname</str>
>     <str name="dest">concatfield</str>
>   </processor>
>   <processor class="solr.ConcatFieldUpdateProcessorFactory">
>     <str name="fieldName">concatfield</str>
>     <str name="delimiter">_</str>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> 
> 
> 
> Regards,
> Federico Chiacchiaretta
> 
> 
> 
> 2013/8/27 Markus Jelsma 
> 
>> You may be more interested in the ConcatFieldUpdateProcessorFactory:
>> 
>> http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html
>> 
>> 
>> 
>> -Original message-
>> > From:Alok Bhandari 
>> > Sent: Tuesday 27th August 2013 14:05
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: Concat 2 fields in another field
>> >
>> > Thanks for reply.
>> >
>> > But I don't want to introduce any scripting in my code so want to know > is
>> > there any Java component available for the same.
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-tp4086786p4086791.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
> 


Re: Magento solr Invalid Date String:'false'

2013-08-27 Thread Jack Krupansky

"Invalid Date String:'false'"

That's correct, "false" is not a valid date in Solr. Solr uses the ISO format 
YYYY-MM-DDThh:mm:ss[.sss]Z - for example, 2013-08-12T08:06:15Z.


You obviously have some issue with whatever software is feeding data into 
Solr. Nothing we can do to help you there, other than to tell you to make 
sure you feed clean data into Solr.


I suspect this field is a dynamic field (a <dynamicField> element with the 
pattern "*_datetime"). Nothing wrong with that - just make sure you only 
populate the field with valid date data.
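
(I.e., a declaration along these lines in schema.xml - the type name here is 
illustrative:)

<dynamicField name="*_datetime" type="date" indexed="true" stored="true"/>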


-- Jack Krupansky

-Original Message- 
From: Nikesh12

Sent: Tuesday, August 27, 2013 1:38 AM
To: solr-user@lucene.apache.org
Subject: Magento solr Invalid Date String:'false'

We are getting the below message during Solr indexing, run by a cron setting in
Magento.


Aug 12, 2013 8:06:15 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: {add=[24P1602]} 0 1
Aug 12, 2013 8:06:16 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=24P1602] Error
adding field 'lepubdate_datetime'='false'
   at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
   at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
   at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
   at
org.apache.solr.handler.JsonLoader.processUpdate(JsonLoader.java:100)
   at org.apache.solr.handler.JsonLoader.load(JsonLoader.java:75)
   at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
   at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
   at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
   at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
   at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
   at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
   at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
   at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
   at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
   at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Invalid Date String:'false'
   at org.apache.solr.schema.DateField.parseMath(DateField.java:161)
   at org.apache.solr.schema.TrieField.createField(TrieField.java:419)
   at
org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:120)
   at
org.apache.solr.schema.SchemaField.createField(SchemaField.java:103)
   at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
   at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:286)
   ... 22 more

Aug 12, 2013 8:06:16 AM org.apache.solr.core.SolrCore execute
=

Best to post to the solr-user list rather than general, but looks like
you've got a type mismatch:

   'lepubdate_datetime'='false'

What type is lepubdate_datetime?   I'm guessing it's a "date" type and
shouldn't be getting the value 'false' :)

   Erik

===

Hi Erik,

Can you please let me know where I should look to correct the issue. In the
database I have found that there is a "lepubdate" field in the "eav_attribute"
table with "backend_type" as "datetime". But there is no field such as
'lepubdate_datetime' in the database, yet Solr reports the
'lepubdate_datetime'='false' error in its log.


Thanks
Nikesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Magento-solr-Invalid-Date-String-false-tp4086747.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Concat 2 fields in another field

2013-08-27 Thread Jack Krupansky
I have additional examples in the two most recent early access releases of 
my book - variations on using the existing update processors.


-- Jack Krupansky

-Original Message- 
From: Federico Chiacchiaretta

Sent: Tuesday, August 27, 2013 8:39 AM
To: solr-user@lucene.apache.org
Subject: Re: Concat 2 fields in another field

Hi,
we do the same thing using an update request processor chain, this is the
snippet from solrconfig.xml


 
<updateRequestProcessorChain>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">firstname</str>
    <str name="dest">concatfield</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">lastname</str>
    <str name="dest">concatfield</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">concatfield</str>
    <str name="delimiter">_</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>



Regards,
Federico Chiacchiaretta



2013/8/27 Markus Jelsma 


You may be more interested in the ConcatFieldUpdateProcessorFactory:

http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html



-Original message-
> From:Alok Bhandari 
> Sent: Tuesday 27th August 2013 14:05
> To: solr-user@lucene.apache.org
> Subject: Re: Concat 2 fields in another field
>
> Thanks for reply.
>
> But I don't want to introduce any scripting in my code so want to know 
> is

> there any Java component available for the same.
>
>
>
> --
> View this message in context:
http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-tp4086786p4086791.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>





Re: Concat 2 fields in another field

2013-08-27 Thread Federico Chiacchiaretta
Hi,
we do the same thing using an update request processor chain, this is the
snippet from solrconfig.xml


<updateRequestProcessorChain>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">firstname</str>
    <str name="dest">concatfield</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">lastname</str>
    <str name="dest">concatfield</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">concatfield</str>
    <str name="delimiter">_</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>



Regards,
Federico Chiacchiaretta



2013/8/27 Markus Jelsma 

> You may be more interested in the ConcatFieldUpdateProcessorFactory:
>
> http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html
>
>
>
> -Original message-
> > From:Alok Bhandari 
> > Sent: Tuesday 27th August 2013 14:05
> > To: solr-user@lucene.apache.org
> > Subject: Re: Concat 2 fields in another field
> >
> > Thanks for reply.
> >
> > But I don't want to introduce any scripting in my code so want to know is
> > there any Java component available for the same.
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-tp4086786p4086791.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


RE: Concat 2 fields in another field

2013-08-27 Thread Markus Jelsma
You may be more interested in the ConcatFieldUpdateProcessorFactory:
http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html

 
 
-Original message-
> From:Alok Bhandari 
> Sent: Tuesday 27th August 2013 14:05
> To: solr-user@lucene.apache.org
> Subject: Re: Concat 2 fields in another field
> 
> Thanks for reply.
> 
> But I don't want to introduce any scripting in my code so want to know is
> there any Java component available for the same.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-tp4086786p4086791.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: Concat 2 fields in another field

2013-08-27 Thread Alok Bhandari
Thanks for reply.

But I don't want to introduce any scripting in my code, so I want to know if
there is any Java component available for the same.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-tp4086786p4086791.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Concat 2 fields in another field

2013-08-27 Thread Rafał Kuć
Hello!

You don't have to write a custom component - you can use the
ScriptUpdateProcessor - http://wiki.apache.org/solr/ScriptUpdateProcessor
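
(Roughly: a chain like this in solrconfig.xml, plus a small script in the conf 
directory - the chain, script, and field names below are illustrative:)

<updateRequestProcessorChain name="script">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">concat-names.js</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

And concat-names.js:

function processAdd(cmd) {
  var doc = cmd.solrDoc;  // org.apache.solr.common.SolrInputDocument
  var first = doc.getFieldValue("firstName");
  var last = doc.getFieldValue("lastName");
  // only set the field if it is not already populated, so re-indexing the
  // same document does not keep appending values
  if (first != null && last != null && doc.getFieldValue("fullName") == null) {
    doc.setField("fullName", first + "_" + last);
  }
}
// no-op stubs for the remaining hooks
function processDelete(cmd) { }
function processMergeIndexes(cmd) { }
function processCommit(cmd) { }
function processRollback(cmd) { }
function finish() { }

Then wire the chain into your update handler with the update.chain parameter.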

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

> Hello all ,

> I am using solr 4.x , I have a requirement where I need to have a field
> which holds data from 2 fields concatenated using _. So for example I have 2
> fields firstName and lastName , I want a third field which should hold
> firstName_lastName. Is there any existing concatenating component available
> or I need to write a custom updateProcessor which does this task. By the way
> need for having this third field is that I want to group on the
> firstname,lastName but as grouping does not support multiple fields to form
> single group I am using this trick. Hope I am clear . 

> Thanks .




> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-tp4086786.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Concat 2 fields in another field

2013-08-27 Thread Alok Bhandari
Hello all,

I am using Solr 4.x, and I have a requirement where I need a field
which holds data from 2 fields concatenated using _. So for example I have 2
fields, firstName and lastName, and I want a third field which should hold
firstName_lastName. Is there any existing concatenating component available,
or do I need to write a custom updateProcessor which does this task? By the way,
the need for this third field is that I want to group on
firstName,lastName, but as grouping does not support multiple fields forming a
single group, I am using this trick. Hope I am clear.

Thanks.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-tp4086786.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud: no "timing" when no result in distributed mode

2013-08-27 Thread Elodie Sannier

Hello,

I'm using the 4.4.0 version but I still have the problem.
Should I create a JIRA issue for it?

Elodie

On 06/21/2013 02:54 PM, Elodie Sannier wrote:

Hello,

I am using SolrCloud 4.2.1 with two shards. With the "debugQuery=true" parameter, when a
query does not return documents, the "timing" debug information is not returned:
curl -sS "http://localhost:8983/solr/select?q=dummy&debugQuery=true" | grep -o '.*'

If I use the "distrib=false" parameter, the "timing" debug information is
returned:
curl -sS "http://localhost:8983/solr/select?q=dummy&debugQuery=true&distrib=false" | grep -o '.*'
[timing values returned; XML tags stripped by the archive]

Is it a bug in the distributed mode?


Kelkoo SAS
Société par Actions Simplifiée
Share capital: EUR 4,168,964.30
Registered office: 8, rue du Sentier, 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively for 
their addressees. If you are not the intended recipient of this message, please 
delete it and notify the sender.


Can we use CloudSolrServer for searching data

2013-08-27 Thread Dharmendra Jaiswal
Hello,

I am using the multi-core mechanism with Solr 4.4.0, and each core is dedicated
to a particular client (each core is a collection).

For example, if we search data from SiteA, it will provide search results from CoreA,
if we search data from SiteB, it will provide search results from CoreB,
and similarly for the other clients.

Right now I am using HttpSolrServer (SolrJ API) for connecting with Solr for
search.
As per my understanding it will try to connect directly to a particular Solr
instance for searching, and if that node is down, searching will fail.
Please let me know if my assumption is wrong.

My query is: is it possible to connect with Solr using CloudSolrServer
instead of HttpSolrServer for searching, so that in case one node is
down the CloudSolrServer will pick data from another instance of Solr?

Any pointer or link will be helpful. It would be better if someone could share
an example of connecting using CloudSolrServer.
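
(A minimal SolrJ sketch of that, assuming an external ZooKeeper ensemble - the
host names and collection name below are illustrative:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudSearchExample {
    public static void main(String[] args) throws Exception {
        // connect via ZooKeeper rather than a single Solr node; ZK tells the
        // client which replicas are live, so requests route around down nodes
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("coreA");  // the collection to search

        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println("hits: " + rsp.getResults().getNumFound());

        server.shutdown();
    }
}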

Note: I am using a Windows machine for deployment of Solr, and we are indexing
data from the database using DIH.

Thanks,
Dharmendra jaiswal 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-we-used-CloudSolrServer-for-searching-data-tp4086766.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ERROR org.apache.solr.update.CommitTracker – auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher

2013-08-27 Thread zhaoxin
Thanks, Shawn! It's OK now.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ERROR-org-apache-solr-update-CommitTracker-auto-commit-error-org-apache-solr-common-SolrException-Err-tp4086576p4086763.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: More on topic of Meta-search/Federated Search with Solr

2013-08-27 Thread Bernd Fehling
Years ago, when "Federated Search" was a buzzword, we did some development and
testing with Lucene, FAST Search, Google, and several other search engines
regarding Federated Search in a library context.
The results can be found here:
http://pub.uni-bielefeld.de/download/2516631/2516644
Some minor parts are in German; most is written in English.
It also gives you an idea of where to keep an eye out, where the pitfalls are,
and so on.
We also had a tool called "unity" (written in Python) which did Federated
Search on any search engine and
database, like Google, Gigablast, FAST, Lucene, ...
The trick with Federated Search is combining the results.
We offered three options on the user's search surface:
- RoundRobin
- Relevancy
- PseudoRandom





--
View this message in context: 
http://lucene.472066.n3.nabble.com/More-on-topic-of-Meta-search-Federated-Search-with-Solr-tp4085167p4086759.html
Sent from the Solr - User mailing list archive at Nabble.com.