Re: How to assign a hash range via zookeeper?

2016-02-04 Thread Aki Balogh
I'm not sure how these hash ranges were determined, so I'm not sure if I
should be manually setting them or somehow allowing solr to pick them for
this shard.

Thanks,
Aki



On Thu, Feb 4, 2016 at 4:12 PM, Aki Balogh  wrote:

> Shawn,
>
> Thanks - this is very helpful.
>
> I found the state.json file and it indeed shows that the range for shard1
> is null.
>
> In order to fix, do I need to upload a corrected state.json file with
> corrected hash ranges?   How can I do that? (zkcli.sh?)
>
> Thanks,
> Aki
>
>
> On Thu, Feb 4, 2016 at 4:10 PM, Shawn Heisey  wrote:
>
>> On 2/4/2016 1:37 PM, Aki Balogh wrote:
>> > Specifically, they suggest getting clusterstate.json.  But I've tried
>> that
>> > and when I get that file, I only get an empty file {}
>> >
>> > Is there another way to ask Zookeeper to cover the missing hash range?
>>
>> Solr 5.x changed how the clusterstate is managed.  The
>> /clusterstate.json file is empty, just as you have noticed.  You'll find
>> the actual clusterstate inside each collection path, in a file named
>> "state.json".
>>
>> Here's what it will look like in the Cloud->Tree section of the admin UI:
>>
>> https://www.dropbox.com/s/1964mnkuds1uh3d/solr5-state.json.png?dl=0
>> 
>>
>> If any of those guides you found are hosted on an official apache.org
>> 
>> website (and do not explicitly mention the 4.x version they apply to),
>> then that's our job to update, but third-party information will be the
>> responsibility of the person who posted it.
>>
>> Thanks,
>> Shawn
>>
>>
>


Re: How to assign a hash range via zookeeper?

2016-02-04 Thread Aki Balogh
Shawn,

Thanks - this is very helpful.

I found the state.json file and it indeed shows that the range for shard1
is null.

In order to fix, do I need to upload a corrected state.json file with
corrected hash ranges?   How can I do that? (zkcli.sh?)

Thanks,
Aki


On Thu, Feb 4, 2016 at 4:10 PM, Shawn Heisey  wrote:

> On 2/4/2016 1:37 PM, Aki Balogh wrote:
> > Specifically, they suggest getting clusterstate.json.  But I've tried
> that
> > and when I get that file, I only get an empty file {}
> >
> > Is there another way to ask Zookeeper to cover the missing hash range?
>
> Solr 5.x changed how the clusterstate is managed.  The
> /clusterstate.json file is empty, just as you have noticed.  You'll find
> the actual clusterstate inside each collection path, in a file named
> "state.json".
>
> Here's what it will look like in the Cloud->Tree section of the admin UI:
>
> https://www.dropbox.com/s/1964mnkuds1uh3d/solr5-state.json.png?dl=0
> 
>
> If any of those guides you found are hosted on an official apache.org
> 
> website (and do not explicitly mention the 4.x version they apply to),
> then that's our job to update, but third-party information will be the
> responsibility of the person who posted it.
>
> Thanks,
> Shawn
>
>


Change in EXPLAIN info since Solr 5

2016-02-04 Thread Burgmans, Tom
Hi group, 

While exploring Solr 5.4.0, I noticed a subtle difference in the EXPLAIN debug 
information, compared to the version we currently use (4.10.1).

Solr 4.10.1:

2.0739748 = (MATCH) max plus 1.0 times others of:
  2.0739748 = (MATCH) weight(text:test in 30) [DefaultSimilarity], result of:
2.0739748 = score(doc=30,freq=3.0), product of:
  0.3556181 = queryWeight, product of:
3.3671236 = idf(docFreq=17, maxDocs=192)
0.105614804 = queryNorm
  5.832029 = fieldWeight in 30, product of:
1.7320508 = tf(freq=3.0), with freq of:
  3.0 = termFreq=3.0
3.3671236 = idf(docFreq=17, maxDocs=192)
1.0 = fieldNorm(doc=30)

Solr 5.4.0:

2.0739748 = max plus 1.0 times others of:
  2.0739748 = weight(text:test in 30) [ClassicSimilarity], result of:
2.0739748 = score(doc=30,freq=3.0), product of:
  0.3556181 = queryWeight, product of:
3.3671236 = idf(docFreq=17, maxDocs=192)
0.105614804 = queryNorm
  5.832029 = fieldWeight in 30, product of:
1.7320508 = tf(freq=3.0), with freq of:
  3.0 = termFreq=3.0
3.3671236 = idf(docFreq=17, maxDocs=192)
1.0 = fieldNorm(doc=30)

The difference is the removal of (MATCH) in some of the EXPLAIN lines. That is 
causing issues for us since we have developed an EXPLAIN parser that leans on 
the presence of (MATCH) in the EXPLAIN.
Does anyone have a suggestion on how to insert (MATCH) back into the explain
info (i.e., which file should we patch)?

Thanks, Tom


Re: How to assign a hash range via zookeeper?

2016-02-04 Thread Shawn Heisey
On 2/4/2016 1:37 PM, Aki Balogh wrote:
> Specifically, they suggest getting clusterstate.json.  But I've tried that
> and when I get that file, I only get an empty file {}
>
> Is there another way to ask Zookeeper to cover the missing hash range?

Solr 5.x changed how the clusterstate is managed.  The
/clusterstate.json file is empty, just as you have noticed.  You'll find
the actual clusterstate inside each collection path, in a file named
"state.json".

Here's what it will look like in the Cloud->Tree section of the admin UI:

https://www.dropbox.com/s/1964mnkuds1uh3d/solr5-state.json.png?dl=0

If any of those guides you found are hosted on an official apache.org
website (and do not explicitly mention the 4.x version they apply to),
then that's our job to update, but third-party information will be the
responsibility of the person who posted it.

Thanks,
Shawn



Re: Errors During Load Test

2016-02-04 Thread Toke Eskildsen
Tiwari, Shailendra  wrote:
> We are on Solr 4.10.3. Got 2 load balanced RedHat with 16 GB
> memory on each. Memory assigned to JVM 4 GB, 2 Shards, 
> total docs 60 K, and 4 replicas.

As you are chasing throughput, you should aim to lower the overall resources 
needed for a single request, potentially at the cost of latency. Unless you 
have really large documents, very special queries or something else making this 
an outlier, you will be much better off with just 1 shard and 2 replicas. 
Having more than 1 shard introduces an overhead for each request, and for such a 
small setup it is relatively large. 

- Toke Eskildsen


Re: How to assign a hash range via zookeeper?

2016-02-04 Thread Erick Erickson
Hash ranges should have been assigned automatically when you
created the collection unless you created the collection with the implicit
router. What was the command you used to create the collection?
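
(For reference: with the default compositeId router, a create call like the one
below assigns contiguous hash ranges automatically; the name and counts are
hypothetical. With router.name=implicit, the ranges stay null.)

http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=2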

Best,
Erick

On Thu, Feb 4, 2016 at 1:21 PM, Aki Balogh  wrote:
> I'm not sure how these hash ranges were determined, so I'm not sure if I
> should be manually setting them or somehow allowing solr to pick them for
> this shard.
>
> Thanks,
> Aki
>
>
>
> On Thu, Feb 4, 2016 at 4:12 PM, Aki Balogh  wrote:
>
>> Shawn,
>>
>> Thanks - this is very helpful.
>>
>> I found the state.json file and it indeed shows that the range for shard1
>> is null.
>>
>> In order to fix, do I need to upload a corrected state.json file with
>> corrected hash ranges?   How can I do that? (zkcli.sh?)
>>
>> Thanks,
>> Aki
>>
>>
>> On Thu, Feb 4, 2016 at 4:10 PM, Shawn Heisey  wrote:
>>
>>> On 2/4/2016 1:37 PM, Aki Balogh wrote:
>>> > Specifically, they suggest getting clusterstate.json.  But I've tried
>>> that
>>> > and when I get that file, I only get an empty file {}
>>> >
>>> > Is there another way to ask Zookeeper to cover the missing hash range?
>>>
>>> Solr 5.x changed how the clusterstate is managed.  The
>>> /clusterstate.json file is empty, just as you have noticed.  You'll find
>>> the actual clusterstate inside each collection path, in a file named
>>> "state.json".
>>>
>>> Here's what it will look like in the Cloud->Tree section of the admin UI:
>>>
>>> https://www.dropbox.com/s/1964mnkuds1uh3d/solr5-state.json.png?dl=0
>>> 
>>>
>>> If any of those guides you found are hosted on an official apache.org
>>> 
>>> website (and do not explicitly mention the 4.x version they apply to),
>>> then that's our job to update, but third-party information will be the
>>> responsibility of the person who posted it.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>


AW: Hard commits, soft commits and transaction logs

2016-02-04 Thread Clemens Wyss DEV
Thanks Erick.
I guess I'll go the 3>-way, i.e. optimize the index "whenever appropriate". 
Could I alternatively ("whenever appropriate") issue a 
'/suggest?spellcheck.build=true'-request?

> bq: Suggestions are re-built on commit
Agree. That was for unitTesting purposes only.
In production we have  true

-Ursprüngliche Nachricht-
Von: Erick Erickson [mailto:erickerick...@gmail.com] 
Gesendet: Donnerstag, 4. Februar 2016 19:39
An: solr-user
Betreff: Re: Hard commits, soft commits and transaction logs

bq: and suggestions of deleted docs are...

OK, this is something different than I read the first time. I'm assuming that 
when you mention suggestions, you're using one of the suggesters that works off 
the indexed terms, which will include data from deleted docs. There's really 
not a good mechanism other than getting all the data associated with deleted 
documents out of there that I know of in that scenario. What people have done:

1> Just lived with it. On a reasonably large corpus, the number of suggestions
that aren't actually in a live document is often very small, small enough to
ignore. In this case you might be seeing something because of your tests that
makes no practical difference.

I'll add parenthetically that users will get empty results even if all the 
terms suggested are in "live" docs assuming they, say, add filter queries. 
Imagine a filter query restricting the returns to docs dated yesterday and 
suggestions come back from docs dated 5 days ago.

2> Curate the suggestions. In this scenario there's a fixed list of terms in a
text file that you suggest from (a config sketch follows below).

3> Optimize the index. This is usually only really acceptable for setups where
the index changes infrequently (e.g. nightly or something), which doesn't sound
like it fits your scenario at all.
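
For option 2>, a curated file-based suggester looks roughly like this in
solrconfig.xml -- a minimal sketch, assuming a hypothetical dictionary file
named suggestions.txt in the core's conf directory:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">curated</str>
    <!-- reads terms (optionally weighted) from a flat file, so deleted
         documents can never leak into the suggestions -->
    <str name="dictionaryImpl">FileDictionaryFactory</str>
    <str name="sourceLocation">suggestions.txt</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
  </lst>
</searchComponent>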

bq: Suggestions are re-built on commit

I'm going to go out on a limb and say that this is likely to not work at all 
for production in a NRT setup. This will take far too much time on a 
significantly-sized corpus to be feasible. At least that's my fear, I'm mostly 
advising you to check this before even trying to scale up.

Best,
Erick

On Wed, Feb 3, 2016 at 11:07 PM, Clemens Wyss DEV  wrote:
> Sorry for coming back to this topic:
> You (Erick) mention "By and large, do not issue commits from any client 
> indexing to Solr"
>
> In order to achieve NRT, I for example test
> <autoCommit>
>   <maxTime>18</maxTime>
>   <openSearcher>true</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>   <maxTime>1</maxTime>
> </autoSoftCommit>
>
> For (unit)testing purposes
> <autoCommit>
>   <maxTime>1000</maxTime>
>   <openSearcher>true</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>   <maxTime>500</maxTime>
> </autoSoftCommit>
>
> Suggestions are re-built on commit
> ...
> true
> ...
>
> (Almost) all my unit tests pass. Except for my docDeletion-test: it looks 
> like expungeDeletes is never "issued" and suggestions of deleted docs are 
> returned.
> When I explicitly issue an "expunging-soft-commit"
>
> UpdateRequest rq = new UpdateRequest();
> rq.setAction( UpdateRequest.ACTION.COMMIT, false, false, 100, true, true );
> rq.process( solrClient );
>
> the test passes and no false suggestions are returned. What am I facing?
>
> -Ursprüngliche Nachricht-
> Von: Erick Erickson [mailto:erickerick...@gmail.com]
> Gesendet: Montag, 4. Januar 2016 17:36
> An: solr-user
> Betreff: Re: Hard commits, soft commits and transaction logs
>
> As far as I know. If you see anything different, let me know and we'll see if 
> we can update it.
>
> Best,
> Erick
>
> On Mon, Jan 4, 2016 at 1:34 AM, Clemens Wyss DEV  wrote:
>> [Happy New Year to all]
>>
>> Is all herein
>> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>> mentioned/recommended still valid for Solr 5.x?
>>
>> - Clemens


Re: ​Securing fields and documents with Shield | Elastic

2016-02-04 Thread Alexandre Rafalovitch
I have not used Shield yet, so this is based just on the document you sent.

I would use different Request Handler endpoints for different users
and put the restrictions there, in the invariants section.

For field restrictions, I would use 'uf' parameter. As for example
here (from my old book):
https://github.com/arafalov/solr-indexing-book/blob/master/published/languages/conf/solrconfig.xml#L24

For document restrictions, it just seems like an extra 'fq' query to
filter out the documents. Or a post-filter.

The only question is how to route to the relevant endpoint and that
can be done in the middle-ware or possibly by one of the plugin
components for Solr.  Or use ManifoldCF, as per the wiki page on the
topic: https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security
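
To make the first two suggestions concrete, here is a minimal solrconfig.xml
sketch; the handler, field and ACL names are hypothetical. The invariants
section pins the restrictions so a client cannot override them:

<requestHandler name="/select-restricted" class="solr.SearchHandler">
  <lst name="invariants">
    <!-- only documents this (hypothetical) group may see -->
    <str name="fq">acl_groups_ss:groupA</str>
    <!-- limit which fields an edismax user query may reference -->
    <str name="uf">title description</str>
  </lst>
</requestHandler>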

Does this fulfill your requirements?

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 5 February 2016 at 12:08, Philip Durbin  wrote:
> Does Solr have anything like this?
>
> https://www.elastic.co/blog/securing-fields-and-documents-with-shield
>
> Or is it on the roadmap?


Re: How to assign a hash range via zookeeper?

2016-02-04 Thread Shawn Heisey
On 2/4/2016 2:12 PM, Aki Balogh wrote:
> I found the state.json file and it indeed shows that the range for shard1
> is null.
>
> In order to fix, do I need to upload a corrected state.json file with
> corrected hash ranges?   How can I do that? (zkcli.sh?)

The easiest way to figure out the correct hash range, if you do not know
it and cannot figure it out by looking at the other hash ranges, is to
create a new collection with the same number of shards as the broken
collection, then look at the clusterstate for that collection to see
what the hash ranges are, to determine which range is missing.  Then
once you have fixed the existing collection, delete the new collection.

You can upload the changed state.json file with zkcli.sh, I think the
command is "putfile", and you will need the full path within zookeeper. 
Alternately, you can get a GUI zookeeper client for IDEs like Eclipse
and IntelliJ IDEA for a more interactive experience.  Reloading the
collection after replacing the state.json file is probably required.
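
For reference, the zkcli round trip would look roughly like this -- a sketch
assuming ZooKeeper at localhost:2181 and a collection named "collection1"
(hypothetical names; verify the znode path in your own tree):

# pull the current state, then edit the "range" value for the broken shard
zkcli.sh -zkhost localhost:2181 -cmd getfile /collections/collection1/state.json state.json
# push the corrected file back, then reload the collection
zkcli.sh -zkhost localhost:2181 -cmd putfile /collections/collection1/state.json state.json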

Thanks,
Shawn



hitratio vs cumulative_hitratio

2016-02-04 Thread davidphilip cherian
Solr caching: what does it mean to have lookups=0, hits=0 and hitratio=0, but
cumulative_hitratio=0.75, cumulative_lookups >100,000, cumulative_inserts >20k
and cumulative_evictions=0? The maxSize of the cache is 512 objects.


Re: Change in EXPLAIN info since Solr 5

2016-02-04 Thread Shawn Heisey
On 2/4/2016 2:54 PM, Burgmans, Tom wrote:
> While exploring Solr 5.4.0, I noticed a subtle difference in the EXPLAIN 
> debug information, compared to the version we currently use (4.10.1).

> The difference is the removal of (MATCH) in some of the EXPLAIN lines. That 
> is causing issues for us since we have developed an EXPLAIN parser that leans 
> on the presence of (MATCH) in the EXPLAIN.
> Does anyone have a suggestion on how to insert (MATCH) back into the explain
> info (i.e., which file should we patch)?
>

This was removed in Lucene 5.2, in an effort to simplify the explanation
API.  That information was redundant, apparently.  Here's the reasoning
used by the committer that removed it:

https://issues.apache.org/jira/browse/LUCENE-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507098#comment-14507098

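Rather than patching Lucene, it may be easier to make the parser tolerate both
formats. A minimal sketch of the idea (the class and the sample line are
illustrative, not from the thread's actual parser):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExplainLine {
  // "(MATCH) " is optional, so one pattern handles both 4.x and 5.x output
  private static final Pattern LINE =
      Pattern.compile("^(\\s*)(\\S+) = (?:\\(MATCH\\) )?(.*)$");

  public static void main(String[] args) {
    Matcher m = LINE.matcher("2.0739748 = (MATCH) max plus 1.0 times others of:");
    if (m.matches()) {
      System.out.println("score=" + m.group(2) + " desc=" + m.group(3));
    }
  }
}
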
Thanks,
Shawn



​Securing fields and documents with Shield | Elastic

2016-02-04 Thread Philip Durbin
Does Solr have anything like this?

https://www.elastic.co/blog/securing-fields-and-documents-with-shield

Or is it on the roadmap?


RE: Tutorial or Code Samples to explain how to Write Solr Plugins

2016-02-04 Thread Gian Maria Ricci - aka Alkampfer
I've already found these two presentations. Sadly, the link to the source code
is broken; it seems that the domain www.searchbox.com is completely down :|

--
Gian Maria Ricci
Cell: +39 320 0136949


-Original Message-
From: Binoy Dalal [mailto:binoydala...@gmail.com] 
Sent: mercoledì 3 febbraio 2016 17:46
To: solr-user@lucene.apache.org
Subject: Re: Tutorial or Code Samples to explain how to Write Solr Plugins

Here's a couple of links you can follow to get started:
https://www.slideshare.net/mobile/searchbox-com/tutorial-on-developin-a-solr-search-component-plugin
https://www.slideshare.net/mobile/searchbox-com/develop-a-solr-request-handler-plugin
These are to write a search component and a request handler respectively.
They are on older solr versions but they should work with 5.x as well.
I used these to get started when I was trying to write my first plugin.
Once you get a hang of how it's to be done it's really not that difficult.

On Wed, 3 Feb 2016, 21:59 Gian Maria Ricci - aka Alkampfer < 
alkamp...@nablasoft.com> wrote:

> Hi,
>
>
>
> I wonder if there is some code samples or tutorial (updated to work 
> with version 5) to help users writing plugins.
>
>
>
> I’ve found lots of difficulties on the past to find such kind of 
> information when I needed to write some plugins, and I wonder if I 
> missed some site or link that does a better explanation than official 
> page http://wiki.apache.org/solr/SolrPlugins that is really old.
>
>
>
> Thanks in advance.
>
>
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
>
>
--
Regards,
Binoy Dalal


Re: Solr 4.10 with Jetty 8.1.10 & Tomcat 7

2016-02-04 Thread Shawn Heisey
On 2/4/2016 7:29 AM, Shahzad Masud wrote:
> Q: Is it normal that one node supports only one shard in Jetty?
> Q: Can anyone point to an appropriate guideline on whether Jetty is better
> than Tomcat?
> Q: Has anyone else experienced a similar migration and concluded that
> Tomcat is better?

Solr 5.x is more difficult to put into Tomcat than 4.x was -- there is
no .war file in the download at all as of version 5.3.  It can still be
done, but we strongly recommend using Solr as it is shipped, with Jetty.

https://wiki.apache.org/solr/WhyNoWar

The recommendation is to use the Jetty that comes with Solr, not a
separate Jetty package.  I would not be too surprised to learn that
Tomcat is better than a separate Jetty package, but in that case, both
of them have no tuning.  The jetty that comes with Solr is tuned for
Solr.  The most important part of that tuning is the maxThreads setting
-- the default value of 200 in Tomcat and Jetty is easy to exceed ...
and when the container starts limiting the number of threads,
performance *will* suffer.
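
For comparison, raising that limit in a stock Tomcat means editing the
Connector in server.xml -- a sketch only; 10000 mirrors what Solr's bundled
Jetty ships with, and your port/protocol will differ:

<!-- allow far more concurrent request threads than the default 200 -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="10000" />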

No matter where the Jetty comes from, there is *NOT* a limitation of one
shard per node with Jetty.  Where did you hear that?  Whatever resource
you are looking at which states this is wrong, and I'd like to get it
corrected.  I personally am running Solr installs (both 4.x and 5.x) on
Jetty which have dozens of cores (shards).

FYI -- SolrCloud fully supports sharded indexes.  Sharding is often the
entire point of using SolrCloud.  Sharded indexes are easier to manage
in SolrCloud than they are in standalone mode -- shard handling for both
indexing and queries is fully automated.

Thanks,
Shawn



RE: Tutorial or Code Samples to explain how to Write Solr Plugins

2016-02-04 Thread Gian Maria Ricci - aka Alkampfer
Thanks to everyone for the really useful links. The problem is that googling 
around does not produce really good results. In the past, when I wrote my first 
plugin, it was a real pain :).

Thanks.

--
Gian Maria Ricci
Cell: +39 320 0136949


-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: mercoledì 3 febbraio 2016 23:43
To: solr-user 
Subject: Re: Tutorial or Code Samples to explain how to Write Solr Plugins

There is a framework to help write them:
https://github.com/leonardofoderaro/alba

Also, some recent plugins were released at the Revolution conference, maybe 
they have something useful:
https://github.com/DiceTechJobs/SolrPlugins

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 4 February 2016 at 08:25, Charlie Hull  wrote:
> Here's one we wrote recently for indexing ontologies with Solr as part 
> of the BioSolr project:
> https://github.com/flaxsearch/BioSolr/tree/master/ontology/solr and a 
> presentation on how it works (explained in the second half of the 
> talk) https://www.youtube.com/watch?v=v1qKNX_axdI - hope this helps!
>
> Cheers
>
> Charlie
>
> On 3 February 2016 at 18:45, Upayavira  wrote:
>
>> Not a tutorial as such, but here's some simple infrastructure for 
>> building Solr components alongside Solr:
>>
>> https://github.com/upayavira/custom-solr-components
>>
>> I suspect you're past that stage already though.
>>
>> Upayavira
>>
>> On Wed, Feb 3, 2016, at 04:45 PM, Binoy Dalal wrote:
>> > Here's a couple of links you can follow to get started:
>> >
>> https://www.slideshare.net/mobile/searchbox-com/tutorial-on-developin
>> -a-solr-search-component-plugin
>> >
>> https://www.slideshare.net/mobile/searchbox-com/develop-a-solr-reques
>> t-handler-plugin
>> > These are to write a search component and a request handler respectively.
>> > They are on older solr versions but they should work with 5.x as well.
>> > I used these to get started when I was trying to write my first plugin.
>> > Once you get a hang of how it's to be done it's really not that 
>> > difficult.
>> >
>> > On Wed, 3 Feb 2016, 21:59 Gian Maria Ricci - aka Alkampfer < 
>> > alkamp...@nablasoft.com> wrote:
>> >
>> > > Hi,
>> > >
>> > >
>> > >
>> > > I wonder if there is some code samples or tutorial (updated to 
>> > > work
>> with
>> > > version 5) to help users writing plugins.
>> > >
>> > >
>> > >
>> > > I’ve found lots of difficulties on the past to find such kind of 
>> > > information when I needed to write some plugins, and I wonder if 
>> > > I
>> missed
>> > > some site or link that does a better explanation than official 
>> > > page http://wiki.apache.org/solr/SolrPlugins that is really old.
>> > >
>> > >
>> > >
>> > > Thanks in advance.
>> > >
>> > >
>> > >
>> > > --
>> > > Gian Maria Ricci
>> > > Cell: +39 320 0136949
>> > >
>> > >
>> > >
>> > >
>> > --
>> > Regards,
>> > Binoy Dalal
>>


Re: Tutorial or Code Samples to explain how to Write Solr Plugins

2016-02-04 Thread Binoy Dalal
I used those links to learn to write my first plugin as well.
I might have that code still lying around somewhere. Let me take a look and
get back.

On Thu, 4 Feb 2016, 19:32 Gian Maria Ricci - aka Alkampfer <
alkamp...@nablasoft.com> wrote:

> I've already found these two presentations. Sadly, the link to the source
> code is broken; it seems that the domain www.searchbox.com is completely
> down :|
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
> -Original Message-
> From: Binoy Dalal [mailto:binoydala...@gmail.com]
> Sent: mercoledì 3 febbraio 2016 17:46
> To: solr-user@lucene.apache.org
> Subject: Re: Tutorial or Code Samples to explain how to Write Solr Plugins
>
> Here's a couple of links you can follow to get started:
>
> https://www.slideshare.net/mobile/searchbox-com/tutorial-on-developin-a-solr-search-component-plugin
>
> https://www.slideshare.net/mobile/searchbox-com/develop-a-solr-request-handler-plugin
> These are to write a search component and a request handler respectively.
> They are on older solr versions but they should work with 5.x as well.
> I used these to get started when I was trying to write my first plugin.
> Once you get a hang of how it's to be done it's really not that difficult.
>
> On Wed, 3 Feb 2016, 21:59 Gian Maria Ricci - aka Alkampfer <
> alkamp...@nablasoft.com> wrote:
>
> > Hi,
> >
> >
> >
> > I wonder if there is some code samples or tutorial (updated to work
> > with version 5) to help users writing plugins.
> >
> >
> >
> > I’ve found lots of difficulties on the past to find such kind of
> > information when I needed to write some plugins, and I wonder if I
> > missed some site or link that does a better explanation than official
> > page http://wiki.apache.org/solr/SolrPlugins that is really old.
> >
> >
> >
> > Thanks in advance.
> >
> >
> >
> > --
> > Gian Maria Ricci
> > Cell: +39 320 0136949
> >
> >
> >
> >
> --
> Regards,
> Binoy Dalal
>
-- 
Regards,
Binoy Dalal


Re: Errors During Load Test

2016-02-04 Thread Binoy Dalal
What is your Solr setup -- nodes/shards/specs?
7221 requests/min is a lot, so it's likely that your Solr setup simply isn't
able to support this kind of load, which results in the requests timing out;
that is why you keep seeing the timeout and connect exceptions.

On Thu, 4 Feb 2016, 20:30 Tiwari, Shailendra <
shailendra.tiw...@macmillan.com> wrote:

> Hi All,
>
> We did our first load test on Search (Solr) API, and started to see some
> errors after 2000 Users. Errors used to go away after 30 seconds, but keep
> happening frequently. Errors were "java.net.SocketTimeoutException" and
> "org.apache.http.conn.HttpHostConnectException". We were using JMeter to
> run the load test, and total of 15 different Search terms were used to
> execute API. Total Request/Min was 7221/min.
> We are using Apache/RedHat.
> We want to scale up to 4000 users. What's the recommendation for getting there?
>
> Thanks
>
> Shail
>
-- 
Regards,
Binoy Dalal


Solr 4.10 with Jetty 8.1.10 & Tomcat 7

2016-02-04 Thread Shahzad Masud
I have been running Solr 4.10 with Tomcat 7 using a manual shard scheme (i.e.
4 Tomcats with 16 shards, each Tomcat having 4 contexts / instances in it to
represent shards). It has been working fairly well for the last 4 years, but
with a few OOMs (out of memory errors) on random servers. The situation returns
to normal if I schedule a random server restart. I have been using the
distributed search feature, which disallows me from using SolrCloud.

Each shard points to a separate folder. In migrating to Jetty, it came as a
surprise that it appears to support a single shard per Jetty server. To test
this further, I created 16 servers and ran a benchmark. Jetty/Solr performance
is slower than Tomcat/Solr performance, which is a surprise: Tomcat is 25%
faster than Jetty. I am sure I am missing something; otherwise the Solr team
would not be recommending Jetty for 5.x versions.

Q: Is it normal that one node supports only one shard in Jetty?
Q: Can anyone point to an appropriate guideline on whether Jetty is better
than Tomcat?
Q: Has anyone else experienced a similar migration and concluded that
Tomcat is better?

Thank you,
Have a great day !

Shahzad


Re: implement exact match for one of the search fields only?

2016-02-04 Thread Jack Krupansky
The desired architecture is that you use a middle app layer that clients
send queries to and that middle app layer then constructs the formal query
and sends it on to Solr proper. This architecture also enables breaking a
user query into multiple Solr queries and then aggregating the results.
Besides, the general goal is to avoid app clients talking directly to Solr
anyway.

-- Jack Krupansky

On Thu, Feb 4, 2016 at 2:57 AM, Derek Poh  wrote:

> Hi Erick
>
> <<
> The manual way of doing this would be to construct an elaborate query,
> like q=spp_keyword_exact:"dvd bracket" OR P_ShortDescription:(dvd bracket)
> OR NOTE: the parens are necessary or the last part of the above would
> be parsed as P_ShortDescription:dvd default_searchfield:bracket
> >>
>
> Your suggestion to construct the query like q=spp_keyword_exact:"dvd
> bracket" OR P_ShortDescription:(dvd bracket) OR does not fit into our
> current implementation.
> The front-end pages will only pass the "q=search keywords" in the query to
> solr. The list of search fields (qf) is pre-defined in solr.
>
> Do you have any alternatives to implement your suggestion without making
> changes to the front-end?
>
> On 1/29/2016 1:49 AM, Erick Erickson wrote:
>
>> bq: if you are interested phrase query, you should use String field
>>
>> If you do this, you will NOT be able to search within the string. I.e.
>> if the doc field is "my dog has fleas" you cannot match
>> "dog has" with a string-based field.
>>
>> If you want to match the _entire_ string or you want prefix-only
>> matching, then string might work, i.e. if you _only_ want to be able
>> to match
>>
>> "my dog has fleas"
>> "my dog*"
>> but not
>> "dog has fleas".
>>
>> On to the root question though.
>>
>> I really think you want to look at edismax. What you're trying to do
>> is apply the same search term to individual fields. In particular,
>> the pf parameter will automatically apply the search terms _as a phrase_
>> against the field specified, relieving you of having to enclose things
>> in quotes.
>>
>> The manual way of doing this would be to construct an elaborate query,
>> like
>> q=spp_keyword_exact:"dvd bracket" OR P_ShortDescription:(dvd bracket)
>> OR
>>
>> NOTE: the parens are necessary or the last part of the above would be
>> parsed as
>> P_ShortDescription:dvd default_searchfield:bracket
>>
>> And the debug=query trick will show you exactly how things are actually
>> searched, it's invaluable.
>>
>> Best,
>> Erick
>>
>> On Thu, Jan 28, 2016 at 5:08 AM, Mugeesh Husain 
>> wrote:
>>
>>> Hi,
>>> if you are interested in phrase queries, you should use a String field
>>> instead of a text field in the schema, like:
>>>   
>>>
>>> this will solve your problem.
>>>
>>> if you are missing anything else, let us know
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/implement-exact-match-for-one-of-the-search-fields-only-tp4253786p4253827.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>>
> --
> CONFIDENTIALITY NOTICE
> This e-mail (including any attachments) may contain confidential and/or
> privileged information. If you are not the intended recipient or have
> received this e-mail in error, please inform the sender immediately and
> delete this e-mail (including any attachments) from your computer, and you
> must not use, disclose to anyone else or copy this e-mail (including any
> attachments), whether in whole or in part.
> This e-mail and any reply to it may be monitored for security, legal,
> regulatory compliance and/or other appropriate reasons.
>
>
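
One way to apply Erick's qf/pf suggestion without touching the front-end is to
put it in the handler defaults, since the pages only send q=... A sketch using
the field names from this thread; the handler name is whatever the front-end
already calls:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- fields the user's terms are matched against -->
    <str name="qf">P_ShortDescription</str>
    <!-- pf re-applies the whole query as a phrase, boosting exact matches -->
    <str name="pf">spp_keyword_exact</str>
  </lst>
</requestHandler>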


Re: Out of memory error during full import

2016-02-04 Thread Shawn Heisey
On 2/4/2016 12:18 AM, Srinivas Kashyap wrote:
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the 
> child entities in data-config.xml. When i try to do full import, i'm getting 
> OutOfMemory error(Java Heap Space). I increased the HEAP allocation to the 
> maximum extent possible. Is there a workaround to do initial data load 
> without running into this error?
>
> I found that 'batchSize=-1' parameter needs to be specified in the datasource 
> for MySql, is there a way to specify for others Databases as well?

Setting batchSize to -1 in the DIH config translates to a 'setFetchSize'
on the JDBC object of Integer.MIN_VALUE.  This is how to turn on result
streaming in MySQL.

The method for doing this with other JDBC implementations is likely to
be different.  The Microsoft driver for SQL Server uses a URL parameter,
and newer versions of that particular driver have the streaming behavior
as default.  I have no idea how to do it for any other driver, you would
need to ask the author of the driver.

When you turn on caching (SortedMapBackedCache), you are asking Solr to
put all of the data received into memory -- very similar to what happens
if result streaming is not turned on.  When the SQL result is very
large, this can require a LOT of memory.  In situations like that,
you'll just have to remove the caching.  One alternative to child
entities is to do a query using JOIN in a single entity, so that all the
data you need is returned by a single SQL query, where the heavy lifting
is done by the database server instead of Solr.

The MySQL database that serves as the information source for *my* Solr
index is hundreds of gigabytes in size, so caching it is not possible
for me.  The batchSize=-1 option is the only way to get the import to work.
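
For illustration, both points in one data-config.xml sketch (driver, URL, table
and column names are hypothetical): batchSize="-1" turns on MySQL streaming,
and the JOIN in a single entity replaces a cached child entity:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://dbhost/mydb"
              user="solr" password="secret"
              batchSize="-1"/>
  <document>
    <!-- one flat entity; the database does the heavy lifting -->
    <entity name="item"
            query="SELECT i.id, i.name, c.detail
                   FROM item i LEFT JOIN child c ON c.item_id = i.id"/>
  </document>
</dataConfig>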

Thanks,
Shawn



Errors During Load Test

2016-02-04 Thread Tiwari, Shailendra
Hi All,

We did our first load test on Search (Solr) API, and started to see some errors 
after 2000 Users. Errors used to go away after 30 seconds, but keep happening 
frequently. Errors were "java.net.SocketTimeoutException" and 
"org.apache.http.conn.HttpHostConnectException". We were using JMeter to run 
the load test, and total of 15 different Search terms were used to execute API. 
Total Request/Min was 7221/min.
We are using Apache/RedHat.
We want to scale up to 4000 users. What's the recommendation for getting there?

Thanks

Shail


Re: Solr segment merging in different replica

2016-02-04 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Thanks for your reply.

Yes, we were planning for the case where the replica goes down during indexing
and, when it restarts, it will start to copy the index over from the main node.


Regards,
Edwin


On 5 February 2016 at 03:35, Shawn Heisey  wrote:

> On 2/4/2016 9:27 AM, Zheng Lin Edwin Yeo wrote:
> > Yes, I'm already on SolrCloud, so I'll probably stick to that.
> >
> > Regarding the network, I am just afraid that when the replica node copies
> > the index over from the main node, it will use up all the available
> > bandwidth, and causes the search query to have little bandwidth left,
> which
> > will affect the performance of the search from the front-end.
>
> Replicating the index in SolrCloud should be a VERY rare event, only
> happening when there's a serious problem such as a server going down and
> coming back up later, or after certain maintenance events.
>
> Merges do not involve network traffic.  In SolrCloud, each replica will
> handle merging locally.  It does not happen over the network.
>
> Even if a replication DOES happen, TCP makes room on the network for new
> connections like queries.  It's inherent in the design of the protocol.
> This is particularly effective on LAN connectivity.  If there's a WAN
> involved, then you might be right to worry about bandwidth.
>
> Regarding something you asked earlier in the thread: Assuming LAN
> connectivity, I think the only thing you will achieve by using separate
> network interfaces is configuration complexity.
>
> It might be possible to separate the interfaces, even though I think
> it's not required.  If you populate the hosts file on each server, or
> use split DNS, you could have clients use a different address than the
> Solr servers themselves use for inter-node communication, but in general
> there is no need for this, because high network bandwidth utilization is
> only likely during a replication event, or during bulk indexing to
> rebuild collections.  For bulk indexing, the CPU and disk I/O impact
> will almost certainly cause more of a slowdown than the network, unless
> you're using a low-speed WAN, which is not recommended.
>
> Thanks,
> Shawn
>
>


Re: Multi-level nested documents query

2016-02-04 Thread Pranaya Behera

Hi Mikhail,
 Thank you for the link. I will check that blog post.

On Friday 05 February 2016 01:42 AM, Mikhail Khludnev wrote:

Hello,

I'm not sure that it's achievable overall, but at least you need to  use
different parent fields/terms/filters across levels like in
http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html
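
The gist of that post, very roughly: give every level its own marker and use a
level-specific parent filter in each {!parent} wrapper, where "which" must
match every document at that level or above. A sketch, assuming a hypothetical
level_s field with values product / child / grandchild:

# child -> product: the outer join up to products
q={!parent which="level_s:product" v=$cq}
# grandchild -> child: cq is itself a block join, so its "which" matches
# everything at the child level or above
cq={!parent which="level_s:(child OR product)" v=$gq}
gq=level_s:grandchild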


On Thu, Feb 4, 2016 at 8:39 PM, Pranaya Behera 
wrote:


Hi,
  I have documents that are indexed like this:

product
-isParent:true
- child1
-isParent:true
- child1_1
- child1_2
- child1_3
- child2
-isParent:true
- child2_1
- child2_2
- child2_3

I have used fl=*,[child parentFilter=isParent:true] but it doesn't give back
the children.
expand=true&expand.q=*:*&expand.field=_root_&expand.rows=100 seems to work and
gives me all the children, but not in a single nested format like the above
query does when I have only one level of nested documents; for multilevel it
just gives me only the parent, not the children.

--
Thanks & Regards
Pranaya Behera






--
Thanks & Regards
Pranaya Behera



Re: ​Securing fields and documents with Shield | Elastic

2016-02-04 Thread Philip Durbin
Thanks for replying, Alex. At the moment, my requirement is to show
public/published documents as well as unpublished documents based on the
user issuing the query. Or just the user's documents, with no public
documents. I've implemented this with a JOIN and my last post on this is
here:
http://lucene.472066.n3.nabble.com/Solr-JOIN-keeping-permission-data-out-of-primary-documents-td4169739.html

I haven't played with Shield either. Maybe ManifoldCF is equivalent? I took
a quick look at ManifoldCF a couple years ago and it seemed complicated.
The JOIN approach seemed easier to implement, but I don't love having to
keep what I'll call "primary" documents (useful content) in sync with what
I'll call "permission" documents (to JOIN against). Please see the post
above for what I'm up to. It works but I don't love it.

The marketing material for Shield seems nice. The docs seem nice. As a
solution for document level security, it seems to be more "on rails" or
"ready out of the box" than anything I've seen out of the Solr world. (I
understand that experts can roll their own Solr solution.) The post I
originally linked simply reminded me of Shield's existence and got me
wondering if the Solr team is working on something like Shield. Shield
seems to take the mystery out of document level security. It seems well
documented and straightforward. But again, I haven't actually played with it
yet. Besides, it requires a license.

Phil

On Thu, Feb 4, 2016 at 8:51 PM, Alexandre Rafalovitch 
wrote:

> I have not used Shield yet, so this is based just on the document you sent.
>
> I would use different Request Handler endpoints for different users
> and put the restrictions there, in the invariants section.
>
> For field restrictions, I would use 'uf' parameter. As for example
> here (from my old book):
>
> https://github.com/arafalov/solr-indexing-book/blob/master/published/languages/conf/solrconfig.xml#L24
>
> For document restrictions, it just seems like an extra 'fq' query to
> filter out the documents. Or a post-filter.
>
> The only question is how to route to the relevant endpoint and that
> can be done in the middle-ware or possibly by one of the plugin
> components for Solr.  Or use ManifoldCF, as per the wiki page on the
> topic: https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security
>
> Does this fulfill your requirements?
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 5 February 2016 at 12:08, Philip Durbin 
> wrote:
> > Does Solr have anything like this?
> >
> > https://www.elastic.co/blog/securing-fields-and-documents-with-shield
> >
> > Or is it on the roadmap?
>



-- 
Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin


Solr 5: not loading shards from symlinked directories

2016-02-04 Thread Norgorn
I've tried to upgrade from Solr 4.10.3 to 5.4.1. Solr shards are placed on
different disks and symlinks (ln -s) are created to SOLR_HOME (SOLR_HOME
itself is set as an absolute path and works fine).
When Solr starts, it loads only the shards placed in the home directory, but
not the symlinked ones.
If I copy a shard to the home directory (the file system path remains
unchanged, like SOLR_HOME/my_shard1, both symlinked and copied), it works.

Are there any ways to overcome this issue?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-not-loading-shards-from-symlinked-directories-tp4255403.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr segment merging in different replica

2016-02-04 Thread Zheng Lin Edwin Yeo
Thanks Emir and Benedetti.

Yes, I'm already on SolrCloud, so I'll probably stick to that.

Regarding the network, I am just afraid that when the replica node copies
the index over from the main node, it will use up all the available
bandwidth, and causes the search query to have little bandwidth left, which
will affect the performance of the search from the front-end.

Regards,
Edwin


On 4 February 2016 at 01:06, Alessandro Benedetti 
wrote:

> Master/Slave is the old legacy way to obtain a resilient system.
> It's easier to setup, but if you are already on SolrCloud I can not see any
> advantage in moving back.
> Related the networking part, I am not a network expert.
> The only thing I can tell you is that the inter-node communication is
> going to happen on the same REST endpoints and handlers which are used for
> search/updates ( same java process).
> I really doubt it is possible to have them running across different
> physical network interfaces.
>
> Cheers
>
>
> On 3 February 2016 at 10:41, Emir Arnautovic  >
> wrote:
>
> > Hi Edwin,
> > Master-Slave's main (maybe only) advantage is simpler infrastructure - it
> > does not use ZK. Also, it does assume you don't need NRT search since
> there
> > has to be longer periods between replicating master changes to slaves.
> >
> > Regards,
> > Emir
> >
> >
> > On 03.02.2016 04:48, Zheng Lin Edwin Yeo wrote:
> >
> >> Hi Emir,
> >>
> >> Thanks for your reply.
> >>
> >> As currently both of my main and replica are in the same server, and as
> I
> >> am using the SolrCloud setup, both the replica are doing the merging
> >> concurrently, which causes the memory usage of the server to be very
> high,
> >> and affect the other functions like querying. This issue should be
> >> eliminated when I shift my replica to another server.
> >>
> >> Would like to check, will there be any advantage if I change to the
> >> Master-Slave setup, as compared to the SolrCloud setup which I am
> >> currently
> >> using?
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >>
> >> On 2 February 2016 at 21:23, Emir Arnautovic <
> >> emir.arnauto...@sematext.com>
> >> wrote:
> >>
> >> Hi Edwin,
> >>> Do you see any signs of network being bottleneck that would justify
> such
> >>> setup? I would suggest you monitor your cluster before deciding if you
> >>> need
> >>> separate interfaces for external and internal communication. Sematext's
> >>> SPM
> >>> (http://sematext.com/spm) allows you to monitor SolrCloud, hosts and
> >>> network and identify bottlenecks in your cluster.
> >>>
> >>> Regards,
> >>> Emir
> >>>
> >>> --
> >>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> >>> Solr & Elasticsearch Support * http://sematext.com/
> >>>
> >>>
> >>>
> >>> On 02.02.2016 00:50, Zheng Lin Edwin Yeo wrote:
> >>>
> >>> Hi Emir,
> 
>  My setup is SolrCloud.
> 
>  Also, will it be good to use a separate network interface to connect
> the
>  two node with the interface that is used to connect to the network for
>  searching?
> 
>  Regards,
>  Edwin
> 
> 
>  On 1 February 2016 at 19:01, Emir Arnautovic <
>  emir.arnauto...@sematext.com>
>  wrote:
> 
>  Hi Edwin,
> 
> > What is your setup - SolrCloud or Master-Slave? If it si SolrCloud,
> > then
> > under normal index updates, each core is behaving as independent
> index.
> > In
> > theory, if all changes happen at the same time on all nodes, merges
> > will
> > happen at the same time. But that is not realistic and it is expected
> > to
> > happen in slightly different time.
> > If you are running Master-Slave, then new segments will be copied
> from
> > master to slave.
> >
> > Regards,
> > Emir
> >
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log
> Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> >
> >
> > On 01.02.2016 11:56, Zheng Lin Edwin Yeo wrote:
> >
> > Hi,
> >
> >> I would like to check, during segment merging, how did the replical
> >> node
> >> do
> >> the merging?
> >> Will it do the merging concurrently, or will the replica node delete
> >> the
> >> old segment and replace the new one?
> >>
> >> Also, is it possible to separate the network interface for
> inter-node
> >> communication from the network interface for update/search requests?
> >> If so I could put two network cards in each machine and route the
> >> index
> >> and
> >> search traffic over the first interface and the traffic for the
> >> inter-node
> >> communication (sending documents to replicas) over the second
> >> interface.
> >>
> >> I'm using Solr 5.4.0
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >>
> >>
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log 

Out of memory error during full import

2016-02-04 Thread Srinivas Kashyap
Hello,

I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the 
child entities in data-config.xml. When i try to do full import, i'm getting 
OutOfMemory error(Java Heap Space). I increased the HEAP allocation to the 
maximum extent possible. Is there a workaround to do initial data load without 
running into this error?

I found that 'batchSize=-1' parameter needs to be specified in the datasource 
for MySql, is there a way to specify for others Databases as well?

Thanks and Regards,
Srinivas Kashyap
DISCLAIMER: E-mails and attachments from Bamboo Rose, Inc. are confidential. If 
you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way. No representation is made that this email or any attachments are free 
of viruses. Virus scanning is recommended and is the responsibility of the 
recipient.


Re: filters to work with dates

2016-02-04 Thread Miguel Valencia Zurera

Hi Markus

At first, I thought I would keep the original field and create a new field
using "Copying Fields" (copyField). For this reason, I thought it was a better
choice to use a filter on the destination field. However, I am going to study
your suggestion.


Many thanks.

On 02/02/2016 at 14:56, Markus Jelsma wrote:

Hello - I would opt for having a date field, and a custom update processor that 
converts a string date via DateUtils.parseDate() to an actual Date object. I 
think this would be a much simpler approach than a custom field or token filter.

Markus
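
A minimal sketch of the processor Markus describes, using Commons Lang's
DateUtils.parseDate(). The factory name, the source/target fields
("date_str" / "date_dt") and the patterns are all hypothetical:

import java.io.IOException;
import java.text.ParseException;
import java.util.Date;
import org.apache.commons.lang.time.DateUtils;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ParseDateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object raw = doc.getFieldValue("date_str");
        if (raw != null) {
          try {
            // try each expected input pattern in turn
            Date d = DateUtils.parseDate(raw.toString(), new String[] {
                "yyyy-MM-dd", "dd/MM/yyyy", "MMM d, yyyy" });
            doc.setField("date_dt", d);  // a real date field sorts correctly
          } catch (ParseException e) {
            // no pattern matched; leave the document unchanged
          }
        }
        super.processAdd(cmd);
      }
    };
  }
}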
  
-Original message-

From:Miguel Valencia Zurera 
Sent: Tuesday 2nd February 2016 13:09
To: solr-user@lucene.apache.org
Subject: filters to work with dates

Hi everybody

I'm looking for a filter or similar function to resolve the following problem
in my Solr index:
I have a string field that contains a date, but each record of this field can
be in a different format. Now I have to sort by this field, and for that I
have to normalize it. I've thought of creating a new kind of field and using
an index filter to transform the value to a UTC date, but I cannot find such
a filter.

https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions

I think I have to develop a custom index filter that takes the string, looks
for a possible date pattern, and transforms it to a date.
What do you think about this?
Can somebody confirm whether this is the best way, or are there other options?

thanks





Re: Solr segment merging in different replica

2016-02-04 Thread Shawn Heisey
On 2/4/2016 9:27 AM, Zheng Lin Edwin Yeo wrote:
> Yes, I'm already on SolrCloud, so I'll probably stick to that.
>
> Regarding the network, I am just afraid that when the replica node copies
> the index over from the main node, it will use up all the available
> bandwidth, and causes the search query to have little bandwidth left, which
> will affect the performance of the search from the front-end.

Replicating the index in SolrCloud should be a VERY rare event, only
happening when there's a serious problem such as a server going down and
coming back up later, or after certain maintenance events.

Merges do not involve network traffic.  In SolrCloud, each replica will
handle merging locally.  It does not happen over the network.

Even if a replication DOES happen, TCP makes room on the network for new
connections like queries.  It's inherent in the design of the protocol. 
This is particularly effective on LAN connectivity.  If there's a WAN
involved, then you might be right to worry about bandwidth.

Regarding something you asked earlier in the thread: Assuming LAN
connectivity, I think the only thing you will achieve by using separate
network interfaces is configuration complexity.

It might be possible to separate the interfaces, even though I think
it's not required.  If you populate the hosts file on each server, or
use split DNS, you could have clients use a different address than the
Solr servers themselves use for inter-node communication, but in general
there is no need for this, because high network bandwidth utilization is
only likely during a replication event, or during bulk indexing to
rebuild collections.  For bulk indexing, the CPU and disk I/O impact
will almost certainly cause more of a slowdown than the network, unless
you're using a low-speed WAN, which is not recommended.

Thanks,
Shawn



Re: Solr 4.10 with Jetty 8.1.10 & Tomcat 7

2016-02-04 Thread Shahzad Masud
Thank you Shawn for your response. I have been using manual shards (old
mechanism), i.e. a separate context for each shard, with each shard pointing
to a separate data and index folder.

Shard 1 = localhost:8983/solr_2014
Shard 2 = localhost:8983/solr_2015
Shard 3 = localhost:8983/solr_2016

Do you think this is a good design practice? Can you share an example which
may help me deploy two shards in one Jetty?

Shahzad

On Thursday, 4 February 2016, Shawn Heisey  wrote:

> On 2/4/2016 7:29 AM, Shahzad Masud wrote:
> > Q: Is it normal that one node supports only one shard in Jetty?
> > Q: Can anyone point to an appropriate guideline on whether Jetty is better
> > than Tomcat?
> > Q: Has anyone else experienced a similar migration and concluded that
> > Tomcat is better?
>
> Solr 5.x is more difficult to put into Tomcat than 4.x was -- there is
> no .war file in the download at all as of version 5.3.  It can still be
> done, but we strongly recommend using Solr as it is shipped, with Jetty.
>
> https://wiki.apache.org/solr/WhyNoWar
>
> The recommendation is to use the Jetty that comes with Solr, not a
> separate Jetty package.  I would not be too surprised to learn that
> Tomcat is better than a separate Jetty package, but in that case, both
> of them have no tuning.  The jetty that comes with Solr is tuned for
> Solr.  The most important part of that tuning is the maxThreads setting
> -- the default value of 200 in Tomcat and Jetty is easy to exceed ...
> and when the container starts limiting the number of threads,
> performance *will* suffer.
>
> No matter where the Jetty comes from, there is *NOT* a limitation of one
> shard per node with Jetty.  Where did you hear that?  Whatever resource
> you are looking at which states this is wrong, and I'd like to get it
> corrected.  I personally am running Solr installs (both 4.x and 5.x) on
> Jetty which have dozens of cores (shards).
>
> FYI -- SolrCloud fully supports sharded indexes.  Sharding is often the
> entire point of using SolrCloud.  Sharded indexes are easier to manage
> in SolrCloud than they are in standalone mode -- shard handling for both
> indexing and queries is fully automated.
>
> Thanks,
> Shawn
>
>


Re: Solr for real time analytics system

2016-02-04 Thread Susheel Kumar
Hi Rohit,

Please take a look at Streaming Expressions & the Parallel SQL Interface.  That
should meet many of your analytics requirement (aggregation queries like
sum/average/groupby etc).
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface
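
To give a flavor of the streaming side, an aggregation over a hypothetical
"metrics" collection with "region" and "amount" fields might look like the
expression below; check the reference guide for which functions your exact
Solr version ships:

rollup(
  search(metrics, q="*:*", fl="region,amount", sort="region asc", qt="/export"),
  over="region",
  sum(amount), avg(amount), count(*))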

Thanks,
Susheel

On Thu, Feb 4, 2016 at 3:17 AM, Arkadiusz Robiński <
arkadiusz.robin...@otodom.pl> wrote:

> A few people did a real time analytics system with solr and talked about it
> at conferences. Maybe you'll find their presentations useful:
>
> https://www.youtube.com/results?search_query=solr%20real%20time%20analytics
> (esp. the first one: https://www.youtube.com/watch?v=PkoyCxBXAiA )
>
> On Thu, Feb 4, 2016 at 8:25 AM, Rohit Kumar  >
> wrote:
>
> > Thanks Bhimavarapu for the information.
> >
> > We are creating our own dashboard, so probably won't need Kibana/Banana. I
> > was more curious about Solr support for fast aggregation query over very
> > large data set. As suggested, I guess elasticsearch  has this capability.
> > Are there any published metrics or data regarding Elasticsearch/Solr
> > performance in this area that I can refer to?
> >
> > Thanks
> > Rohit
> >
> >
> >
> > On Thu, Feb 4, 2016 at 11:48 AM, CKReddy Bhimavarapu <
> chaitu...@gmail.com>
> > wrote:
> >
> > > Hello Rohit,
> > >
> > > You can use the Banana project, which was forked from Kibana and works
> > > with all kinds of time series (and non-time series) data stored in
> > > Apache Solr. It uses Kibana's powerful dashboard configuration
> > > capabilities, ports key panels to work with Solr, and provides
> > > significant additional capabilities, including new panels that
> > > leverage D3.js.
> > >
> > > > would need mostly aggregation queries like sum/average/groupby etc, but
> > > > data set is quite huge. The aggregation queries should be very fast.
> > >
> > >
> > > all your requirement can be served by this banana but I'm not sure
> about
> > > how fast solr compare to ELK 
> > >
> > > On Thu, Feb 4, 2016 at 10:51 AM, Rohit Kumar <
> > > rohitkumarbhagat...@gmail.com>
> > > wrote:
> > >
> > > > Hi
> > > >
> > > > I am quite new to Solr. I have to build a real time analytics system
> > > which
> > > > displays metrics based on multiple filters over a huge data set
> > > (~50million
> > > > documents with ~100 fields).  I would need mostly aggregation
> queries
> > > like
> > > > sum/average/groupby etc, but data set is quite huge. The aggregation
> > > > queries should be very fast.
> > > >
> > > > Is Solr suitable for such use cases?
> > > >
> > > > Thanks
> > > > Rohit
> > > >
> > >
> > >
> > >
> > > --
> > > ckreddybh. 
> > >
> >
>
>
>
> --
> Arkadiusz Robiński
> Software Developer
> Otodom.pl
>


commitReserveDuration vs maxWriteMBPerSec

2016-02-04 Thread Zheng Lin Edwin Yeo
Hi,

I would like to find out, what is the difference
between commitReserveDuration and maxWriteMBPerSec under the /replication
requestHandler?

Will there be any impact if we set a long commitReserveDuration but a high
maxWriteMBPerSec?

I am using Solr 5.4.0

Regards,
Edwin


Re: Solr 4.10 with Jetty 8.1.10 & Tomcat 7

2016-02-04 Thread Shawn Heisey
On 2/4/2016 9:48 AM, Shahzad Masud wrote:
> Thank you Shawn for your response. I have been using manual shards (old
> mechanism), i.e. a separate context for each shard, with each shard pointing to
> a separate data and index folder.
> 
> Shard 1 = localhost:8983/solr_2014
> Shard 2 = localhost:8983/solr_2015
> Shard 3 = localhost:8983/solr_2016
> 
> Do you think this is a good design practice? Can you share an example which
> may help me deploy two shards in one Jetty?

Manual sharding typically does *not* involve multiple contexts (webapps)
in your container.

One instance of Solr (using, for example, the /solr context) can handle
many cores.

https://cwiki.apache.org/confluence/display/solr/Solr+Cores+and+solr.xml

This functionality is available in *any* container that Solr supports,
including both Tomcat and Jetty.
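
For example (a sketch only - the core names and ports are made up), one /solr
context can host a core per year, and a distributed query can fan out across
them with the shards parameter:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

// one Jetty, one /solr context, one core per year
HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/shard2016");
SolrQuery q = new SolrQuery("test");
// fan the query out across all the year cores
q.set("shards", "localhost:8983/solr/shard2014,"
              + "localhost:8983/solr/shard2015,"
              + "localhost:8983/solr/shard2016");
System.out.println(client.query(q).getResults().getNumFound());
client.close();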

Thanks,
Shawn



Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

2016-02-04 Thread Mark Miller
Only INFO level, so I suspect not bad...

If that Overseer closed, another node should have picked up where it left
off. See that in another log?

Generally an Overseer close means a node or cluster restart.

This can cause a lot of DOWN state publishing. If it's a cluster restart, a
lot of those DOWN publishes are not processed until the cluster is started
back up - which can lead to the Overseer being overwhelmed and things not
responding fast enough. You should be able to see an active Overseer
working on publishing those states though (it shows that at INFO logging
level).

If the Overseer is simply down and another did not take over, that is just
some kind of bug. If it's overwhelmed, 5x is much much faster,
and SOLR-7281 should also help, but that is no real help for 4.x at this
point.

Anyway, the key is: what is the active Overseer doing? Is there no active
Overseer? Or is it busy trying to push through a backlog of operations?
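
One way to check which node currently holds the Overseer role is to read the
election leader znode; a rough SolrJ sketch (the ZooKeeper address is invented):

import java.nio.charset.StandardCharsets;
import org.apache.solr.common.cloud.SolrZkClient;

SolrZkClient zk = new SolrZkClient("zk1:2181,zk2:2181/solr", 15000);
// small JSON blob naming the node that is the current Overseer
byte[] data = zk.getData("/overseer_elect/leader", null, null, true);
System.out.println(new String(data, StandardCharsets.UTF_8));
// rough view of the backlog of pending state operations
System.out.println(zk.getChildren("/overseer/queue", null, true).size());
zk.close();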

- Mark

On Wed, Feb 3, 2016 at 8:46 PM hawk  wrote:

> Thanks Mark.
>
> I was able to search "Overseer" in the solr logs around the time frame of
> the condition. This particular message was from the leader node of the
> shard.
>
> 160201 11:26:36.380 localhost-startStop-1 Overseer (id=null) closing
>
> Also I found this message in the zookeeper logs.
>
> 11:26:35,218 [myid:02] - INFO [ProcessThread(sid:2
> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when
> processing sessionid:0x15297c0fe2e3f2d type:create cxid:0x3
> zxid:0xf0001be48
> txntype:-1 reqpath:n/a Error Path:/overseer Error:KeeperErrorCode =
> NodeExists for /overseer
>
> Any thoughts what these messages suggest?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/I-was-asked-to-wait-on-state-recovering-for-shard-but-I-still-do-not-see-the-request-state-tp4204348p4255105.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
- Mark
about.me/markrmiller


Use SqlEntityProcessor in cached mode to repeat a query for a nested child element

2016-02-04 Thread Kevin Colgan
Hi everyone,

Is it possible to use SqlEntityProcessor in cached mode to repeat a query for a 
nested child element? I'd like to use the entity query once to consolidate 
information from the children to the parent, then another to actually index the 
entities as children. 

Here's an example of what I'm trying to do in the db-config file. The 
EventsTransformer consolidates information from child events and adds fields to 
the parent row. I had to add the two entities as the EventsTransformer will 
only add fields to the parent if child=false:

This is NOT working - the child event entities aren't being created 

    
    


This is IS working but the events query is being run twice so indexing is twice 
as slow


    
    


Anyone got any idea how to do this? I've already tried nesting the second child 
entity inside the other but this didn't work.
Thanks,Kevin


Multi-level nested documents query

2016-02-04 Thread Pranaya Behera

Hi,
 I have documents that are indexed like this:

product
  - isParent:true
  - child1
      - isParent:true
      - child1_1
      - child1_2
      - child1_3
  - child2
      - isParent:true
      - child2_1
      - child2_2
      - child2_3

I have used fl=*,[child parentFilter=isParent:true] but it doesn't give back
the children.
This expand=true&expand.q=*:*&expand.field=_root_&expand.rows=100 seems
to work and gives me all the children, but not in a single format like the
above query does when I have only one level of nested documents; for
multilevel it just gives me only the parent, not the children.


--
Thanks & Regards
Pranaya Behera



Re: Errors During Load Test

2016-02-04 Thread Erick Erickson
The short form is "add more replicas", assuming you're using SolrCloud.

If older-style master/slave, then "add more slaves". Solr request processing
scales pretty linearly with the number of replicas (or slaves).
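
Adding a replica is a single Collections API call; a hedged SolrJ sketch
(the collection, shard, and ZooKeeper names are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

CloudSolrClient client = new CloudSolrClient("zk1:2181/solr");
ModifiableSolrParams p = new ModifiableSolrParams();
p.set("action", "ADDREPLICA");       // Collections API action
p.set("collection", "mycollection");
p.set("shard", "shard1");
QueryRequest req = new QueryRequest(p);
req.setPath("/admin/collections");   // not a search handler, so set the path explicitly
client.request(req);
client.close();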

Note that this is _not_ adding shards (assuming SolrCloud). You usually add
shards when your response time under light load is unacceptable, indicating
that you need fewer documents in each shard.

Binoy's question needs to be answered before any but the most general
advice is possible: what is your setup? What
version of Solr? How many docs? How many shards? etc.

Best,
Erick

On Thu, Feb 4, 2016 at 7:06 AM, Binoy Dalal  wrote:
> What is your solr setup -- nodes/shards/specs?
> 7221 requests/min is a lot so it's likely that your solr setup simply isn't
> able to support this kind of load which results in the requests timing out
> which is why you keep seeing the timeout and connect exceptions.
>
> On Thu, 4 Feb 2016, 20:30 Tiwari, Shailendra <
> shailendra.tiw...@macmillan.com> wrote:
>
>> Hi All,
>>
>> We did our first load test on the Search (Solr) API, and started to see some
>> errors after 2,000 users. Errors would go away after 30 seconds but kept
>> happening frequently. The errors were "java.net.SocketTimeoutException" and
>> "org.apache.http.conn.HttpHostConnectException". We were using JMeter to
>> run the load test, and a total of 15 different search terms were used to
>> exercise the API. The total request rate was 7,221/min.
>> We are using Apache/RedHat.
>> We want to scale up to 4,000 users. What's the recommendation to get there?
>>
>> Thanks
>>
>> Shail
>>
> --
> Regards,
> Binoy Dalal


Re: Hard commits, soft commits and transaction logs

2016-02-04 Thread Erick Erickson
bq: and suggestions of deleted docs are...

OK, this is something different than I read the first time. I'm
assuming that when you mention suggestions, you're using
one of the suggesters that works off the indexed terms, which
will include data from deleted docs. There's really not a good
mechanism other than getting all the data associated with
deleted documents out of there that I know of in that scenario. What
people have done:

1> Just lived with it. On a reasonably large corpus, the number of suggestions
that aren't actually in a live document is often very small, small enough to
ignore. In this case you might be seeing something because of your tests that
makes no practical difference.

I'll add parenthetically that users will get empty results even if all the terms
suggested are in "live" docs assuming they, say, add filter queries. Imagine
a filter query restricting the returns to docs dated yesterday and suggestions
come back from docs dated 5 days ago.

2> Curate the suggestions. In this scenario there's a fixed list of terms in a
text file that you suggest from.

3> Optimize the index. This is usually only really acceptable for setups where
the index changes infrequently (e.g nightly or something) which doesn't
sound like it fits your scenario at all.

bq: Suggestions are re-built on commit

I'm going to go out on a limb and say that this is likely to not work at all
for production in a NRT setup. This will take far too much time on a
significantly-sized
corpus to be feasible. At least that's my fear, I'm mostly advising you to
check this before even trying to scale up.

Best,
Erick

On Wed, Feb 3, 2016 at 11:07 PM, Clemens Wyss DEV  wrote:
> Sorry for coming back to this topic:
> You (Erick) mention "By and large, do not issue commits from any client 
> indexing to Solr"
>
> In order to achieve NRT, I for example test
>  <autoCommit>
>    <maxTime>18</maxTime>
>    <openSearcher>true</openSearcher>
>  </autoCommit>
>  <autoSoftCommit>
>    <maxTime>1</maxTime>
>  </autoSoftCommit>
>
> For (unit)testing purposes
>  <autoCommit>
>    <maxTime>1000</maxTime>
>    <openSearcher>true</openSearcher>
>  </autoCommit>
>  <autoSoftCommit>
>    <maxTime>500</maxTime>
>  </autoSoftCommit>
>
> Suggestions are re-built on commit:
> ...
> <str name="buildOnCommit">true</str>
> ...
>
> (Almost) all my unit tests pass. Except for my docDeletion-test: it looks 
> like expungeDeletes is never "issued" and suggestions of deleted docs are 
> returned.
> When I explicitly issue an "expunging-soft-commit"
>
> UpdateRequest rq = new UpdateRequest();
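> // the positional arguments appear to be (waitFlush, waitSearcher,
> // maxSegments, softCommit, expungeDeletes) - the final true is what
> // requests expungeDeletes when the commit is applied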
> rq.setAction( UpdateRequest.ACTION.COMMIT, false, false, 100, true, true );
> rq.process( solrClient );
>
> the test passes and no false suggestions are returned. What am I facing?
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, January 4, 2016 17:36
> To: solr-user
> Subject: Re: Hard commits, soft commits and transaction logs
>
> As far as I know. If you see anything different, let me know and we'll see if 
> we can update it.
>
> Best,
> Erick
>
> On Mon, Jan 4, 2016 at 1:34 AM, Clemens Wyss DEV  wrote:
>> [Happy New Year to all]
>>
>> Is all herein
>> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-
>> softcommit-and-commit-in-sorlcloud/
>> mentioned/recommended still valid for Solr 5.x?
>>
>> - Clemens


RE: Errors During Load Test

2016-02-04 Thread Tiwari, Shailendra
We are on Solr 4.10.3. We have 2 load-balanced RedHat servers with 16 GB of
memory each. Memory assigned to the JVM is 4 GB; 2 shards, 60K total docs, and 4 replicas.

Thanks 

Shail

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, February 04, 2016 1:27 PM
To: solr-user
Subject: Re: Errors During Load Test

The short form is "add more replicas", assuming you're using SolrCloud.

If older-style master/slave, then "add more slaves". Solr request processing 
scales pretty linearly with the number of replicas (or slaves).

Note that this is _not_ adding shards (assuming SolrCloud). You usually add
shards when your response time under light load is unacceptable, indicating that
you need fewer documents in each shard.

Binoy's question needs to be answered before any but the most general advice is
possible: what is your setup? What version of Solr? How many docs? How many
shards? etc.

Best,
Erick

On Thu, Feb 4, 2016 at 7:06 AM, Binoy Dalal  wrote:
> What is your solr setup -- nodes/shards/specs?
> 7221 requests/min is a lot so it's likely that your solr setup simply 
> isn't able to support this kind of load which results in the requests 
> timing out which is why you keep seeing the timeout and connect exceptions.
>
> On Thu, 4 Feb 2016, 20:30 Tiwari, Shailendra < 
> shailendra.tiw...@macmillan.com> wrote:
>
>> Hi All,
>>
>> We did our first load test on the Search (Solr) API, and started to see
>> some errors after 2,000 users. Errors would go away after 30
>> seconds but kept happening frequently. The errors were
>> "java.net.SocketTimeoutException" and
>> "org.apache.http.conn.HttpHostConnectException". We were using JMeter
>> to run the load test, and a total of 15 different search terms were used to
>> exercise the API. The total request rate was 7,221/min.
>> We are using Apache/RedHat.
>> We want to scale up to 4,000 users. What's the recommendation to get there?
>>
>> Thanks
>>
>> Shail
>>
> --
> Regards,
> Binoy Dalal


Re: Multi-level nested documents query

2016-02-04 Thread Mikhail Khludnev
Hello,

I'm not sure that it's achievable overall, but at least you need to use
different parent fields/terms/filters across levels, like in
http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html
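
For illustration only - a sketch assuming each level carries its own
discriminator field (a hypothetical level_s with values
product/child/grandchild), nesting one block-join parser inside another via
parameter references:

import org.apache.solr.client.solrj.SolrQuery;

// products whose child block contains a grandchild matching the inner query
SolrQuery q = new SolrQuery("{!parent which='level_s:product' v=$childq}");
q.set("childq", "{!parent which='level_s:child' v=$grandq}");
q.set("grandq", "level_s:grandchild");
// attach matched descendants to each returned parent
q.setFields("*,[child parentFilter=level_s:product limit=100]");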


On Thu, Feb 4, 2016 at 8:39 PM, Pranaya Behera 
wrote:

> Hi,
>  I have documents that are indexed like this:
>
> product
>   - isParent:true
>   - child1
>       - isParent:true
>       - child1_1
>       - child1_2
>       - child1_3
>   - child2
>       - isParent:true
>       - child2_1
>       - child2_2
>       - child2_3
>
> I have used fl=*,[child parentFilter=isParent:true] but it doesn't give back
> the children.
> This expand=true&expand.q=*:*&expand.field=_root_&expand.rows=100 seems to
> work and gives me all the children, but not in a single format like the above
> query does when I have only one level of nested documents; for
> multilevel it just gives me only the parent, not the children.
>
> --
> Thanks & Regards
> Pranaya Behera
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Use SqlEntityProcessor in cached mode to repeat a query for a nested child element

2016-02-04 Thread Alexandre Rafalovitch
Where did cachePrimaryKey come from? The documentation has cacheKey:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

Regards,
Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 5 February 2016 at 02:53, Kevin Colgan  wrote:
> Hi everyone,
>
> Is it possible to use SqlEntityProcessor in cached mode to repeat a query for 
> a nested child element? I'd like to use the entity query once to consolidate 
> information from the children to the parent, then another to actually index 
> the entities as children.
>
> Here's an example of what I'm trying to do in the db-config file. The 
> EventsTransformer consolidates information from child events and adds fields 
> to the parent row. I had to add the two entities as the EventsTransformer 
> will only add fields to the parent if child=false:
>
> This is NOT working - the child event entities aren't being created:
>
> <entity name="houses" query="select ... from houses">
>     <entity transformer="EventsTransformer"
>             name="events"
>             query="select '${houses.uid}_events_' || e_id::text AS uuid, ... from events"/>
>     <entity child="true"
>             processor="SqlEntityProcessor" cachePrimaryKey="events_e_id"
>             cacheLookup="events_parsed.events_e_id" cacheImpl="SortedMapBackedCache"
>             name="events"
>             query="select ... from events"/>
> </entity>
>
> This IS working, but the events query is being run twice so indexing is
> twice as slow:
>
> <entity name="houses" query="select ... from houses">
>     <entity transformer="EventsTransformer"
>             name="events_parsed"
>             query="select '${houses.uid}_events_' || e_id::text AS uuid,
>                    e_id::text AS events_e_id, ... from events"/>
>     <entity child="true"
>             processor="SqlEntityProcessor" cachePrimaryKey="events_e_id"
>             cacheLookup="events_parsed.events_e_id" cacheImpl="SortedMapBackedCache"
>             transformer="EventsTransformer"
>             name="events_child"
>             query="select ... from events"/>
> </entity>
>
> Anyone got any idea how to do this? I've already tried nesting the second 
> child entity inside the other but this didn't work.
> Thanks,
> Kevin


How to assign a hash range via zookeeper?

2016-02-04 Thread Aki Balogh
One of our shards went down.  We brought it back up but it doesn't have a
hash range:



<lst name="shard1">
  <str name="state">active</str>
  <lst name="replicas">
    <lst name="core_node1">
      <str name="core">marketmuse_shard1_replica1</str>
      <str name="base_url">http://172.30.0.254:8080/solr</str>
      <str name="node_name">172.30.0.254:8080_solr</str>
      <str name="state">active</str>
    </lst>
    <lst name="core_node2">
      <str name="state">active</str>
      <str name="core">marketmuse_shard1_replica2</str>
      <str name="node_name">172.30.0.89:8080_solr</str>
      <str name="base_url">http://172.30.0.89:8080/solr</str>
      <str name="leader">true</str>
    </lst>
  </lst>
</lst>





This results in the error message:

org.apache.solr.common.SolrException: No active slice servicing hash code
a55b940e in DocCollection


I've been reading guides online and they suggest updating the Zookeeper
config.

Specifically, they suggest getting clusterstate.json.  But I've tried that
and when I get that file, I only get an empty file {}

Is there another way to ask Zookeeper to cover the missing hash range?
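
In case it helps, a rough SolrJ sketch for reading a collection's state
straight from ZooKeeper (the path, address, and collection name here are
assumptions):

import java.nio.charset.StandardCharsets;
import org.apache.solr.common.cloud.SolrZkClient;

SolrZkClient zk = new SolrZkClient("zk1:2181,zk2:2181/solr", 15000);
// newer Solr versions keep per-collection state under the collection znode
byte[] state = zk.getData("/collections/marketmuse/state.json", null, null, true);
System.out.println(new String(state, StandardCharsets.UTF_8));
zk.close();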

Thanks,
Aki


Re: Use SqlEntityProcessor in cached mode to repeat a query for a nested child element

2016-02-04 Thread Kevin Colgan
You're right, that was a mistake in my code - I did actually use cacheKey, but
that didn't work, so I was looking at the Java class for DIHCacheSupport to see
if there were any other settings I could use:
https://lucene.apache.org/solr/5_4_0/solr-dataimporthandler/index.html?org/apache/solr/handler/dataimport/DIHCacheSupport.html
 
There doesn't seem to be a lot of documentation or examples around for using
cacheKey with SqlEntityProcessor.

Regards,
Kevin

On Thursday, February 4, 2016 9:31 PM, Alexandre Rafalovitch 
 wrote:
 
 

 Where did cachePrimaryKey come from? The documentation has cacheKey:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

Regards,
    Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 5 February 2016 at 02:53, Kevin Colgan  wrote:
> Hi everyone,
>
> Is it possible to use SqlEntityProcessor in cached mode to repeat a query for 
> a nested child element? I'd like to use the entity query once to consolidate 
> information from the children to the parent, then another to actually index 
> the entities as children.
>
> Here's an example of what I'm trying to do in the db-config file. The 
> EventsTransformer consolidates information from child events and adds fields 
> to the parent row. I had to add the two entities as the EventsTransformer 
> will only add fields to the parent if child=false:
>
> This is NOT working - the child event entities aren't being created:
>
> <entity name="houses" query="select ... from houses">
>     <entity transformer="EventsTransformer"
>             name="events"
>             query="select '${houses.uid}_events_' || e_id::text AS uuid, ... from events"/>
>     <entity child="true"
>             processor="SqlEntityProcessor" cachePrimaryKey="events_e_id"
>             cacheLookup="events_parsed.events_e_id" cacheImpl="SortedMapBackedCache"
>             name="events"
>             query="select ... from events"/>
> </entity>
>
> This IS working, but the events query is being run twice so indexing is
> twice as slow:
>
> <entity name="houses" query="select ... from houses">
>     <entity transformer="EventsTransformer"
>             name="events_parsed"
>             query="select '${houses.uid}_events_' || e_id::text AS uuid,
>                    e_id::text AS events_e_id, ... from events"/>
>     <entity child="true"
>             processor="SqlEntityProcessor" cachePrimaryKey="events_e_id"
>             cacheLookup="events_parsed.events_e_id" cacheImpl="SortedMapBackedCache"
>             transformer="EventsTransformer"
>             name="events_child"
>             query="select ... from events"/>
> </entity>
>
> Anyone got any idea how to do this? I've already tried nesting the second 
> child entity inside the other but this didn't work.
> Thanks,
> Kevin

 
  

Re: How to assign a hash range via zookeeper?

2016-02-04 Thread Aki Balogh
PS - confirmed:  in the GUI,  I go to Admin->Cloud->Tree, click on
clusterstate.json and it's empty {}



On Thu, Feb 4, 2016 at 3:37 PM, Aki Balogh  wrote:

> One of our shards went down.  We brought it back up but it doesn't have a
> hash range:
>
> 
> 
> <lst name="shard1">
>   <str name="state">active</str>
>   <lst name="replicas">
>     <lst name="core_node1">
>       <str name="core">marketmuse_shard1_replica1</str>
>       <str name="base_url">http://172.30.0.254:8080/solr</str>
>       <str name="node_name">172.30.0.254:8080_solr</str>
>       <str name="state">active</str>
>     </lst>
>     <lst name="core_node2">
>       <str name="state">active</str>
>       <str name="core">marketmuse_shard1_replica2</str>
>       <str name="node_name">172.30.0.89:8080_solr</str>
>       <str name="base_url">http://172.30.0.89:8080/solr</str>
>       <str name="leader">true</str>
>     </lst>
>   </lst>
> </lst>
> 
> 
> 
>
>
> This results in the error message:
>
> org.apache.solr.common.SolrException: No active slice servicing hash code
> a55b940e in DocCollection
>
>
> I've been reading guides online and they suggest updating the Zookeeper
> config.
>
> Specifically, they suggest getting clusterstate.json.  But I've tried that
> and when I get that file, I only get an empty file {}
>
> Is there another way to ask Zookeeper to cover the missing hash range?
>
> Thanks,
> Aki
>


Re: implement exact match for one of the search fields only?

2016-02-04 Thread Derek Poh

Hi Erick

<<
The manual way of doing this would be to construct an elaborate query,
like q=spp_keyword_exact:"dvd bracket" OR P_ShortDescription:(dvd bracket) OR

NOTE: the parens are necessary or the last part of the above would be
parsed as P_ShortDescription:dvd default_searchfield:bracket
>>

Your suggestion to construct the query like q=spp_keyword_exact:"dvd
bracket" OR P_ShortDescription:(dvd bracket) OR ... does not fit into our
current implementation.
The front-end pages will only pass "q=<search keywords>" in the query
to Solr. The list of search fields (qf) is pre-defined in Solr.


Do you have any alternatives to implement your suggestion without making 
changes to the front-end?


On 1/29/2016 1:49 AM, Erick Erickson wrote:

bq: if you are interested in phrase queries, you should use a String field

If you do this, you will NOT be able to search within the string. I.e.
if the doc field is "my dog has fleas" you cannot match
"dog has" with a string-based field.

If you want to match the _entire_ string or you want prefix-only
matching, then string might work, i.e. if you _only_ want to be able
to match

"my dog has fleas"
"my dog*"
but not
"dog has fleas".

On to the root question though.

I really think you want to look at edismax. What you're trying to do
is apply the same search term to individual fields. In particular,
the pf parameter will automatically apply the search terms _as a phrase_
against the field specified, relieving you of having to enclose things
in quotes.
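
For instance, a hedged SolrJ sketch using the field names from this thread
(the boost value is arbitrary):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("dvd bracket");
q.set("defType", "edismax");
q.set("qf", "spp_keyword_exact P_ShortDescription"); // fields for per-term matching
q.set("pf", "spp_keyword_exact^10");  // the whole query applied as a phrase, boosted
q.set("debug", "query");              // shows exactly how the query was parsed

Since qf and pf can also be set as defaults on the request handler in
solrconfig.xml, the front-end can keep sending only q.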

The manual way of doing this would be to construct an elaborate query, like
q=spp_keyword_exact:"dvd bracket" OR P_ShortDescription:(dvd bracket) OR

NOTE: the parens are necessary or the last part of the above would be
parsed as
P_ShortDescription:dvd default_searchfield:bracket

And the debug=query trick will show you exactly how things are actually
searched; it's invaluable.

Best,
Erick

On Thu, Jan 28, 2016 at 5:08 AM, Mugeesh Husain  wrote:

Hi,
if you are interested in phrase queries, you should use a String field instead of
a text field in the schema, like:
<field name="..." type="string" indexed="true" stored="true"/>

This will solve your problem.

If anything else is missing, please share.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/implement-exact-match-for-one-of-the-search-fields-only-tp4253786p4253827.html
Sent from the Solr - User mailing list archive at Nabble.com.







Loading Solr Analyzer from RuntimeLib Blob

2016-02-04 Thread Ravikant
Hi,

Did you find a solution to your problem? We are having a similar problem. We
tried the 'sharedLib' attribute, but to no avail so far.

-Ravi

Re: Solr for real time analytics system

2016-02-04 Thread Arkadiusz Robiński
A few people did a real time analytics system with solr and talked about it
at conferences. Maybe you'll find their presentations useful:
https://www.youtube.com/results?search_query=solr%20real%20time%20analytics
(esp. the first one: https://www.youtube.com/watch?v=PkoyCxBXAiA )

On Thu, Feb 4, 2016 at 8:25 AM, Rohit Kumar 
wrote:

> Thanks Bhimavarapu for the information.
>
> We are creating our own dashboard, so we probably won't need kibana/banana. I
> was more curious about Solr support for fast aggregation queries over very
> large data sets. As suggested, I guess Elasticsearch has this capability.
> Are there any published metrics or data regarding Elasticsearch/Solr
> performance in this area that I can refer to?
>
> Thanks
> Rohit
>
>
>
> On Thu, Feb 4, 2016 at 11:48 AM, CKReddy Bhimavarapu 
> wrote:
>
> > Hello Rohit,
> >
> > You can use the Banana project which was forked from Kibana
> > , and works with all kinds of time
> > series (and non-time series) data stored in Apache Solr
> > . It uses Kibana's powerful dashboard
> > configuration capabilities, ports key panels to work with Solr, and
> > provides significant additional capabilities, including new panels that
> > leverage D3.js 
> >
> >  would need mostly aggregation queries like sum/average/groupby etc, but
> > > data set is quite huge. The aggregation queries should be very fast.
> >
> >
> > all your requirements can be served by this Banana, but I'm not sure about
> > how fast Solr is compared to ELK
> >
> > On Thu, Feb 4, 2016 at 10:51 AM, Rohit Kumar <
> > rohitkumarbhagat...@gmail.com>
> > wrote:
> >
> > > Hi
> > >
> > > I am quite new to Solr. I have to build a real time analytics system which
> > > displays metrics based on multiple filters over a huge data set (~50 million
> > > documents with ~100 fields). I would need mostly aggregation queries like
> > > sum/average/groupby etc, but the data set is quite huge. The aggregation
> > > queries should be very fast.
> > >
> > > Is Solr suitable for such use cases?
> > >
> > > Thanks
> > > Rohit
> > >
> >
> >
> >
> > --
> > ckreddybh. 
> >
>



-- 
Arkadiusz Robiński
Software Developer
Otodom.pl


Re: Solr for real time analytics system

2016-02-04 Thread Rohit Kumar
Thanks Bhimavarapu for the information.

We are creating our own dashboard, so we probably won't need kibana/banana. I
was more curious about Solr support for fast aggregation queries over very
large data sets. As suggested, I guess Elasticsearch has this capability.
Are there any published metrics or data regarding Elasticsearch/Solr
performance in this area that I can refer to?
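
As a point of reference, this sum/average/group-by style maps onto Solr's
JSON Facet API (available in recent 5.x releases); a minimal SolrJ sketch
with invented field names:

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("*:*");
q.setRows(0);   // aggregations only, no documents
q.set("json.facet",
    "{byRegion:{type:terms, field:region_s,"
  + " facet:{total:\"sum(amount_d)\", avg:\"avg(amount_d)\"}}}");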

Thanks
Rohit



On Thu, Feb 4, 2016 at 11:48 AM, CKReddy Bhimavarapu 
wrote:

> Hello Rohit,
>
> You can use the Banana project which was forked from Kibana
> , and works with all kinds of time
> series (and non-time series) data stored in Apache Solr
> . It uses Kibana's powerful dashboard
> configuration capabilities, ports key panels to work with Solr, and
> provides significant additional capabilities, including new panels that
> leverage D3.js 
>
>  would need mostly aggregation queries like sum/average/groupby etc, but
> > data set is quite huge. The aggregation queries should be very fast.
>
>
> all your requirements can be served by this Banana, but I'm not sure about
> how fast Solr is compared to ELK
>
> On Thu, Feb 4, 2016 at 10:51 AM, Rohit Kumar <
> rohitkumarbhagat...@gmail.com>
> wrote:
>
> > Hi
> >
> > I am quite new to Solr. I have to build a real time analytics system which
> > displays metrics based on multiple filters over a huge data set (~50 million
> > documents with ~100 fields). I would need mostly aggregation queries like
> > sum/average/groupby etc, but the data set is quite huge. The aggregation
> > queries should be very fast.
> >
> > Is Solr suitable for such use cases?
> >
> > Thanks
> > Rohit
> >
>
>
>
> --
> ckreddybh. 
>