customize posting size (or block size) ...

2016-11-14 Thread Jason
Hi,

Our search patterns mostly use SpanNearQuery combined with PrefixQuery.
In addition, a single search query often includes a lot of PrefixQuery clauses.
We don't currently place any constraints on the use of PrefixQuery.
For this reason, JVM heap memory usage is often high.
While this happens, other queries also hang.

I'd like to know of any solution for lower memory usage, even if search time
becomes a little longer.
I guess that the posting size (or block size) read at a time could be reduced when
scoring a SpanNearQuery.
If possible, would that reduce the memory usage for processing SpanNearQuery,
and how do I customize it?
Examples would be really helpful to me.

* I'm using Solr 4.2.1.
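For context, a minimal sketch of the query shape described above (a
SpanNearQuery with a prefix clause), assuming the Lucene 4.2 API bundled with
Solr 4.2.1; the field and term values here are made up. Each wrapped
PrefixQuery is rewritten against every matching term in the index, which is
likely where much of the heap pressure comes from.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanNearPrefixExample {
    public static void main(String[] args) {
        // An exact term clause, and a prefix clause wrapped so that it can
        // participate in a span query.
        SpanQuery exact = new SpanTermQuery(new Term("body", "apache"));
        SpanQuery prefix = new SpanMultiTermQueryWrapper<PrefixQuery>(
                new PrefixQuery(new Term("body", "luc")));

        // Match the two clauses within 5 positions of each other, in order.
        // The prefix clause expands to every matching term at rewrite time.
        SpanNearQuery near = new SpanNearQuery(new SpanQuery[]{exact, prefix}, 5, true);
        System.out.println(near);
    }
}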

thanks.






Re: Solr shards: very sensitive to swap space usage!?

2016-11-14 Thread Chetas Joshi
Thanks everyone!
The discussion is really helpful.

Hi Toke, can you explain exactly what you mean by "the aggressive IO for
the memory mapping caused the kernel to start swapping parts of the JVM
heap to get better caching of storage data"?
Which JVM are you talking about? Solr shard? I have other services running
on the same host as well.

Thanks!

On Fri, Nov 11, 2016 at 7:32 AM, Shawn Heisey  wrote:

> On 11/11/2016 6:46 AM, Toke Eskildsen wrote:
> > but on two occasions I have
> > experienced heavy swapping with multiple gigabytes free for disk
> > cache. In both cases, the cache-to-index size was fairly low (let's
> > say < 10%). My guess (I don't know the intrinsics of memory mapping
> > vs. swapping) is that the aggressive IO for the memory mapping caused
> > the kernel to start swapping parts of the JVM heap to get better
> > caching of storage data. Yes, with terrible performance as a result.
>
> That's really weird, and sounds like a broken operating system.  I've
> had other issues with swap, but in those cases, free memory was actually
> near zero, and it sounds like your situation was not the same.  So the
> OP here might be having similar problems even if nothing's
> misconfigured.  If so, your solution will probably help them.
>
> > No matter the cause, the swapping problems were "solved" by
> > effectively disabling the swap (swappiness 0).
>
> Solr certainly doesn't need (or even want) swap, if the machine is sized
> right.  I've read some things saying that Linux doesn't behave correctly
> if you completely get rid of all swap, but setting swappiness to zero
> sounds like a good option.  The OS would still utilize swap if it
> actually ran out of physical memory, so you don't lose the safety valve
> that swap normally provides.
>
> Thanks,
> Shawn
>
>


Re: Parallelize Cursor approach

2016-11-14 Thread Chetas Joshi
I got it when you said "form N queries". I just wanted to try the "get all
cursorMarks first" approach, but I realized it would be very inefficient, as
you said, since a cursorMark is a serialized version of the last sorted
value you received; you are therefore still reading the results from Solr
even though "fl" -> null.

I wanted to try that approach because I need everything sorted. When submitting
N queries, I will have to merge-sort the results of the N queries, but that
should still be way better than the first approach I tried.

Thanks!

On Mon, Nov 14, 2016 at 3:58 PM, Erick Erickson 
wrote:

> You're executing all the queries to parallelize before even starting.
> Seems very inefficient. My suggestion doesn't require this first step.
> Perhaps it was confusing because I mentioned "your own cursorMark".
> Really I meant bypass that entirely, just form N queries that were
> restricted to N disjoint subsets of the data and process them all in
> parallel, either with /export or /select.
>
> Best,
> Erick
>
> On Mon, Nov 14, 2016 at 3:53 PM, Chetas Joshi 
> wrote:
> > Thanks Joel for the explanation.
> >
> > Hi Erick,
> >
> > One of the ways I am trying to parallelize the cursor approach is by
> > iterating the result set twice.
> > (1) Once just to get all the cursor marks
> >
> > val q: SolrQuery = new solrj.SolrQuery()
> > q.set("q", query)
> > q.add("fq", query)
> > q.add("rows", batchSize.toString)
> > q.add("collection", collection)
> > q.add("fl", "null")
> > q.add("sort", "id asc")
> >
> > Here I am not asking for any field values ( "fl" -> null )
> >
> > (2) Once I get all the cursor marks, I can start parallel threads to get
> > the results in parallel.
> >
> > However, the first step in fact takes a lot of time. Even more than when
> I
> > would actually iterate through the results with "fl" -> field1, field2,
> > field3
> >
> > Why is this happening?
> >
> > Thanks!
> >
> >
> > On Thu, Nov 10, 2016 at 8:22 PM, Joel Bernstein 
> wrote:
> >
> >> Solr 5 was very early days for Streaming Expressions. Streaming
> Expressions
> >> and SQL use Java 8 so development switched to the 6.0 branch five months
> >> before the 6.0 release. So there was a very large jump in features and
> bug
> >> fixes from Solr 5 to Solr 6 in Streaming Expressions.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Thu, Nov 10, 2016 at 11:14 PM, Joel Bernstein 
> >> wrote:
> >>
> >> > In Solr 5 the /export handler wasn't escaping json text fields, which
> >> > would produce json parse exceptions. This was fixed in Solr 6.0.
> >> >
> >> > Joel Bernstein
> >> > http://joelsolr.blogspot.com/
> >> >
> >> > On Tue, Nov 8, 2016 at 6:17 PM, Erick Erickson <
> erickerick...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hmm, that should work fine. Let us know what the logs show if
> anything
> >> >> because this is weird.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Tue, Nov 8, 2016 at 1:00 PM, Chetas Joshi  >
> >> >> wrote:
> >> >> > Hi Erick,
> >> >> >
> >> >> > This is how I use the streaming approach.
> >> >> >
> >> >> > Here is the solrconfig block.
> >> >> >
> >> >> > 
> >> >> > 
> >> >> > {!xport}
> >> >> > xsort
> >> >> > false
> >> >> > 
> >> >> > 
> >> >> > query
> >> >> > 
> >> >> > 
> >> >> >
> >> >> > And here is the code in which SolrJ is being used.
> >> >> >
> >> >> > String zkHost = args[0];
> >> >> > String collection = args[1];
> >> >> >
> >> >> > Map props = new HashMap();
> >> >> > props.put("q", "*:*");
> >> >> > props.put("qt", "/export");
> >> >> > props.put("sort", "fieldA asc");
> >> >> > props.put("fl", "fieldA,fieldB,fieldC");
> >> >> >
> >> >> > CloudSolrStream cloudstream = new CloudSolrStream(zkHost,collect
> >> >> ion,props);
> >> >> >
> >> >> > And then I iterate through the cloud stream (TupleStream).
> >> >> > So I am using streaming expressions (SolrJ).
> >> >> >
> >> >> > I have not looked at the solr logs while I started getting the JSON
> >> >> parsing
> >> >> > exceptions. But I will let you know what I see the next time I run
> >> into
> >> >> the
> >> >> > same exceptions.
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >> > On Sat, Nov 5, 2016 at 9:32 PM, Erick Erickson <
> >> erickerick...@gmail.com
> >> >> >
> >> >> > wrote:
> >> >> >
> >> >> >> Hmmm, export is supposed to handle 10s of million result sets. I
> know
> >> >> >> of a situation where the Streaming Aggregation functionality back
> >> >> >> ported to Solr 4.10 processes on that scale. So do you have any
> clue
> >> >> >> what exactly is failing? Is there anything in the Solr logs?
> >> >> >>
> >> >> >> _How_ are you using /export, through Streaming Aggregation
> (SolrJ) or
> >> >> >> just the raw xport handler? It might be worth trying to do this
> from
> >> >> >> SolrJ if you're not, it should be a very quick program to write,
> just
> >> >> >> to test we're talking 100 lines max.
> >> >> >>
> >> >> >> You could always roll your own cursor mark stuff by partitioning
> the
> >>

Re: Parallelize Cursor approach

2016-11-14 Thread Erick Erickson
You're executing all the queries to parallelize before even starting.
Seems very inefficient. My suggestion doesn't require this first step.
Perhaps it was confusing because I mentioned "your own cursorMark".
Really I meant bypass that entirely, just form N queries that were
restricted to N disjoint subsets of the data and process them all in
parallel, either with /export or /select.
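A rough sketch of that suggestion, assuming SolrJ (a 5/6-era CloudSolrClient)
and a hypothetical numeric partition_field with values between 0 and 100; each
thread runs its own cursorMark loop over a disjoint fq range, so the
partitions never overlap.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class PartitionedCursorExample {

    // Each partition runs an independent cursorMark loop over its own fq range.
    static void drainPartition(CloudSolrClient client, String fqRange) throws Exception {
        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        while (true) {
            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery(fqRange);                    // e.g. partition_field:[0 TO 50}
            q.setRows(500);
            q.setSort(SolrQuery.SortClause.asc("id"));    // cursorMark needs a sort on the uniqueKey
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);

            QueryResponse rsp = client.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                // process doc
            }
            String next = rsp.getNextCursorMark();
            if (next.equals(cursorMark)) {
                break;                                    // partition exhausted
            }
            cursorMark = next;
        }
    }

    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient(args[0]);   // zkHost
        client.setDefaultCollection(args[1]);

        String[] ranges = { "partition_field:[0 TO 50}", "partition_field:[50 TO 100]" };
        Thread[] workers = new Thread[ranges.length];
        for (int i = 0; i < ranges.length; i++) {
            final String range = ranges[i];
            workers[i] = new Thread(() -> {
                try { drainPartition(client, range); } catch (Exception e) { e.printStackTrace(); }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        client.close();
    }
}

With /export instead of /select, the same fq partitioning applies, but the
cursorMark loop is unnecessary since /export streams the entire sorted result
set (given docValues on the sort and fl fields).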

Best,
Erick

On Mon, Nov 14, 2016 at 3:53 PM, Chetas Joshi  wrote:
> Thanks Joel for the explanation.
>
> Hi Erick,
>
> One of the ways I am trying to parallelize the cursor approach is by
> iterating the result set twice.
> (1) Once just to get all the cursor marks
>
> val q: SolrQuery = new solrj.SolrQuery()
> q.set("q", query)
> q.add("fq", query)
> q.add("rows", batchSize.toString)
> q.add("collection", collection)
> q.add("fl", "null")
> q.add("sort", "id asc")
>
> Here I am not asking for any field values ( "fl" -> null )
>
> (2) Once I get all the cursor marks, I can start parallel threads to get
> the results in parallel.
>
> However, the first step in fact takes a lot of time. Even more than when I
> would actually iterate through the results with "fl" -> field1, field2,
> field3
>
> Why is this happening?
>
> Thanks!
>
>
> On Thu, Nov 10, 2016 at 8:22 PM, Joel Bernstein  wrote:
>
>> Solr 5 was very early days for Streaming Expressions. Streaming Expressions
>> and SQL use Java 8 so development switched to the 6.0 branch five months
>> before the 6.0 release. So there was a very large jump in features and bug
>> fixes from Solr 5 to Solr 6 in Streaming Expressions.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Thu, Nov 10, 2016 at 11:14 PM, Joel Bernstein 
>> wrote:
>>
>> > In Solr 5 the /export handler wasn't escaping json text fields, which
>> > would produce json parse exceptions. This was fixed in Solr 6.0.
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Tue, Nov 8, 2016 at 6:17 PM, Erick Erickson 
>> > wrote:
>> >
>> >> Hmm, that should work fine. Let us know what the logs show if anything
>> >> because this is weird.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Tue, Nov 8, 2016 at 1:00 PM, Chetas Joshi 
>> >> wrote:
>> >> > Hi Erick,
>> >> >
>> >> > This is how I use the streaming approach.
>> >> >
>> >> > Here is the solrconfig block.
>> >> >
>> >> > 
>> >> > 
>> >> > {!xport}
>> >> > xsort
>> >> > false
>> >> > 
>> >> > 
>> >> > query
>> >> > 
>> >> > 
>> >> >
>> >> > And here is the code in which SolrJ is being used.
>> >> >
>> >> > String zkHost = args[0];
>> >> > String collection = args[1];
>> >> >
>> >> > Map props = new HashMap();
>> >> > props.put("q", "*:*");
>> >> > props.put("qt", "/export");
>> >> > props.put("sort", "fieldA asc");
>> >> > props.put("fl", "fieldA,fieldB,fieldC");
>> >> >
>> >> > CloudSolrStream cloudstream = new CloudSolrStream(zkHost,collect
>> >> ion,props);
>> >> >
>> >> > And then I iterate through the cloud stream (TupleStream).
>> >> > So I am using streaming expressions (SolrJ).
>> >> >
>> >> > I have not looked at the solr logs while I started getting the JSON
>> >> parsing
>> >> > exceptions. But I will let you know what I see the next time I run
>> into
>> >> the
>> >> > same exceptions.
>> >> >
>> >> > Thanks
>> >> >
>> >> > On Sat, Nov 5, 2016 at 9:32 PM, Erick Erickson <
>> erickerick...@gmail.com
>> >> >
>> >> > wrote:
>> >> >
>> >> >> Hmmm, export is supposed to handle 10s of million result sets. I know
>> >> >> of a situation where the Streaming Aggregation functionality back
>> >> >> ported to Solr 4.10 processes on that scale. So do you have any clue
>> >> >> what exactly is failing? Is there anything in the Solr logs?
>> >> >>
>> >> >> _How_ are you using /export, through Streaming Aggregation (SolrJ) or
>> >> >> just the raw xport handler? It might be worth trying to do this from
>> >> >> SolrJ if you're not, it should be a very quick program to write, just
>> >> >> to test we're talking 100 lines max.
>> >> >>
>> >> >> You could always roll your own cursor mark stuff by partitioning the
>> >> >> data amongst N threads/processes if you have any reasonable
>> >> >> expectation that you could form filter queries that partition the
>> >> >> result set anywhere near evenly.
>> >> >>
>> >> >> For example, let's say you have a field with random numbers between 0
>> >> >> and 100. You could spin off 10 cursorMark-aware processes each with
>> >> >> its own fq clause like
>> >> >>
>> >> >> fq=partition_field:[0 TO 10}
>> >> >> fq=[10 TO 20}
>> >> >> 
>> >> >> fq=[90 TO 100]
>> >> >>
>> >> >> Note the use of inclusive/exclusive end points
>> >> >>
>> >> >> Each one would be totally independent of all others with no
>> >> >> overlapping documents. And since the fq's would presumably be cached
>> >> >> you should be able to go as fast as you can drive your cluster. Of
>> >> >> course you lose query-wide sorting and the like, if that's important
>> >> >> you'd need to figure som

Re: Parallelize Cursor approach

2016-11-14 Thread Chetas Joshi
Thanks Joel for the explanation.

Hi Erick,

One of the ways I am trying to parallelize the cursor approach is by
iterating the result set twice.
(1) Once just to get all the cursor marks

val q: SolrQuery = new solrj.SolrQuery()
q.set("q", query)
q.add("fq", query)
q.add("rows", batchSize.toString)
q.add("collection", collection)
q.add("fl", "null")
q.add("sort", "id asc")

Here I am not asking for any field values ( "fl" -> null )

(2) Once I get all the cursor marks, I can start parallel threads to get
the results in parallel.

However, the first step in fact takes a lot of time, even more than when I
actually iterate through the results with "fl" -> field1, field2,
field3.

Why is this happening?

Thanks!


On Thu, Nov 10, 2016 at 8:22 PM, Joel Bernstein  wrote:

> Solr 5 was very early days for Streaming Expressions. Streaming Expressions
> and SQL use Java 8 so development switched to the 6.0 branch five months
> before the 6.0 release. So there was a very large jump in features and bug
> fixes from Solr 5 to Solr 6 in Streaming Expressions.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Nov 10, 2016 at 11:14 PM, Joel Bernstein 
> wrote:
>
> > In Solr 5 the /export handler wasn't escaping json text fields, which
> > would produce json parse exceptions. This was fixed in Solr 6.0.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Nov 8, 2016 at 6:17 PM, Erick Erickson 
> > wrote:
> >
> >> Hmm, that should work fine. Let us know what the logs show if anything
> >> because this is weird.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Nov 8, 2016 at 1:00 PM, Chetas Joshi 
> >> wrote:
> >> > Hi Erick,
> >> >
> >> > This is how I use the streaming approach.
> >> >
> >> > Here is the solrconfig block.
> >> >
> >> > 
> >> > 
> >> > {!xport}
> >> > xsort
> >> > false
> >> > 
> >> > 
> >> > query
> >> > 
> >> > 
> >> >
> >> > And here is the code in which SolrJ is being used.
> >> >
> >> > String zkHost = args[0];
> >> > String collection = args[1];
> >> >
> >> > Map props = new HashMap();
> >> > props.put("q", "*:*");
> >> > props.put("qt", "/export");
> >> > props.put("sort", "fieldA asc");
> >> > props.put("fl", "fieldA,fieldB,fieldC");
> >> >
> >> > CloudSolrStream cloudstream = new CloudSolrStream(zkHost,collect
> >> ion,props);
> >> >
> >> > And then I iterate through the cloud stream (TupleStream).
> >> > So I am using streaming expressions (SolrJ).
> >> >
> >> > I have not looked at the solr logs while I started getting the JSON
> >> parsing
> >> > exceptions. But I will let you know what I see the next time I run
> into
> >> the
> >> > same exceptions.
> >> >
> >> > Thanks
> >> >
> >> > On Sat, Nov 5, 2016 at 9:32 PM, Erick Erickson <
> erickerick...@gmail.com
> >> >
> >> > wrote:
> >> >
> >> >> Hmmm, export is supposed to handle 10s of million result sets. I know
> >> >> of a situation where the Streaming Aggregation functionality back
> >> >> ported to Solr 4.10 processes on that scale. So do you have any clue
> >> >> what exactly is failing? Is there anything in the Solr logs?
> >> >>
> >> >> _How_ are you using /export, through Streaming Aggregation (SolrJ) or
> >> >> just the raw xport handler? It might be worth trying to do this from
> >> >> SolrJ if you're not, it should be a very quick program to write, just
> >> >> to test we're talking 100 lines max.
> >> >>
> >> >> You could always roll your own cursor mark stuff by partitioning the
> >> >> data amongst N threads/processes if you have any reasonable
> >> >> expectation that you could form filter queries that partition the
> >> >> result set anywhere near evenly.
> >> >>
> >> >> For example, let's say you have a field with random numbers between 0
> >> >> and 100. You could spin off 10 cursorMark-aware processes each with
> >> >> its own fq clause like
> >> >>
> >> >> fq=partition_field:[0 TO 10}
> >> >> fq=[10 TO 20}
> >> >> 
> >> >> fq=[90 TO 100]
> >> >>
> >> >> Note the use of inclusive/exclusive end points
> >> >>
> >> >> Each one would be totally independent of all others with no
> >> >> overlapping documents. And since the fq's would presumably be cached
> >> >> you should be able to go as fast as you can drive your cluster. Of
> >> >> course you lose query-wide sorting and the like, if that's important
> >> >> you'd need to figure something out there.
> >> >>
> >> >> Do be aware of a potential issue. When regular doc fields are
> >> >> returned, for each document returned, a 16K block of data will be
> >> >> decompressed to get the stored field data. Streaming Aggregation
> >> >> (/xport) reads docValues entries which are held in MMapDirectory
> space
> >> >> so will be much, much faster. As of Solr 5.5. You can override the
> >> >> decompression stuff, see:
> >> >> https://issues.apache.org/jira/browse/SOLR-8220 for fields that are
> >> >> both stored and docvalues...
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Sat, Nov 5, 20

Re: Editing schema and solrconfig files

2016-11-14 Thread Erick Erickson
Oh, and of course there are the managed schema capabilities, where
you use API endpoints to modify the schema file, and a similar API for
some parts of solrconfig.xml. That said, though, for any kind of
serious installation I'd still be pulling the modified configs off of
ZK and putting them in source code control. If you're not going to
create a UI to manipulate the schema, I find just using the scripts or
zkcli about as fast.
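For illustration only (not necessarily the workflow described above), a hedged
sketch of adding a field through the managed Schema API from SolrJ, assuming
SolrJ 5.3+ where SchemaRequest exists, a managed schema, and a made-up
collection, field name and field type:

import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class AddSchemaFieldExample {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("localhost:2181");
        try {
            // Attributes of the new field; the type must be an existing fieldType.
            Map<String, Object> field = new LinkedHashMap<>();
            field.put("name", "price");
            field.put("type", "tfloat");
            field.put("stored", true);
            field.put("indexed", true);

            SchemaResponse.UpdateResponse rsp =
                    new SchemaRequest.AddField(field).process(client, "mycollection");
            System.out.println("Schema update status: " + rsp.getStatus());
        } finally {
            client.close();
        }
    }
}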

Best,
Erick

On Mon, Nov 14, 2016 at 2:22 PM, Reth RM  wrote:
> There's a way to add/update/delete schema fields, this is helpful.
> https://jpst.it/Pqqz
> although no way to add field-Type
>
> On Wed, Nov 9, 2016 at 2:20 PM, Erick Erickson 
> wrote:
>
>> We had the bright idea of allowing editing of the config files through
>> the UI... but the ability to upload arbitrary XML is a security
>> vulnerability, so that idea was nixed.
>>
>> The solr/bin script has an upconfig and downconfig command that are (I
>> hope) easier to use than zkcli, I think from 5.5. In Solr 6.2 the
>> solr/bin script has been enhanced to allow other ZK operations. Not
>> quite what you were looking for, but I thought I'd mention it.
>>
>> There are some ZK clients out there that'll let you edit files
>> directly in ZK, and I know IntelliJ also has a plugin that'll allow
>> you to do that from the IDE, don't know about Eclipse but I expect it
>> does.
>>
>> I usually edit them locally and set up a shell script to push them up
>> as necessary...
>>
>> FWIW,
>> Erick
>>
>> On Wed, Nov 9, 2016 at 2:09 PM, John Bickerstaff
>>  wrote:
>> > I never found a way to do it through the UI... and ended up using "nano"
>> on
>> > linux for simple things.
>> >
>> > For more complex stuff, I scp'd the file (or the whole conf directory) up
>> > to my dev box (a Mac in my case) and edited in a decent UI tool, then
>> scp'd
>> > the whole thing back...  I wrote a simple bash script to automate the scp
>> > process on both ends once I got tired of typing it over and over...
>> >
>> > On Wed, Nov 9, 2016 at 3:05 PM, Reth RM  wrote:
>> >
>> >> What are some easiest ways to edit/modify/add conf files, such as
>> >> solrconfig.xml and schema.xml other than APIs end points or using zk
>> >> commands to re-upload modified file?
>> >>
>> >> In other words, can we edit conf files through solr admin (GUI)
>> >> interface(add new filed by click on button or add new request handler on
>> >> click?)  with feature of enabling/disabling same feature as required?
>> >>
>>


Re: RTF Rich text format

2016-11-14 Thread Alexandre Rafalovitch
The logical place to do that (if you cannot do it outside of Solr) would
be in an UpdateRequestProcessor.

Unfortunately, there is no TikaExtract URP, though other similar ones
exist (e.g. for language guessing). The full list is here:
http://www.solr-start.com/info/update-request-processors/

But you could write one. You'd have to be very careful about using Tika so as
not to leak memory and to handle the failure states, but technically it
should be possible.
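A very rough sketch of what such a custom URP could look like, assuming Tika
is on the classpath; the field names are hypothetical and the error handling
is deliberately minimal:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical processor: runs the raw "body_rtf" field through Tika and
// writes the extracted plain text into "body_txt" before the doc is indexed.
public class TikaExtractProcessor extends UpdateRequestProcessor {

    public TikaExtractProcessor(UpdateRequestProcessor next) {
        super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object raw = doc.getFieldValue("body_rtf");
        if (raw != null) {
            try {
                // Bound the extracted text size so a bad document cannot blow up the heap.
                BodyContentHandler handler = new BodyContentHandler(1_000_000);
                new AutoDetectParser().parse(
                        new ByteArrayInputStream(raw.toString().getBytes(StandardCharsets.UTF_8)),
                        handler, new Metadata(), new ParseContext());
                doc.setField("body_txt", handler.toString());
            } catch (Exception e) {
                // Decide per use case: fail the update, or index without the extracted text.
            }
        }
        super.processAdd(cmd);
    }
}

A real deployment would also register a matching UpdateRequestProcessorFactory
in an updateRequestProcessorChain in solrconfig.xml.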

Regards,
   Alex.

Solr Example reading group is starting November 2016, join us at
http://j.mp/SolrERG
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 15 November 2016 at 04:01, Sergio García Maroto  wrote:
> Thanks for the response.
>
> I am afraid I can't use the DataImportHandler. I do the indexation using an
> Indexation Service joining data from several places.
>
> I have a final xml with plenty of data and one of them is the rtf field.
> That's the xml I send to Solr using the /update. I am guessing if it would
> be possible Solr to do it with a tokenizer filter or something like that.
>
> On 14 November 2016 at 16:24, Alexandre Rafalovitch 
> wrote:
>
>> I think DataImportHandler with nested entity (JDBC, then Tika with
>> FieldReaderDataSource) should do the trick.
>>
>> Have you tried that?
>>
>> Regards,
>>Alex.
>> 
>> Solr Example reading group is starting November 2016, join us at
>> http://j.mp/SolrERG
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>
>>
>> On 15 November 2016 at 03:19, marotosg  wrote:
>> > Hi,
>> >
>> > I have a use case where I need to index information coming from a
>> database
>> > where there is a field which contains rich text format. I would like to
>> > convert that text into simple plain text, same as tika does when indexing
>> > documents.
>> >
>> > Is there any way to achive that having a field only where i sent this
>> rich
>> > text and then Solr cleans that data? I can't find anyhting so far.
>> >
>> > Thanks
>> > Sergio
>> >
>> >
>> >
>> > --
>> > View this message in context: http://lucene.472066.n3.
>> nabble.com/RTF-Rich-text-format-tp4305778.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>


Re: Editing schema and solrconfig files

2016-11-14 Thread Reth RM
There's a way to add/update/delete schema fields; this is helpful:
https://jpst.it/Pqqz
although there is no way to add a fieldType.

On Wed, Nov 9, 2016 at 2:20 PM, Erick Erickson 
wrote:

> We had the bright idea of allowing editing of the config files through
> the UI... but the ability to upload arbitrary XML is a security
> vulnerability, so that idea was nixed.
>
> The solr/bin script has an upconfig and downconfig command that are (I
> hope) easier to use than zkcli, I think from 5.5. In Solr 6.2 the
> solr/bin script has been enhanced to allow other ZK operations. Not
> quite what you were looking for, but I thought I'd mention it.
>
> There are some ZK clients out there that'll let you edit files
> directly in ZK, and I know IntelliJ also has a plugin that'll allow
> you to do that from the IDE, don't know about Eclipse but I expect it
> does.
>
> I usually edit them locally and set up a shell script to push them up
> as necessary...
>
> FWIW,
> Erick
>
> On Wed, Nov 9, 2016 at 2:09 PM, John Bickerstaff
>  wrote:
> > I never found a way to do it through the UI... and ended up using "nano"
> on
> > linux for simple things.
> >
> > For more complex stuff, I scp'd the file (or the whole conf directory) up
> > to my dev box (a Mac in my case) and edited in a decent UI tool, then
> scp'd
> > the whole thing back...  I wrote a simple bash script to automate the scp
> > process on both ends once I got tired of typing it over and over...
> >
> > On Wed, Nov 9, 2016 at 3:05 PM, Reth RM  wrote:
> >
> >> What are some easiest ways to edit/modify/add conf files, such as
> >> solrconfig.xml and schema.xml other than APIs end points or using zk
> >> commands to re-upload modified file?
> >>
> >> In other words, can we edit conf files through solr admin (GUI)
> >> interface(add new filed by click on button or add new request handler on
> >> click?)  with feature of enabling/disabling same feature as required?
> >>
>


how to tell SolrHttpServer client to accept/ignore all certs?

2016-11-14 Thread Robert Hume
I'm using HttpSolrServer (in Solr 3.6) to connect to a Solr web service and
perform a query.

The certificate at the other end has expired and so connections now fail.

It will take the IT team at the other end too many days to replace the cert
(this is out of my control).

How can I tell HttpSolrServer to ignore bad certs when it queries
the server?

NOTE 1: I noticed that I can pass my own Apache HttpClient (we're currently
using 4.3) into the HttpSolrServer constructor, but internally
HttpSolrServer seems to do a lot of customizing/configuring of its own
default HttpClient, so I didn't want to mess with that.

NOTE 2: This is a 100% internal application, so there are no real security
problems with this temporary workaround.
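For what it's worth, a hedged sketch of the approach from NOTE 1, assuming
HttpClient 4.3 APIs and the HttpSolrServer(String, HttpClient) constructor;
whether SolrJ 3.6 cooperates fully with a 4.3-style client needs testing in
your environment, and this trusts every certificate and skips hostname
verification, so it only makes sense as a temporary, internal-only workaround.

import java.security.cert.X509Certificate;

import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
import org.apache.http.conn.ssl.SSLContextBuilder;
import org.apache.http.conn.ssl.TrustStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class TrustAllSolrClientExample {
    public static void main(String[] args) throws Exception {
        // Trust strategy that accepts any chain, including the expired cert.
        TrustStrategy acceptAll = new TrustStrategy() {
            public boolean isTrusted(X509Certificate[] chain, String authType) {
                return true;
            }
        };
        SSLConnectionSocketFactory sslFactory = new SSLConnectionSocketFactory(
                new SSLContextBuilder().loadTrustMaterial(null, acceptAll).build(),
                SSLConnectionSocketFactory.ALLOW_ALL_HOSTNAME_VERIFIER);

        CloseableHttpClient httpClient = HttpClients.custom()
                .setSSLSocketFactory(sslFactory)
                .build();

        // Pass the pre-configured client in; hypothetical base URL.
        HttpSolrServer solr = new HttpSolrServer(
                "https://solr.example.com/solr/collection1", httpClient);
        System.out.println(solr.ping().getStatus());
    }
}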

Thanks!!

rh


Re: index and data directories

2016-11-14 Thread Erick Erickson
Theoretically, perhaps. And it's quite true that stored data for
fields marked stored=true are just passed through verbatim and
compressed on disk while the data associated with indexed=true fields
go through an analysis chain and are stored in a much different
format. However these different data are simply stored in files with
different suffixes in a segment. So you might have _0.fdx, _0.fdt,
_0.tim, _0.tvx etc. that together form a single segment.

This is done on a per-segment basis. So certain segment files, namely
the *.fdt and *.fdx file will contain the stored data while other
extensions have the indexed data, see: "File naming" here for a
somewhat out of date format, but close enough for this discussion:
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html.
And there's no option to store the *.fdt and *.fdx files independently
from the rest of the segment files.

This statement: "I mean documents which are to be indexed" really
doesn't make sense. You send these things called Solr documents to be
indexed, but they are just a set of fields with values handled as
their definitions indicate (i.e. respecting stored=true|false,
indexed=true|false, docValues=true|false). The Solr document sent by SolrJ is simply
thrown away after processing into segment files.

If you're sending semi-structured docs (say Word, PDF etc) to be
indexed through Tika they are simply transformed into a Solr doc (set
of field/value pairs) and the original document is thrown away as
well. There's no option to store the original semi-structured doc
either.


Best,
Erick

On Mon, Nov 14, 2016 at 12:35 PM, Prateek Jain J
 wrote:
>
> By data, I mean documents which are to be indexed. Some fields can be 
> stored="true" but that doesn’t matter.
>
> For example: App1 creates an object (AppObj) to be indexed and sends it to 
> SOLR via solrj. Some of the attributes of this object can be declared to be 
> used for storage.
>
> Now, my understanding is data and indexes generated on data are two separate 
> things. In my particular example, all fields have stored="true" but only 
> selected fields have indexed="true". My expectation is, indexes are stored 
> separately from data because indexes can be generated by different 
> techniques/algorithms but data/documents remain unchanged. Please correct me 
> if my understanding is not correct.
>
>
> Regards,
> Prateek Jain
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 14 November 2016 07:05 PM
> To: solr-user 
> Subject: Re: index and data directories
>
> The question is pretty opaque. What do you mean by "data" as opposed to 
> "indexes"? Are you talking about where Lucene puts stored="true"
> fields? If not, what do you mean by "data"?
>
> If you are talking about where Lucene puts the stored="true" bits the no, 
> there's no way to segregate that our from the other files that make up a 
> segment.
>
> Best,
> Erick
>
> On Mon, Nov 14, 2016 at 7:58 AM, Prateek Jain J  
> wrote:
>>
>> Hi Alex,
>>
>>  I am unable to get it correctly. Is it possible to store indexes and data 
>> separately?
>>
>>
>> Regards,
>> Prateek Jain
>>
>> -Original Message-
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: 14 November 2016 03:53 PM
>> To: solr-user 
>> Subject: Re: index and data directories
>>
>> solr.xml also has a bunch of properties under the core tag:
>>
>>   
>> 
>>   
>> 
>>   
>>
>> You can get the Reference Guide for your specific version here:
>> http://archive.apache.org/dist/lucene/solr/ref-guide/
>>
>> Regards,
>>Alex.
>> 
>> Solr Example reading group is starting November 2016, join us at 
>> http://j.mp/SolrERG Newsletter and resources for Solr beginners and 
>> intermediates:
>> http://www.solr-start.com/
>>
>>
>> On 15 November 2016 at 02:37, Prateek Jain J  
>> wrote:
>>>
>>> Hi All,
>>>
>>> We are using solr 4.8.1 and would like to know if it is possible to
>>> store data and indexes in separate directories? I know following tag
>>> exist in solrconfig.xml file
>>>
>>> 
>>> C:/del-it/solr/cm_events_nbi/data
>>>
>>>
>>>
>>> Regards,
>>> Prateek Jain


RE: index and data directories

2016-11-14 Thread Prateek Jain J

By data, I mean the documents which are to be indexed. Some fields can be 
stored="true", but that doesn't matter.

For example: App1 creates an object (AppObj) to be indexed and sends it to Solr 
via SolrJ. Some of the attributes of this object can be declared to be used for 
storage. 

Now, my understanding is that data and the indexes generated on that data are two 
separate things. In my particular example, all fields have stored="true" but only 
selected fields have indexed="true". My expectation is that indexes are stored 
separately from data, because indexes can be generated by different 
techniques/algorithms while the data/documents remain unchanged. Please correct me 
if my understanding is not correct.


Regards,
Prateek Jain

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 14 November 2016 07:05 PM
To: solr-user 
Subject: Re: index and data directories

The question is pretty opaque. What do you mean by "data" as opposed to 
"indexes"? Are you talking about where Lucene puts stored="true"
fields? If not, what do you mean by "data"?

If you are talking about where Lucene puts the stored="true" bits the no, 
there's no way to segregate that our from the other files that make up a 
segment.

Best,
Erick

On Mon, Nov 14, 2016 at 7:58 AM, Prateek Jain J  
wrote:
>
> Hi Alex,
>
>  I am unable to get it correctly. Is it possible to store indexes and data 
> separately?
>
>
> Regards,
> Prateek Jain
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: 14 November 2016 03:53 PM
> To: solr-user 
> Subject: Re: index and data directories
>
> solr.xml also has a bunch of properties under the core tag:
>
>   
> 
>   
> 
>   
>
> You can get the Reference Guide for your specific version here:
> http://archive.apache.org/dist/lucene/solr/ref-guide/
>
> Regards,
>Alex.
> 
> Solr Example reading group is starting November 2016, join us at 
> http://j.mp/SolrERG Newsletter and resources for Solr beginners and 
> intermediates:
> http://www.solr-start.com/
>
>
> On 15 November 2016 at 02:37, Prateek Jain J  
> wrote:
>>
>> Hi All,
>>
>> We are using solr 4.8.1 and would like to know if it is possible to 
>> store data and indexes in separate directories? I know following tag 
>> exist in solrconfig.xml file
>>
>> 
>> C:/del-it/solr/cm_events_nbi/data
>>
>>
>>
>> Regards,
>> Prateek Jain


Re: index and data directories

2016-11-14 Thread Erick Erickson
The question is pretty opaque. What do you mean by "data" as opposed
to "indexes"? Are you talking about where Lucene puts stored="true"
fields? If not, what do you mean by "data"?

If you are talking about where Lucene puts the stored="true" bits, then
no, there's no way to segregate that out from the other files that
make up a segment.

Best,
Erick

On Mon, Nov 14, 2016 at 7:58 AM, Prateek Jain J
 wrote:
>
> Hi Alex,
>
>  I am unable to get it correctly. Is it possible to store indexes and data 
> separately?
>
>
> Regards,
> Prateek Jain
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: 14 November 2016 03:53 PM
> To: solr-user 
> Subject: Re: index and data directories
>
> solr.xml also has a bunch of properties under the core tag:
>
>   
> 
>   
> 
>   
>
> You can get the Reference Guide for your specific version here:
> http://archive.apache.org/dist/lucene/solr/ref-guide/
>
> Regards,
>Alex.
> 
> Solr Example reading group is starting November 2016, join us at 
> http://j.mp/SolrERG Newsletter and resources for Solr beginners and 
> intermediates:
> http://www.solr-start.com/
>
>
> On 15 November 2016 at 02:37, Prateek Jain J  
> wrote:
>>
>> Hi All,
>>
>> We are using solr 4.8.1 and would like to know if it is possible to
>> store data and indexes in separate directories? I know following tag
>> exist in solrconfig.xml file
>>
>> 
>> C:/del-it/solr/cm_events_nbi/data
>>
>>
>>
>> Regards,
>> Prateek Jain


Re: Filtering a field when some of the documents don't have the value

2016-11-14 Thread Erick Erickson
You want something like:
name:X&fq=population:[10 TO *] OR (*:* -population:*)
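
In SolrJ terms, a small sketch of the same idea with the hypothetical field
names from the question; because it is a filter query, the population clause
does not influence the score:

import org.apache.solr.client.solrj.SolrQuery;

public class PopulationFilterExample {
    public static void main(String[] args) {
        // Match on name; keep docs with population >= 10 OR docs with no population at all.
        SolrQuery q = new SolrQuery("name:X");
        q.addFilterQuery("population:[10 TO *] OR (*:* -population:*)");
        System.out.println(q);
    }
}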

Best,
Erick

On Mon, Nov 14, 2016 at 10:29 AM, Gintautas Sulskus
 wrote:
> Hi,
>
> I have an index with two fields "name" and "population". Some of the
> documents have the "population" field empty.
>
> I would like to search for a value X in field "name" with the following
> condition:
> 1. if the field is empty - return results for
> name:X
> 2. else set the minimum value for the "population" field to 10:
>  name:X AND population: [10 TO *]
> The population field should not influence the score.
>
> Could you please help me out with the query construction?
> I have tried conditional statements with exists(), but it seems it does not
> suit the case.
>
> Thanks,
> Gin


Filtering a field when some of the documents don't have the value

2016-11-14 Thread Gintautas Sulskus
Hi,

I have an index with two fields "name" and "population". Some of the
documents have the "population" field empty.

I would like to search for a value X in field "name" with the following
condition:
1. if the field is empty - return results for
name:X
2. else set the minimum value for the "population" field to 10:
 name:X AND population: [10 TO *]
The population field should not influence the score.

Could you please help me out with the query construction?
I have tried conditional statements with exists(), but it seems it does not
suit the case.

Thanks,
Gin


Re: RTF Rich text format

2016-11-14 Thread Sergio García Maroto
Thanks for the response.

I am afraid I can't use the DataImportHandler. I do the indexing using an
indexing service that joins data from several places.

I have a final XML document with plenty of data, and one of the fields is the RTF field.
That's the XML I send to Solr using /update. I am wondering whether it would
be possible for Solr to do it with a tokenizer/filter or something like that.

On 14 November 2016 at 16:24, Alexandre Rafalovitch 
wrote:

> I think DataImportHandler with nested entity (JDBC, then Tika with
> FieldReaderDataSource) should do the trick.
>
> Have you tried that?
>
> Regards,
>Alex.
> 
> Solr Example reading group is starting November 2016, join us at
> http://j.mp/SolrERG
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 15 November 2016 at 03:19, marotosg  wrote:
> > Hi,
> >
> > I have a use case where I need to index information coming from a
> database
> > where there is a field which contains rich text format. I would like to
> > convert that text into simple plain text, same as tika does when indexing
> > documents.
> >
> > Is there any way to achive that having a field only where i sent this
> rich
> > text and then Solr cleans that data? I can't find anyhting so far.
> >
> > Thanks
> > Sergio
> >
> >
> >
> > --
> > View this message in context: http://lucene.472066.n3.
> nabble.com/RTF-Rich-text-format-tp4305778.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: RTF Rich text format

2016-11-14 Thread Alexandre Rafalovitch
I think DataImportHandler with nested entity (JDBC, then Tika with
FieldReaderDataSource) should do the trick.

Have you tried that?

Regards,
   Alex.

Solr Example reading group is starting November 2016, join us at
http://j.mp/SolrERG
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 15 November 2016 at 03:19, marotosg  wrote:
> Hi,
>
> I have a use case where I need to index information coming from a database
> where there is a field which contains rich text format. I would like to
> convert that text into simple plain text, same as tika does when indexing
> documents.
>
> Is there any way to achive that having a field only where i sent this rich
> text and then Solr cleans that data? I can't find anyhting so far.
>
> Thanks
> Sergio
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/RTF-Rich-text-format-tp4305778.html
> Sent from the Solr - User mailing list archive at Nabble.com.


RTF Rich text format

2016-11-14 Thread marotosg
Hi,

I have a use case where I need to index information coming from a database
where there is a field which contains rich text format (RTF). I would like to
convert that text into simple plain text, the same as Tika does when indexing
documents.

Is there any way to achieve that by having a field to which I send this rich
text and Solr then cleans up that data? I can't find anything so far.

Thanks
Sergio





Re: sorting by date not working on dates earlier than EPOCH

2016-11-14 Thread marotosg
Hi there.
I have found a possible solution for this issue.

   







RE: index and data directories

2016-11-14 Thread Prateek Jain J

Hi Alex,

 I am not sure I follow correctly. Is it possible to store indexes and data 
separately? 


Regards,
Prateek Jain

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: 14 November 2016 03:53 PM
To: solr-user 
Subject: Re: index and data directories

solr.xml also has a bunch of properties under the core tag:

  

  

  

You can get the Reference Guide for your specific version here:
http://archive.apache.org/dist/lucene/solr/ref-guide/

Regards,
   Alex.

Solr Example reading group is starting November 2016, join us at 
http://j.mp/SolrERG Newsletter and resources for Solr beginners and 
intermediates:
http://www.solr-start.com/


On 15 November 2016 at 02:37, Prateek Jain J  
wrote:
>
> Hi All,
>
> We are using solr 4.8.1 and would like to know if it is possible to 
> store data and indexes in separate directories? I know following tag 
> exist in solrconfig.xml file
>
> 
> C:/del-it/solr/cm_events_nbi/data
>
>
>
> Regards,
> Prateek Jain


Re: DIH problem with multiple (types of) resources

2016-11-14 Thread Alexandre Rafalovitch
On 15 November 2016 at 02:19, Peter Blokland  wrote:
> 

> 

Attribute names are case sensitive as far as I remember. Try
'dataSource' for the second definition.

Regards,
   Alex.


Solr Example reading group is starting November 2016, join us at
http://j.mp/SolrERG
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


Re: index and data directories

2016-11-14 Thread Alexandre Rafalovitch
solr.xml also has a bunch of properties under the core tag:

  

  

  

You can get the Reference Guide for your specific version here:
http://archive.apache.org/dist/lucene/solr/ref-guide/

Regards,
   Alex.

Solr Example reading group is starting November 2016, join us at
http://j.mp/SolrERG
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 15 November 2016 at 02:37, Prateek Jain J
 wrote:
>
> Hi All,
>
> We are using solr 4.8.1 and would like to know if it is possible to store 
> data and indexes in separate directories? I know following tag exist in 
> solrconfig.xml file
>
> 
> C:/del-it/solr/cm_events_nbi/data
>
>
>
> Regards,
> Prateek Jain


index and data directories

2016-11-14 Thread Prateek Jain J

Hi All,

We are using Solr 4.8.1 and would like to know whether it is possible to store data 
and indexes in separate directories. I know the following tag exists in the 
solrconfig.xml file:


<dataDir>C:/del-it/solr/cm_events_nbi/data</dataDir>



Regards,
Prateek Jain


DIH problem with multiple (types of) resources

2016-11-14 Thread Peter Blokland
hi,

I'm porting an old data-import configuration from 4.x to 6.3.0. A minimal config
is this:


  

  

  


  http://site/nl/${page.pid}"; format="text">

  



  


  




when I try to do a full import with this, I get :

2016-11-14 12:31:52.173 INFO  (Thread-68) [   x:meulboek] 
o.a.s.u.p.LogUpdateProcessorFactory [meulboek]  webapp=/solr path=/dataimport 
params={core=meulboek&optimize=false&indent=on&commit=true&clean=true&wt=json&command=full-import&_=1479122291861&verbose=true}
 status=0 QTime=11{deleteByQuery=*:* 
(-1550976769832517632),add=[ed99517c-ece9-40c6-9682-c9ec74173241 
(1550976769976172544), 9283532a-2395-43eb-bcb8-fd30c5ebfd08 
(1550976770348417024), 87b75d5c-a12a-4538-bc29-ceb13d6a9d1c 
(1550976770455371776), 476b5da3-3752-4867-bdb3-4264403c5c2d 
(1550976770787770368), 71cdaadb-62ba-4753-ad1b-01ba7fd75bfa 
(1550976770875850752), 02f41269-4a28-4001-aab9-7b1feb51e332 
(1550976770954493952), 6216ec48-2abd-465b-8d6b-60907c7f49db 
(1550976771047817216), 4317b308-dc88-47e1-9240-0d7d94646de6 
(1550976771136946176), 159ee092-2f72-45f6-970e-9dfd6d635bdf 
(1550976771221880832), bdfa48c4-23e2-483f-9b63-e0c5753d60a5 
(1550976771336175616)]} 0 1465
2016-11-14 12:31:52.173 ERROR (Thread-68) [   x:meulboek] 
o.a.s.h.d.DataImporter Full Import failed:java.lang.RuntimeException: 
java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in 
invoking url null Processing Document # 11
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:475)
at 
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:458)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in 
invoking url null Processing Document # 11
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
... 4 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
Exception in invoking url null Processing Document # 11
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at 
org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:89)
at 
org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:38)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:244)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
... 6 more
Caused by: java.net.MalformedURLException: no protocol: nullselect edition from 
editions
at java.net.URL.<init>(URL.java:593)
at java.net.URL.<init>(URL.java:490)
at java.net.URL.<init>(URL.java:439)
at 
org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:81)
... 12 more


Note that this failure occurs with the second entity, and judging from this
line:

Caused by: java.net.MalformedURLException: no protocol: nullselect edition from 
editions

it seems Solr tries to use the datasource named "web" (the BinURLDataSource)
instead of the configured "db" datasource (the JdbcDataSource). Am I doing
something wrong, or is this a bug?

-- 
CUL8R, Peter.

www.desk.nl

Your excuse is: Communist revolutionaries taking over the server room and 
demanding all the computers in the building or they shoot the sysadmin. Poor 
misguided fools.


Suggestions

2016-11-14 Thread Arkadi Colson
Is there a chance that suggestions could be generated at indexing time 
rather than afterwards from the indexed data? This would make it possible to 
suggest on fields which are not "stored". Or is there another way to 
make suggestion-like behavior possible?


Thx!
Arkadi


Re: price sort

2016-11-14 Thread Emir Arnautovic

Hi Midas,

You can boost results by the reciprocal value of the price, but that does not 
guarantee that an irrelevant result will not end up first simply because it is 
cheap.
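
For example, a hedged sketch of such a multiplicative boost with edismax,
assuming a numeric "price" field; recip(price,1,1000,1000) evaluates to
1000/(price+1000), so relevance still dominates and cheaper items only get a
modest lift:

import org.apache.solr.client.solrj.SolrQuery;

public class PriceBoostExample {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("nike shoes");
        q.set("defType", "edismax");
        // Multiplicative boost: score * 1000/(price+1000). Cheap items get a mild
        // lift, but a strong relevance match usually still wins.
        q.set("boost", "recip(price,1,1000,1000)");
        System.out.println(q);
    }
}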


Emir


On 14.11.2016 11:19, Midas A wrote:

Thanks for replying ,

i want to maintain relevancy  along with price sorting \

for example if i search "nike shoes"

According to relevance  "nike shoes"  come first then tshirt (other
product) from nike .

and now if we sort the results  tshirt from nike come on the top . this is
some thing that is not users intent .

In this situation we have to adopt mediocre approach  that does not change
users intent .


On Mon, Nov 14, 2016 at 2:38 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:


Hi Midas,

Sorting by price means that score (~relevancy) is ignored/used as second
sorting criteria. My assumption is that you have long tail of false
positives causing sort by price to sort cheap, unrelated items first just
because they matched by some stop word.

Or I missed your question?

Emir



On 14.11.2016 06:39, Midas A wrote:


Hi,

we are in e-commerce business  and we have to give price sort
functionality
.
what logic should we use that does not break the relevance .
please give the query for the same assuming dummy fields.



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: price sort

2016-11-14 Thread Midas A
Thanks for replying.

I want to maintain relevancy along with price sorting.

For example, if I search for "nike shoes":

according to relevance, "nike shoes" results come first, then t-shirts (other
products) from Nike.

Now if we sort the results by price, a t-shirt from Nike can come to the top. This is
something that is not the user's intent.

In this situation we have to adopt a middle-ground approach that does not go against
the user's intent.


On Mon, Nov 14, 2016 at 2:38 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Midas,
>
> Sorting by price means that score (~relevancy) is ignored/used as second
> sorting criteria. My assumption is that you have long tail of false
> positives causing sort by price to sort cheap, unrelated items first just
> because they matched by some stop word.
>
> Or I missed your question?
>
> Emir
>
>
>
> On 14.11.2016 06:39, Midas A wrote:
>
>> Hi,
>>
>> we are in e-commerce business  and we have to give price sort
>> functionality
>> .
>> what logic should we use that does not break the relevance .
>> please give the query for the same assuming dummy fields.
>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Collection sincronization and AWS instace autoscale

2016-11-14 Thread Iván Martínez Castro
Hi,

I have a SolrCloud 4.9.1 setup with 4 nodes, 50 collections, 1 shard and 4
replicas per collection.

Question one: What happens to collection data when I shut down one node?
When I start this node again, will ZK update the collection data?

Question two: If I set up a load-based auto-scaling instance on AWS, when a
new node is started, what is the best way to add collection replicas to
this new node? Running a script via OpsWorks? ZK conf?

Thanks

Iván
--


Re: spell checking on query

2016-11-14 Thread Emir Arnautovic

Hi Midas,

You can use Solr's spellcheck component: 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking
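
As a quick illustration, a sketch of requesting spellcheck per query from
SolrJ, assuming a 5/6-era HttpSolrClient and a request handler that already
has the spellcheck component configured against a suitable field:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class SpellcheckExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
        try {
            SolrQuery q = new SolrQuery("memorry");   // deliberately misspelled
            q.set("spellcheck", "true");
            q.set("spellcheck.collate", "true");      // ask for a corrected, re-written query

            QueryResponse rsp = client.query(q);
            SpellCheckResponse spell = rsp.getSpellCheckResponse();
            if (spell != null && !spell.isCorrectlySpelled()) {
                System.out.println("Did you mean: " + spell.getCollatedResult());
            }
        } finally {
            client.close();
        }
    }
}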


Emir


On 14.11.2016 08:37, Midas A wrote:

How can we do query-time spell checking with the help of Solr?



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: price sort

2016-11-14 Thread Emir Arnautovic

Hi Midas,

Sorting by price means that the score (~relevancy) is ignored, or used only as a 
secondary sorting criterion. My assumption is that you have a long tail of 
false positives, causing the sort by price to put cheap, unrelated items first 
just because they matched on some stop word.


Or I missed your question?

Emir


On 14.11.2016 06:39, Midas A wrote:

Hi,

we are in e-commerce business  and we have to give price sort functionality
.
what logic should we use that does not break the relevance .
please give the query for the same assuming dummy fields.



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: facet query performance

2016-11-14 Thread Toke Eskildsen
On Mon, 2016-11-14 at 11:36 +0530, Midas A wrote:
> How to improve facet query performance

1) Don't shard unless you really need to. Replicas are fine.

2) If the problem is the first facet call, then enable DocValues and
re-index.

3) Keep facet.limit <= 100, especially if you shard.

and most important

4) Describe in detail what you have, how you facet, and what you expect.
Give us something to work with.


- Toke Eskildsen, State and University Library, Denmark