Re: Solr or Elasticsearch

2018-03-22 Thread Shawn Heisey

On 3/22/2018 7:13 AM, Steven White wrote:

> There are some good write-ups on the internet comparing the two, and the one
> thing that keeps coming up about Elasticsearch being superior to Solr is
> its analytic capability.  However, I cannot find what those analytic
> capabilities are and why they cannot be done using Solr.  Can someone help
> me with this question?


The first thing I have to say is that getting an unbiased opinion on a 
Solr mailing list is probably never going to happen.



> Personally, I'm a Solr user, and the thing that concerns me about
> Elasticsearch is the fact that it is owned by a company that can any day
> decide to stop making Elasticsearch available under the Apache license and
> even completely close free access to it.

> So, this is a two-part question:
>
> 1) What are the analytic capabilities of Elasticsearch that cannot be done
> using Solr?  I want to see a complete list if possible.


There's very little that you can do with one that you can't do with the 
other.  I haven't actually installed or tried ES, but I do try to at 
least be semi-aware of what's going on in this little world of search.


ES seems to handle a novice user a lot better than Solr does.

Solr has a very steep learning curve ... and can even be challenging for 
a seasoned system administrator when they first start working with it.  
The documentation is excellent for someone who already knows their way
around, but for a beginner it is very overwhelming.  Once the basics
become familiar, working with it is relatively easy.


My general impression is that Solr has far more capability built in than 
ES does.  This is reflected in download size.  The ES download is less 
than 30 MB, while Solr weighs in at nearly 150 MB.  Most users are never
going to need all that additional stuff that Solr includes.  The
majority of an ES download is Lucene, while Solr includes a LOT of
additional libraries in the main application, and a TON of contrib
material.  Solr's sample configurations tend to have a "kitchen sink"
mentality, while the sample configs for ES are pretty lightweight.


It's my understanding that in performance tests, if you actually
configure both systems similarly (NOT using out-of-the-box
configurations), performance is about the same, and Solr might be
slightly faster.


ES has the ELK stack, which from what I understand, makes log management 
a lot easier.  Solr can be coaxed into doing everything ES does for log 
management, but there's nothing *included* with Solr to do the heavy 
lifting for you.  LucidWorks has the SiLK stack, which I once tried to 
get working, without success.


There is a software package called Jepsen that has been used to test
both ES and Solr, to learn how they cope with network problems such as
partitions.  Solr did quite well.  The testing revealed some bugs, which
have been fixed.  For ES, the testing revealed some fundamental design
problems.  I do not know whether those problems have been fixed in newer
versions or not.



> 2) Should an Elasticsearch user be worried that Elasticsearch may close
> its open-source policy at any time, or that outsiders have no say about its
> road map?


I doubt they'll ever completely close things up.  Like Solr, most of its 
functionality comes from Lucene.  Because Lucene is under the Apache 
umbrella, it's not likely to disappear, and without Lucene, both Solr 
and ES are nothing.


I don't have any sense as to how much the Elastic dev team listens to 
its community.  Solr has a great community, and many of its members 
contribute a lot to Solr development.


Elastic might decide that they can make more money by switching to an 
"open core" model ... but I think that is becoming less common in recent 
years, because enough users are familiar with it and aren't fooled.  It 
doesn't seem likely to me that they would try it.


Solr is a subproject of Lucene, and is found in the same source code 
repository as Lucene.  Its releases are in lock-step with Lucene, and 
the integration is very tight.  The team of committers for Lucene has 
people from both projects on it.


Thanks,
Shawn



Re: Solr or Elasticsearch

2018-03-22 Thread Minoru Osuka
Hi Steve,

I have contributed the Solr Prometheus Exporter, which allows users to monitor
not only Solr metrics that come from the Metrics API, but also facet counts
that come from searches, as well as responses to Collections API commands and
PingRequestHandler requests.

I think you can also analyze and visualize indexed data in combination with
the Math Expressions that Joel mentioned.
Please see the following document if you are interested.

https://github.com/apache/lucene-solr/blob/branch_7_3/solr/solr-ref-guide/src/monitoring-solr-with-prometheus-and-grafana.adoc
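The exporter reads the same Metrics API that can also be queried directly. As a rough illustration only, here is a minimal Python sketch that builds a Metrics API request URL. The base URL `http://localhost:8983/solr` and the helper name `metrics_url` are assumptions for this example, not part of the exporter.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr node; adjust host, port and context path.
SOLR_BASE = "http://localhost:8983/solr"


def metrics_url(group: str, prefix: str = "") -> str:
    """Build a Metrics API URL for one metrics group (e.g. "jvm", "node", "core").

    `prefix`, if given, limits the response to metric names starting with it.
    """
    params = {"group": group, "wt": "json"}
    if prefix:
        params["prefix"] = prefix
    return f"{SOLR_BASE}/admin/metrics?{urlencode(params)}"


print(metrics_url("core", "QUERY./select"))
```

The resulting URL can be fetched with `urllib.request.urlopen`; the exporter's own configuration decides which of these metrics end up in Prometheus.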
 


Minoru


Re: Error in indexing JSON with space in value

2018-03-22 Thread Zheng Lin Edwin Yeo
Thanks for the input.

I got this to work by using Cygwin.

Regards,
Edwin



Re: Solr Autoscaling multi-AZ rules

2018-03-22 Thread Noble Paul
The meaning of Replication Factor is getting mixed up here. Replication
factor is a number: RF=3 means there are 3 replicas of each shard.
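Under this definition, the counts in Jeff's examples quoted below follow from simple multiplication: two shards at RF=2 give four replicas, and at RF=3 six. Likewise, 42 shards with at most six replicas per node need seven nodes for each unit of RF, matching Jeff's "7 nodes per Replication Factor". A small Python sketch of that arithmetic (the function names are illustrative only):

```python
def total_replicas(num_shards: int, replication_factor: int) -> int:
    # RF counts replicas per shard, so each shard contributes RF replicas.
    return num_shards * replication_factor


def min_nodes(num_shards: int, replication_factor: int, max_per_node: int) -> int:
    # Fewest nodes that can hold every replica when no node may hold more
    # than max_per_node of them. Ignores AZ and one-replica-per-shard rules.
    total = total_replicas(num_shards, replication_factor)
    return (total + max_per_node - 1) // max_per_node  # integer ceiling


print(total_replicas(2, 2), total_replicas(2, 3))  # 4 6
print(min_nodes(42, 1, 6))  # 7
```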

> I understand that {"replica": "<7", "node":"#ANY"} may result in two
> replicas of the same shard ending up on the same node. However, the
> other rule should prevent this: {"replica": "<2", "shard": "#EACH",
> "node": "#ANY"}
> So by using both rules, that should mean "no more than six replicas on
> a node, where all the replicas on that node represent distinct
> shards". Right?

Yes, you are right.
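To make the combined effect of the two rules concrete, here is a small offline model in Python. It is only a sketch of what the rules mean, not Solr's policy engine: `rule_violations` and the node/shard names are hypothetical, and a placement is represented simply as a map from node name to the list of shard names it hosts.

```python
from collections import Counter
from typing import Dict, List


def rule_violations(placement: Dict[str, List[str]], max_per_node: int = 6) -> List[str]:
    """Check a node -> [shard names] placement against two policy rules.

    {"replica": "<7", "node": "#ANY"}
        fewer than 7 (so at most 6) replicas on any node
    {"replica": "<2", "shard": "#EACH", "node": "#ANY"}
        fewer than 2 (so at most 1) replicas of each shard on any node
    """
    problems = []
    for node, shards in placement.items():
        if len(shards) > max_per_node:
            problems.append(f"{node}: {len(shards)} replicas, limit is {max_per_node}")
        for shard, count in Counter(shards).items():
            if count > 1:
                problems.append(f"{node}: {count} replicas of {shard}")
    return problems


# Two nodes, each holding one replica of every shard (2 shards, RF=2).
good = {"node1": ["shard1", "shard2"], "node2": ["shard1", "shard2"]}
# Allowed by the per-node rule (<7), but caught by the per-shard rule (<2).
bad = {"node1": ["shard1", "shard1"], "node2": ["shard2", "shard2"]}
```

In Solr itself, these two rules would be sent as the list in a `set-cluster-policy` command to the autoscaling API (`/solr/admin/autoscaling` in 7.x); the sketch above only checks a placement you already have.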

On Fri, Feb 23, 2018 at 7:17 AM, Jeff Wartes  wrote:
>
> I managed to miss this reply earlier, but:
>
> Shard: A logical segment of a collection
> Replica: A physical core, representing a particular Shard
> Replication Factor (RF): A set of Replicas, such that a single Replica exists 
> for each Shard in a Collection.
> Availability Zone (AZ): A partitioned set of nodes such that a physical or 
> hardware failure in one AZ should not affect another AZ. AZ could mean 
> distinct racks in a data center, or distinct data centers, but I happen to 
> specifically mean the AWS definition here: 
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-regions-availability-zones
>
> So an RF2 collection with 2 shards means I have four Replicas in my 
> collection, two shard1 and two shard2. If it's RF3, then I have six: three 
> shard1 and three shard2.
> I'm using "Distinct RF" as a shorthand for "a single replica for every shard 
> in the collection".
> In the RF2 example above, if I have two Availability Zones, I would want a 
> Distinct RF in each AZ. So, a replica for shard1 and shard2 in AZ1, and a 
> replica for shard1 and shard2 in AZ2. I would *not* want, say, both shard1 
> replicas in AZ1 because then a failure of AZ1 could leave me with no replicas 
> for shard1 and an incomplete collection.
> If I had RF6 and two AZs, I would want three Distinct RFs in each AZ. (three 
> replicas for each shard, per AZ)
>
> I understand that {"replica": "<7", "node":"#ANY"} may result in two replicas 
> of the same shard ending up on the same node. However, the other rule should 
> prevent this: {"replica": "<2", "shard": "#EACH", "node": "#ANY"}
> So by using both rules, that should mean "no more than six replicas on a 
> node, where all the replicas on that node represent distinct shards". Right?
>
>
>
> On 2/12/18, 12:18 PM, "Noble Paul"  wrote:
>
> >>Goal: No node should have more than 6 shards
>
> This is not possible today
>
>  {"replica": "<7", "node":"#ANY"} means don't put 7 or more
> replicas of the collection (irrespective of the shards) on a given
> node
>
> What do you mean by distinct 'RF'? I think we are mixing up the
> terminology a bit here.
>
> On Wed, Feb 7, 2018 at 1:38 PM, Jeff Wartes  
> wrote:
> > I’ve been messing around with the Solr 7.2 autoscaling framework this
> > week. Some things seem trivial, but I’m also running into questions and
> > issues. If anyone else has experience with this stuff, I’d be glad to hear
> > it. Specifically:
> >
> >
> > Context:
> > - One collection, consisting of 42 shards, where up to 6 shards can fit
> > on a single node (which means 7 nodes per Replication Factor).
> > - Three AZs, each with its own ip_2 value.
> >
> > Goals:
> >
> > Goal: Fully utilize available nodes.
> > Cluster Preference: {"maximize": "cores"}
> >
> > Goal: No node should have more than one replica of a given shard
> > Rule: {"replica": "<2", "shard": "#EACH", "node": "#ANY"}
> >
> > Goal: No node should have more than 6 shards
> > Rule: {"replica": "<7", "node":"#ANY"}
> >
> > Goal: Where possible, distinct RFs should each exist in an AZ.
> > (Example1: I’d like 7 nodes with a complete RF in AZ 1 and 7 nodes with
> > a complete RF in AZ 2, and not end up with, say, both shard2 replicas in AZ 1)
> > (Example2: If I have 14 nodes in AZ 1 and 7 in AZ 2, I should have two
> > full RFs in AZ 1 and one in AZ 2)
> > Rule: ???
> >
> > I could have multiple non-strict rules perhaps? Like:
> > {"replica": "<2", "shard": "#EACH", "ip_2": "1", "strict":false}
> > {"replica": "<3", "shard": "#EACH", "ip_2": "1", "strict":false}
> > {"replica": "<4", "shard": "#EACH", "ip_2": "1", "strict":false}
> > {"replica": "<2", "shard": "#EACH", "ip_2": "2", "strict":false}
> > {"replica": "<3", "shard": "#EACH", "ip_2": "2", "strict":false}
> > {"replica": "<4", "shard": "#EACH", "ip_2": "2", "strict":false}
> > etc
> > So having more than one RF in an AZ is a technical “violation”, but if
> > placement minimizes non-strict violations, replicas would tend to get
> > placed correctly.
> >
> >
> > Given a working set of rules, I’m still having trouble with two things:
> >
> >   1.  I’ve manually created the 

Re: Solr or Elasticsearch

2018-03-22 Thread Joel Bernstein
Solr 7.3 has very sophisticated math capabilities described below:

https://github.com/joel-bernstein/lucene-solr/blob/math_expressions_documentation/solr/solr-ref-guide/src/math-expressions.adoc

This is the user guide for math expressions, which didn't make the 7.3
release, but all the functions are committed and working in 7.3. It's only
the documentation that is lagging behind. About 30 new functions are added
with every release.
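For readers who want to try a math expression, here is a minimal sketch of building a request to the `/stream` handler. The collection name `collection1`, the helper `stream_url`, and the specific functions in `EXPR` (`let`, `array`, `corr`) are assumptions; check them against the guide linked above for your Solr version.

```python
from urllib.parse import urlencode

# Assumed collection URL; /stream should be available without extra
# configuration in Solr 7, but verify that on your install.
STREAM_URL = "http://localhost:8983/solr/collection1/stream"

# Illustrative math expression: two arrays and their correlation.
# let/array/corr are assumed names taken from the math expressions guide.
EXPR = "let(a=array(1, 2, 3, 4), b=array(2, 4, 6, 8), c=corr(a, b))"


def stream_url(expr: str) -> str:
    # The expression travels as the URL-encoded `expr` parameter.
    return f"{STREAM_URL}?{urlencode({'expr': expr})}"


print(stream_url(EXPR))
```

The resulting URL can be fetched with `urllib.request.urlopen`; encoding the expression this way avoids shell quoting problems with the parentheses and commas.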

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Mar 22, 2018 at 4:28 PM, Steven White  wrote:

> Thank you all for your input.
>
> The one question still remains: what is the list of ES analytics that are
> not available, out of the box, in Solr?  Is there such a list?
>
> Steve
>
> On Thu, Mar 22, 2018 at 2:07 PM, Rahul Singh wrote:
>
> > I have the same experience as Daphne. I’ve used Solr for more “document” /
> > “content” / “Knowledge” search and Elastic as a Log store or Mongo
> > replacement. Solr has more ways to return/ingest data, such as XML, JSON,
> > or even CSV, which is appealing. The binary protocol in SolrJ is also
> > appealing because the updates / selects are fast.
> >
> > Ultimately, I think Solr is like an 18-wheel tractor-trailer and Elastic is
> > like a U-Haul truck: you can chain a bunch of them together to do what
> > Solr does.
> >
> > --
> > Rahul Singh
> > rahul.si...@anant.us
> >
> > Anant Corporation
> >
> > On Mar 22, 2018, 9:04 AM -0500, Liu, Daphne <daphne@cevalogistics.com>, wrote:
> > > I used Solr + Cassandra for document search. Solr works very well for
> > > document indexing.
> > > For big data visualization, I use Elasticsearch + Grafana.
> > > As of today, Grafana does not support Solr.
> > > Elasticsearch is very friendly and easy to use for multi-dimensional
> > > group-by, and its real-time query performance is very good.
> > > The Grafana dashboard solution can be viewed at
> > > https://grafana.com/dashboards/5204/edit
> > >
> > >
> > > Kind regards,
> > >
> > > Daphne Liu
> > > BI Architect Big Data - Matrix SCM
> > >
> > > -Original Message-
> > > From: Steven White [mailto:swhite4...@gmail.com]
> > > Sent: Thursday, March 22, 2018 9:14 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Solr or Elasticsearch
> > >
> > > Hi everyone,
> > >
> > > There are some good write-ups on the internet comparing the two, and the
> > > one thing that keeps coming up about Elasticsearch being superior to Solr
> > > is its analytic capability. However, I cannot find what those analytic
> > > capabilities are and why they cannot be done using Solr. Can someone help
> > > me with this question?
> > >
> > > Personally, I'm a Solr user, and the thing that concerns me about
> > > Elasticsearch is the fact that it is owned by a company that can any day
> > > decide to stop making Elasticsearch available under the Apache license and
> > > even completely close free access to it.
> > >
> > > So, this is a two-part question:
> > >
> > > 1) What are the analytic capabilities of Elasticsearch that cannot be
> > > done using Solr? I want to see a complete list if possible.
> > > 2) Should an Elasticsearch user be worried that Elasticsearch may close
> > > its open-source policy at any time, or that outsiders have no say about
> > > its road map?
> > >
> > > Thanks,
> > >
> > > Steve
> > >
> >
>


Re: Error in indexing JSON with space in value

2018-03-22 Thread Zheng Lin Edwin Yeo
Yes, I'm running this on Windows, using the "curl" command from Windows PowerShell.

I will try out other tools like Cygwin.

Thank you.

Regards,
Edwin




Re: Error in indexing JSON with space in value

2018-03-22 Thread Chris Hostetter
: 
: Ah, there's the extra bit of context:
: > PS C:\curl> .\curl '
: 
: You're using Windows perhaps?  If so, it's probably a shell issue
: getting all of the data to the "curl" command.

Yep, and you can even see in the trace output that curl thinks the entire
JSON payload you want to send is 24 bytes long and ends with '"Joe'...

: > 0090: 6a 73 6f 6e 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 json..Content-Le
: > 00a0: 6e 67 74 68 3a 20 32 34 0d 0a 0d 0a ngth: 24
: > => Send data, 24 bytes (0x18)
: > 0000: 20 7b 20 20 20 69 64 3a 31 2c 20 20 20 6e 61 6d  {   id:1,   nam
: > 0010: 65 5f 73 3a 20 4a 6f 65 e_s: Joe
: > == Info: upload completely sent off: 24 out of 24 bytes
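Decoding the 24 bytes curl actually sent makes this concrete. In the Python sketch below (hex copied from the trace above), the payload contains no double quotes at all and stops at the space inside "Joe Smith", presumably because PowerShell mangled the quotes and spaces when passing the `-d` argument to curl. That is consistent with Solr's "Expected ',' or '}' ... char=(EOF),position=24" error. The helper name `is_valid_json` is illustrative only.

```python
import json

# Hex bytes from the "=> Send data, 24 bytes" lines of the trace above.
SENT_HEX = (
    "20 7b 20 20 20 69 64 3a 31 2c 20 20 20 6e 61 6d "
    "65 5f 73 3a 20 4a 6f 65"
)

payload = bytes.fromhex(SENT_HEX).decode("ascii")


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(repr(payload))  # ' {   id:1,   name_s: Joe'
print(is_valid_json(payload))  # False
```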


-Hoss
http://www.lucidworks.com/


Re: Error in indexing JSON with space in value

2018-03-22 Thread Yonik Seeley
Ah, there's the extra bit of context:
> PS C:\curl> .\curl '

You're using Windows perhaps?  If so, it's probably a shell issue
getting all of the data to the "curl" command.
Something like cygwin or WSL (Windows Subsystem for Linux) may make
your life easier.
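Another way to sidestep shell quoting altogether is to build the request in a script instead of on the command line. A minimal Python sketch using only the standard library; the URL path is copied from Edwin's trace (`/edm/emails6/update/json/docs`), and `index_request` is a hypothetical helper name.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Path taken from the trace in this thread; adjust for your install.
BASE = "http://localhost:8983/edm/emails6/update/json/docs"

DOC = {
    "id": "1",
    "name_s": "Joe Smith",
    "phone_s": 876876687,
    "orgs": [
        {"name1_s": "Microsoft", "city_s": "Seattle", "zip_s": 98052},
        {"name1_s": "Apple", "city_s": "Cupertino", "zip_s": 95014},
    ],
}


def index_request(doc, split="/|/orgs"):
    # Serialise the document in-process, so no shell ever sees its quotes or spaces.
    url = f"{BASE}?{urlencode({'split': split})}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(index_request(DOC))`. Because the JSON never passes through a command line, neither PowerShell argument handling nor curl's URL globbing comes into play.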

-Yonik


On Thu, Mar 22, 2018 at 6:45 PM, Zheng Lin Edwin Yeo
 wrote:
> Thanks for your reply.
>
>
>
> PS C:\curl> .\curl 'http://localhost:8983/edm/emails6/update/json/docs?split=/|/orgs'
>  -H 'Content-type:application/json' -d ' {   "id":"1",   "name_s": "Joe Smith",
>  "phone_s": 876876687,   "orgs": [ {   "name1_s": "Microsoft",   "city_s": "Seattle",
>  "zip_s": 98052}, {   "name1_s": "Apple",   "city_s": "Cupertino",   "zip_s": 95014}
>  ] }' --trace -
>
> == Info:   Trying ::1...
> == Info: TCP_NODELAY set
> == Info: Connected to localhost (::1) port 8983 (#0)
> => Send header, 172 bytes (0xac)
> 0000: 50 4f 53 54 20 2f 65 64 6d 2f 65 6d 61 69 6c 73 POST /edm/emails
> 0010: 36 2f 75 70 64 61 74 65 2f 6a 73 6f 6e 2f 64 6f 6/update/json/do
> 0020: 63 73 3f 73 70 6c 69 74 3d 2f 7c 2f 6f 72 67 73 cs?split=/|/orgs
> 0030: 20 48 54 54 50 2f 31 2e 31 0d 0a 48 6f 73 74 3a  HTTP/1.1..Host:
> 0040: 20 6c 6f 63 61 6c 68 6f 73 74 3a 38 39 38 33 0d  localhost:8983.
> 0050: 0a 55 73 65 72 2d 41 67 65 6e 74 3a 20 63 75 72 .User-Agent: cur
> 0060: 6c 2f 37 2e 35 32 2e 31 0d 0a 41 63 63 65 70 74 l/7.52.1..Accept
> 0070: 3a 20 2a 2f 2a 0d 0a 43 6f 6e 74 65 6e 74 2d 74 : */*..Content-t
> 0080: 79 70 65 3a 61 70 70 6c 69 63 61 74 69 6f 6e 2f ype:application/
> 0090: 6a 73 6f 6e 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 json..Content-Le
> 00a0: 6e 67 74 68 3a 20 32 34 0d 0a 0d 0a ngth: 24
> => Send data, 24 bytes (0x18)
> 0000: 20 7b 20 20 20 69 64 3a 31 2c 20 20 20 6e 61 6d  {   id:1,   nam
> 0010: 65 5f 73 3a 20 4a 6f 65 e_s: Joe
> == Info: upload completely sent off: 24 out of 24 bytes
> <= Recv header, 26 bytes (0x1a)
> 0000: 48 54 54 50 2f 31 2e 31 20 34 30 30 20 42 61 64 HTTP/1.1 400 Bad
> 0010: 20 52 65 71 75 65 73 74 0d 0aRequest..
> <= Recv header, 40 bytes (0x28)
> 0000: 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 74 65 Content-Type: te
> 0010: 78 74 2f 70 6c 61 69 6e 3b 63 68 61 72 73 65 74 xt/plain;charset
> 0020: 3d 75 74 66 2d 38 0d 0a =utf-8..
> <= Recv header, 21 bytes (0x15)
> 0000: 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 Content-Length:
> 0010: 33 32 33 0d 0a  323..
> <= Recv header, 2 bytes (0x2)
> 0000: 0d 0a   ..
> <= Recv data, 323 bytes (0x143)
> 0000: 7b 0a 20 20 22 72 65 73 70 6f 6e 73 65 48 65 61 {.  "responseHea
> 0010: 64 65 72 22 3a 7b 0a 20 20 20 20 22 73 74 61 74 der":{."stat
> 0020: 75 73 22 3a 34 30 30 2c 0a 20 20 20 20 22 51 54 us":400,."QT
> 0030: 69 6d 65 22 3a 30 7d 2c 0a 20 20 22 65 72 72 6f ime":0},.  "erro
> 0040: 72 22 3a 7b 0a 20 20 20 20 22 6d 65 74 61 64 61 r":{."metada
> 0050: 74 61 22 3a 5b 0a 20 20 20 20 20 20 22 65 72 72 ta":[.  "err
> 0060: 6f 72 2d 63 6c 61 73 73 22 2c 22 6f 72 67 2e 61 or-class","org.a
> 0070: 70 61 63 68 65 2e 73 6f 6c 72 2e 63 6f 6d 6d 6f pache.solr.commo
> 0080: 6e 2e 53 6f 6c 72 45 78 63 65 70 74 69 6f 6e 22 n.SolrException"
> 0090: 2c 0a 20 20 20 20 20 20 22 72 6f 6f 74 2d 65 72 ,.  "root-er
> 00a0: 72 6f 72 2d 63 6c 61 73 73 22 2c 22 6f 72 67 2e ror-class","org.
> 00b0: 61 70 61 63 68 65 2e 73 6f 6c 72 2e 63 6f 6d 6d apache.solr.comm
> 00c0: 6f 6e 2e 53 6f 6c 72 45 78 63 65 70 74 69 6f 6e on.SolrException
> 00d0: 22 5d 2c 0a 20 20 20 20 22 6d 73 67 22 3a 22 43 "],."msg":"C
> 00e0: 61 6e 6e 6f 74 20 70 61 72 73 65 20 70 72 6f 76 annot parse prov
> 00f0: 69 64 65 64 20 4a 53 4f 4e 3a 20 45 78 70 65 63 ided JSON: Expec
> 0100: 74 65 64 20 27 2c 27 20 6f 72 20 27 7d 27 3a 20 ted ',' or '}':
> 0110: 63 68 61 72 3d 28 45 4f 46 29 2c 70 6f 73 69 74 char=(EOF),posit
> 0120: 69 6f 6e 3d 32 34 20 41 46 54 45 52 3d 27 27 22 ion=24 AFTER=''"
> 0130: 2c 0a 20 20 20 20 22 63 6f 64 65 22 3a 34 30 30 ,."code":400
> 0140: 7d 7d 0a}}.
> {
>   "responseHeader":{
> "status":400,
> "QTime":0},
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"Cannot parse provided JSON: Expected ',' or '}':
> char=(EOF),position=24 AFTER=''",
> "code":400}}
> == Info: Curl_http_done: called premature == 0
> == Info: Connection #0 to host localhost left intact
> curl: (3) [globbing] bad range specification in column 39
>
> Regards,
> Edwin
>
> On 23 March 2018 at 03:49, Yonik Seeley  wrote:
>
>> It looks like a curl globbing issue from the curl error message you
>> included:
>> "curl: (3) [globbing] bad range specification in column 39"
>>
>> You can try turning off curl globbing with 

Re: Error in indexing JSON with space in value

2018-03-22 Thread Zheng Lin Edwin Yeo
 Thanks for your reply.

This is the curl command that I run, with the "--trace -" output.


PS C:\curl> .\curl 'http://localhost:8983/solr/collection1/update/json/docs?split=/|/orgs'
 -H 'Content-type:application/json' -d ' {   "id":"1",   "name_s": "Joe Smith",
 "phone_s": 876876687,   "orgs": [ {   "name1_s": "Microsoft",   "city_s": "Seattle",
 "zip_s": 98052}, {   "name1_s": "Apple",   "city_s": "Cupertino",   "zip_s": 95014}
 ] }' --trace -

== Info:   Trying ::1...
== Info: TCP_NODELAY set
== Info: Connected to localhost (::1) port 8983 (#0)
=> Send header, 172 bytes (0xac)
0000: 50 4f 53 54 20 2f 65 64 6d 2f 65 6d 61 69 6c 73 POST /edm/emails
0010: 36 2f 75 70 64 61 74 65 2f 6a 73 6f 6e 2f 64 6f 6/update/json/do
0020: 63 73 3f 73 70 6c 69 74 3d 2f 7c 2f 6f 72 67 73 cs?split=/|/orgs
0030: 20 48 54 54 50 2f 31 2e 31 0d 0a 48 6f 73 74 3a  HTTP/1.1..Host:
0040: 20 6c 6f 63 61 6c 68 6f 73 74 3a 38 39 38 33 0d  localhost:8983.
0050: 0a 55 73 65 72 2d 41 67 65 6e 74 3a 20 63 75 72 .User-Agent: cur
0060: 6c 2f 37 2e 35 32 2e 31 0d 0a 41 63 63 65 70 74 l/7.52.1..Accept
0070: 3a 20 2a 2f 2a 0d 0a 43 6f 6e 74 65 6e 74 2d 74 : */*..Content-t
0080: 79 70 65 3a 61 70 70 6c 69 63 61 74 69 6f 6e 2f ype:application/
0090: 6a 73 6f 6e 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 json..Content-Le
00a0: 6e 67 74 68 3a 20 32 34 0d 0a 0d 0a ngth: 24
=> Send data, 24 bytes (0x18)
0000: 20 7b 20 20 20 69 64 3a 31 2c 20 20 20 6e 61 6d  {   id:1,   nam
0010: 65 5f 73 3a 20 4a 6f 65 e_s: Joe
== Info: upload completely sent off: 24 out of 24 bytes
<= Recv header, 26 bytes (0x1a)
0000: 48 54 54 50 2f 31 2e 31 20 34 30 30 20 42 61 64 HTTP/1.1 400 Bad
0010: 20 52 65 71 75 65 73 74 0d 0aRequest..
<= Recv header, 40 bytes (0x28)
0000: 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 74 65 Content-Type: te
0010: 78 74 2f 70 6c 61 69 6e 3b 63 68 61 72 73 65 74 xt/plain;charset
0020: 3d 75 74 66 2d 38 0d 0a =utf-8..
<= Recv header, 21 bytes (0x15)
0000: 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 Content-Length:
0010: 33 32 33 0d 0a  323..
<= Recv header, 2 bytes (0x2)
0000: 0d 0a   ..
<= Recv data, 323 bytes (0x143)
0000: 7b 0a 20 20 22 72 65 73 70 6f 6e 73 65 48 65 61 {.  "responseHea
0010: 64 65 72 22 3a 7b 0a 20 20 20 20 22 73 74 61 74 der":{."stat
0020: 75 73 22 3a 34 30 30 2c 0a 20 20 20 20 22 51 54 us":400,."QT
0030: 69 6d 65 22 3a 30 7d 2c 0a 20 20 22 65 72 72 6f ime":0},.  "erro
0040: 72 22 3a 7b 0a 20 20 20 20 22 6d 65 74 61 64 61 r":{."metada
0050: 74 61 22 3a 5b 0a 20 20 20 20 20 20 22 65 72 72 ta":[.  "err
0060: 6f 72 2d 63 6c 61 73 73 22 2c 22 6f 72 67 2e 61 or-class","org.a
0070: 70 61 63 68 65 2e 73 6f 6c 72 2e 63 6f 6d 6d 6f pache.solr.commo
0080: 6e 2e 53 6f 6c 72 45 78 63 65 70 74 69 6f 6e 22 n.SolrException"
0090: 2c 0a 20 20 20 20 20 20 22 72 6f 6f 74 2d 65 72 ,.  "root-er
00a0: 72 6f 72 2d 63 6c 61 73 73 22 2c 22 6f 72 67 2e ror-class","org.
00b0: 61 70 61 63 68 65 2e 73 6f 6c 72 2e 63 6f 6d 6d apache.solr.comm
00c0: 6f 6e 2e 53 6f 6c 72 45 78 63 65 70 74 69 6f 6e on.SolrException
00d0: 22 5d 2c 0a 20 20 20 20 22 6d 73 67 22 3a 22 43 "],."msg":"C
00e0: 61 6e 6e 6f 74 20 70 61 72 73 65 20 70 72 6f 76 annot parse prov
00f0: 69 64 65 64 20 4a 53 4f 4e 3a 20 45 78 70 65 63 ided JSON: Expec
0100: 74 65 64 20 27 2c 27 20 6f 72 20 27 7d 27 3a 20 ted ',' or '}':
0110: 63 68 61 72 3d 28 45 4f 46 29 2c 70 6f 73 69 74 char=(EOF),posit
0120: 69 6f 6e 3d 32 34 20 41 46 54 45 52 3d 27 27 22 ion=24 AFTER=''"
0130: 2c 0a 20 20 20 20 22 63 6f 64 65 22 3a 34 30 30 ,."code":400
0140: 7d 7d 0a}}.
{
  "responseHeader":{
    "status":400,
    "QTime":0},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Cannot parse provided JSON: Expected ',' or '}': char=(EOF),position=24 AFTER=''",
    "code":400}}
== Info: Curl_http_done: called premature == 0
== Info: Connection #0 to host localhost left intact
curl: (3) [globbing] bad range specification in column 39


Regards,
Edwin


On 23 March 2018 at 06:45, Zheng Lin Edwin Yeo  wrote:

> Thanks for your reply.
>
>
>
> PS C:\curl> .\curl 'http://localhost:8983/edm/emails6/update/json/docs?
> split=/|/orgs' -H 'Content-type:application/j
> son' -d ' {   "id":"1",   "name_s": "Joe Smith",   "phone_s": 876876687,
>  "orgs": [ {   "name1_s": "Microsoft",
>"city_s": "Seattle",   "zip_s": 98052}, {   "name1_s":
> "Apple",   "city_s": "Cupertino",   "z
> ip_s": 95014}   ] }' --trace -
>
> == Info:   Trying ::1...
> == Info: TCP_NODELAY set
> == Info: Connected to localhost (::1) port 8983 

Re: Error in indexing JSON with space in value

2018-03-22 Thread Zheng Lin Edwin Yeo
Thanks for your reply.



PS C:\curl> .\curl '
http://localhost:8983/edm/emails6/update/json/docs?split=/|/orgs' -H
'Content-type:application/j
son' -d ' {   "id":"1",   "name_s": "Joe Smith",   "phone_s": 876876687,
 "orgs": [ {   "name1_s": "Microsoft",
   "city_s": "Seattle",   "zip_s": 98052}, {   "name1_s":
"Apple",   "city_s": "Cupertino",   "z
ip_s": 95014}   ] }' --trace -

== Info:   Trying ::1...
== Info: TCP_NODELAY set
== Info: Connected to localhost (::1) port 8983 (#0)
=> Send header, 172 bytes (0xac)
0000: 50 4f 53 54 20 2f 65 64 6d 2f 65 6d 61 69 6c 73 POST /edm/emails
0010: 36 2f 75 70 64 61 74 65 2f 6a 73 6f 6e 2f 64 6f 6/update/json/do
0020: 63 73 3f 73 70 6c 69 74 3d 2f 7c 2f 6f 72 67 73 cs?split=/|/orgs
0030: 20 48 54 54 50 2f 31 2e 31 0d 0a 48 6f 73 74 3a  HTTP/1.1..Host:
0040: 20 6c 6f 63 61 6c 68 6f 73 74 3a 38 39 38 33 0d  localhost:8983.
0050: 0a 55 73 65 72 2d 41 67 65 6e 74 3a 20 63 75 72 .User-Agent: cur
0060: 6c 2f 37 2e 35 32 2e 31 0d 0a 41 63 63 65 70 74 l/7.52.1..Accept
0070: 3a 20 2a 2f 2a 0d 0a 43 6f 6e 74 65 6e 74 2d 74 : */*..Content-t
0080: 79 70 65 3a 61 70 70 6c 69 63 61 74 69 6f 6e 2f ype:application/
0090: 6a 73 6f 6e 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 json..Content-Le
00a0: 6e 67 74 68 3a 20 32 34 0d 0a 0d 0a ngth: 24
=> Send data, 24 bytes (0x18)
0000: 20 7b 20 20 20 69 64 3a 31 2c 20 20 20 6e 61 6d  {   id:1,   nam
0010: 65 5f 73 3a 20 4a 6f 65 e_s: Joe
== Info: upload completely sent off: 24 out of 24 bytes
<= Recv header, 26 bytes (0x1a)
0000: 48 54 54 50 2f 31 2e 31 20 34 30 30 20 42 61 64 HTTP/1.1 400 Bad
0010: 20 52 65 71 75 65 73 74 0d 0aRequest..
<= Recv header, 40 bytes (0x28)
0000: 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 74 65 Content-Type: te
0010: 78 74 2f 70 6c 61 69 6e 3b 63 68 61 72 73 65 74 xt/plain;charset
0020: 3d 75 74 66 2d 38 0d 0a =utf-8..
<= Recv header, 21 bytes (0x15)
0000: 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 Content-Length:
0010: 33 32 33 0d 0a  323..
<= Recv header, 2 bytes (0x2)
0000: 0d 0a   ..
<= Recv data, 323 bytes (0x143)
0000: 7b 0a 20 20 22 72 65 73 70 6f 6e 73 65 48 65 61 {.  "responseHea
0010: 64 65 72 22 3a 7b 0a 20 20 20 20 22 73 74 61 74 der":{."stat
0020: 75 73 22 3a 34 30 30 2c 0a 20 20 20 20 22 51 54 us":400,."QT
0030: 69 6d 65 22 3a 30 7d 2c 0a 20 20 22 65 72 72 6f ime":0},.  "erro
0040: 72 22 3a 7b 0a 20 20 20 20 22 6d 65 74 61 64 61 r":{."metada
0050: 74 61 22 3a 5b 0a 20 20 20 20 20 20 22 65 72 72 ta":[.  "err
0060: 6f 72 2d 63 6c 61 73 73 22 2c 22 6f 72 67 2e 61 or-class","org.a
0070: 70 61 63 68 65 2e 73 6f 6c 72 2e 63 6f 6d 6d 6f pache.solr.commo
0080: 6e 2e 53 6f 6c 72 45 78 63 65 70 74 69 6f 6e 22 n.SolrException"
0090: 2c 0a 20 20 20 20 20 20 22 72 6f 6f 74 2d 65 72 ,.  "root-er
00a0: 72 6f 72 2d 63 6c 61 73 73 22 2c 22 6f 72 67 2e ror-class","org.
00b0: 61 70 61 63 68 65 2e 73 6f 6c 72 2e 63 6f 6d 6d apache.solr.comm
00c0: 6f 6e 2e 53 6f 6c 72 45 78 63 65 70 74 69 6f 6e on.SolrException
00d0: 22 5d 2c 0a 20 20 20 20 22 6d 73 67 22 3a 22 43 "],."msg":"C
00e0: 61 6e 6e 6f 74 20 70 61 72 73 65 20 70 72 6f 76 annot parse prov
00f0: 69 64 65 64 20 4a 53 4f 4e 3a 20 45 78 70 65 63 ided JSON: Expec
0100: 74 65 64 20 27 2c 27 20 6f 72 20 27 7d 27 3a 20 ted ',' or '}':
0110: 63 68 61 72 3d 28 45 4f 46 29 2c 70 6f 73 69 74 char=(EOF),posit
0120: 69 6f 6e 3d 32 34 20 41 46 54 45 52 3d 27 27 22 ion=24 AFTER=''"
0130: 2c 0a 20 20 20 20 22 63 6f 64 65 22 3a 34 30 30 ,."code":400
0140: 7d 7d 0a}}.
{
  "responseHeader":{
    "status":400,
    "QTime":0},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Cannot parse provided JSON: Expected ',' or '}': char=(EOF),position=24 AFTER=''",
    "code":400}}
== Info: Curl_http_done: called premature == 0
== Info: Connection #0 to host localhost left intact
curl: (3) [globbing] bad range specification in column 39

Regards,
Edwin

On 23 March 2018 at 03:49, Yonik Seeley  wrote:

> It looks like a curl globbing issue from the curl error message you
> included:
> "curl: (3) [globbing] bad range specification in column 39"
>
> You can try turning off curl globbing with the -g param.
> That may not be the only issue though, as the command shown shouldn't
> have triggered curl globbing.  Perhaps you simplified it or redacted
> some info before posting?
>
> -Yonik
>


Re: Legacy replication slave node full sync

2018-03-22 Thread Erick Erickson
1a> Replication pulls down changed segments, which includes _merged_
segments. Say I have 10 segments in my index and they all get merged
into a single segment that now contains the entire index. Then that
new segment is replicated.

1b> If your polling interval is such that all the segments get
replaced between synchronizations, then the entire index will be fetched
at the next poll.

2> You can turn on infoStream logging; see the Reference Guide.
WARNING: this will produce a _lot_ of output.

Why is this important? It's expected that occasionally a replication
may pull down the entire index, doing a forceMerge on the master for
instance. If you're saying that occasionally replication _replaces
existing segments_ with fresh ones of the same name from the master,
then that's a mystery. If you're saying that occasionally all the
segments are pulled from the master and all the old segments are
deleted from the slave, then that's expected: in this case there will
be no segments in common.
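A toy model of the behaviour described above may help. A polling slave only downloads the files it does not already have, so after a full merge (or a forceMerge) on the master nothing is shared, and the whole index comes over. This is a hedged sketch, not Solr's actual fetch code. The segment file names are made up, and the model ignores the segments_N commit file and any size or checksum comparison the real replication handler may perform.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of legacy replication: the slave fetches every master file it lacks.
// Not Solr's IndexFetcher; commit files and checksum comparisons are ignored.
public class ReplicationSketch {

    /** Files on the master that the slave does not already have. */
    public static List<String> filesToFetch(Set<String> masterFiles, Set<String> slaveFiles) {
        List<String> fetch = new ArrayList<>();
        for (String f : masterFiles) {
            if (!slaveFiles.contains(f)) {
                fetch.add(f);
            }
        }
        return fetch;
    }

    /** True when the two sides share no files, i.e. the whole index is pulled. */
    public static boolean isFullCopy(Set<String> masterFiles, Set<String> slaveFiles) {
        return filesToFetch(masterFiles, slaveFiles).size() == masterFiles.size();
    }

    public static void main(String[] args) {
        // Hypothetical segment files currently on the slave.
        Set<String> slave = new LinkedHashSet<>(List.of("_0.cfs", "_1.cfs", "_2.cfs"));
        // Master indexed more docs: only the new segment is fetched.
        Set<String> afterAdd = new LinkedHashSet<>(List.of("_0.cfs", "_1.cfs", "_2.cfs", "_3.cfs"));
        // Master merged everything (e.g. forceMerge) into one new segment.
        Set<String> afterMerge = new LinkedHashSet<>(List.of("_4.cfs"));

        System.out.println(filesToFetch(afterAdd, slave)); // [_3.cfs]
        System.out.println(isFullCopy(afterMerge, slave)); // true
    }
}
```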

Best,
Erick

On Thu, Mar 22, 2018 at 11:10 AM, Yunee Lee  wrote:
> Hi,
> I have two questions regarding the legacy master/slave node replication
> architecture.
> We noticed that the slave node does a full sync from time to time.
>
>   1.  What type of event or configuration triggers a full sync on the
> slave node?
> I cannot locate the exact time and frequency from the logs. Please let me know.
>   2.  If index merges on the master node trigger the full-sync
> replication, how can I find the index merge logging in the Solr log on the
> master node?
> Please share any documentation I can reference.
> Thanks.
>


Re: Solr or Elasticsearch

2018-03-22 Thread Steven White
Thank you all for your input.

The one question that still remains: what is the list of ES analytics that are
not available out of the box in Solr? Is there such a list?

Steve

On Thu, Mar 22, 2018 at 2:07 PM, Rahul Singh 
wrote:

> I have the same experience as Daphne. I’ve used Solr for more “document” /
> “content” / “knowledge” search and Elastic as a log store or Mongo
> replacement. Solr has more ways to return/ingest data, such as XML, JSON, or
> even CSV, which is appealing. The binary protocol in SolrJ is also appealing
> because updates / selects are fast.
>
> Ultimately I think Solr is like an 18-wheel tractor-trailer and Elastic is
> like a U-Haul truck; you can chain a bunch of them up to do what Solr
> does.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Mar 22, 2018, 9:04 AM -0500, Liu, Daphne ,
> wrote:
> > I used Solr + Cassandra for document search. Solr works very well with
> > document indexing.
> > For big data visualization, I use Elasticsearch + Grafana.
> > As of today, Grafana does not support Solr.
> > Elasticsearch is very friendly and easy to use for multi-dimensional
> > group-by, and its real-time query performance is very good.
> > The Grafana dashboard solution can be viewed at
> > https://grafana.com/dashboards/5204/edit
> >
> >
> > Kind regards,
> >
> > Daphne Liu
> > BI Architect Big Data - Matrix SCM
> >
> > CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL
> > 32256 USA / www.cevalogistics.com
> > T 904.9281448 / F 904.928.1525 / daphne@cevalogistics.com
> >
> > Making business flow
> >
> > -----Original Message-----
> > From: Steven White [mailto:swhite4...@gmail.com]
> > Sent: Thursday, March 22, 2018 9:14 AM
> > To: solr-user@lucene.apache.org
> > Subject: Solr or Elasticsearch
> >
> > Hi everyone,
> >
> > There are some good write-ups on the internet comparing the two, and the
> > one thing that keeps coming up about Elasticsearch being superior to Solr
> > is its analytic capabilities. However, I cannot find what those analytic
> > capabilities are and why they cannot be done using Solr. Can someone help
> > me with this question?
> >
> > Personally, I'm a Solr user, and the thing that concerns me about
> > Elasticsearch is the fact that it is owned by a company that could, any day,
> > decide to stop making Elasticsearch available under the Apache license and
> > even completely close free access to it.
> >
> > So, this is a two-part question:
> >
> > 1) What are the analytic capabilities of Elasticsearch that cannot be done
> > using Solr? I want to see a complete list if possible.
> > 2) Should an Elasticsearch user be worried that Elasticsearch may end
> > its open-source policy at any time, or that outsiders have no say in its
> > roadmap?
> >
> > Thanks,
> >
> > Steve
> >
> > NVOCC Services are provided by CEVA as agents for and on behalf of
> Pyramid Lines Limited trading as Pyramid Lines.
> > This e-mail message is intended for the above named recipient(s) only.
> It may contain confidential information that is privileged. If you are not
> the intended recipient, you are hereby notified that any dissemination,
> distribution or copying of this e-mail and any attachment(s) is strictly
> prohibited. If you have received this e-mail by error, please immediately
> notify the sender by replying to this e-mail and deleting the message
> including any attachment(s) from your system. Thank you in advance for your
> cooperation and assistance. Although the company has taken reasonable
> precautions to ensure no viruses are present in this email, the company
> cannot accept responsibility for any loss or damage arising from the use of
> this email or attachments.
>


Re: Error in indexing JSON with space in value

2018-03-22 Thread Yonik Seeley
It looks like a curl globbing issue from the curl error message you included:
"curl: (3) [globbing] bad range specification in column 39"

You can try turning off curl globbing with the -g param.
That may not be the only issue though, as the command shown shouldn't
have triggered curl globbing.  Perhaps you simplified it or redacted
some info before posting?
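Another way to rule out shell quoting and curl globbing altogether is to build the request in code. This approach was not suggested in the thread. The minimal sketch below assumes Java 11+ (java.net.http); localhost:8983 and the collection name "collection" are placeholders taken from the original command. It only builds the request, because sending it needs a running Solr.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpRequest;

// Builds (but does not send) the JSON update request from the thread in Java 11+,
// so no shell quoting or curl URL globbing is involved.
public class SolrJsonUpdateSketch {

    // Same document as in the thread, including the "Joe Smith" value with a space.
    public static final String BODY = String.join("\n",
            "{",
            "  \"id\": \"1\",",
            "  \"name_s\": \"Joe Smith\",",
            "  \"phone_s\": 876876687,",
            "  \"orgs\": [",
            "    {\"name1_s\": \"Microsoft\", \"city_s\": \"Seattle\", \"zip_s\": 98052},",
            "    {\"name1_s\": \"Apple\", \"city_s\": \"Cupertino\", \"zip_s\": 95014}",
            "  ]",
            "}");

    public static HttpRequest buildRequest() {
        URI uri;
        try {
            // The multi-argument constructor percent-encodes the '|' in the query
            // (as %7C); URI.create() would reject it as an illegal character.
            uri = new URI("http", null, "localhost", 8983,
                    "/solr/collection/update/json/docs", "split=/|/orgs", null);
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(BODY))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest();
        System.out.println(request.method() + " " + request.uri());
        // To send it against a running Solr:
        // java.net.http.HttpClient.newHttpClient()
        //         .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```

Because the body is passed as a Java string, its double quotes and spaces reach Solr unchanged, whatever the command shell would have done to them.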

-Yonik


Legacy replication slave node full sync

2018-03-22 Thread Yunee Lee
Hi,
I have two questions regarding the legacy master/slave node replication
architecture.
We noticed that the slave node does a full sync from time to time.

  1.  What type of event or configuration triggers a full sync on the slave
node?
I cannot locate the exact time and frequency from the logs. Please let me know.
  2.  If index merges on the master node trigger the full-sync
replication, how can I find the index merge logging in the Solr log on the
master node?
Please share any documentation I can reference.
Thanks.



RE: Solr or Elasticsearch

2018-03-22 Thread Rahul Singh
I have the same experience as Daphne. I’ve used Solr for more “document” /
“content” / “knowledge” search and Elastic as a log store or Mongo replacement.
Solr has more ways to return/ingest data, such as XML, JSON, or even CSV, which
is appealing. The binary protocol in SolrJ is also appealing because
updates / selects are fast.

Ultimately I think Solr is like an 18-wheel tractor-trailer and Elastic is like
a U-Haul truck; you can chain a bunch of them up to do what Solr does.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 22, 2018, 9:04 AM -0500, Liu, Daphne , 
wrote:
> I used Solr + Cassandra for document search. Solr works very well with
> document indexing.
> For big data visualization, I use Elasticsearch + Grafana.
> As of today, Grafana does not support Solr.
> Elasticsearch is very friendly and easy to use for multi-dimensional
> group-by, and its real-time query performance is very good.
> The Grafana dashboard solution can be viewed at
> https://grafana.com/dashboards/5204/edit
>
>
> Kind regards,
>
> Daphne Liu
> BI Architect Big Data - Matrix SCM
>
> CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
> USA / www.cevalogistics.com
> T 904.9281448 / F 904.928.1525 / daphne@cevalogistics.com
>
> Making business flow
>
> -----Original Message-----
> From: Steven White [mailto:swhite4...@gmail.com]
> Sent: Thursday, March 22, 2018 9:14 AM
> To: solr-user@lucene.apache.org
> Subject: Solr or Elasticsearch
>
> Hi everyone,
>
> There are some good write-ups on the internet comparing the two, and the one
> thing that keeps coming up about Elasticsearch being superior to Solr is its
> analytic capabilities. However, I cannot find what those analytic capabilities
> are and why they cannot be done using Solr. Can someone help me with this
> question?
>
> Personally, I'm a Solr user, and the thing that concerns me about
> Elasticsearch is the fact that it is owned by a company that could, any day,
> decide to stop making Elasticsearch available under the Apache license and
> even completely close free access to it.
>
> So, this is a two-part question:
>
> 1) What are the analytic capabilities of Elasticsearch that cannot be done
> using Solr? I want to see a complete list if possible.
> 2) Should an Elasticsearch user be worried that Elasticsearch may end its
> open-source policy at any time, or that outsiders have no say in its
> roadmap?
>
> Thanks,
>
> Steve
>


Re: Error in indexing JSON with space in value

2018-03-22 Thread Shawn Heisey
On 3/22/2018 9:48 AM, Zheng Lin Edwin Yeo wrote:
> I am trying to index the following JSON, in which there is a space in the
> name "Joe Smith".
>
> .\curl 'http://localhost:8983/solr/collection/update/json/docs?split=/|/orgs
> '
> -H 'Content-type:application/json' -d '
> {
>   "id":"1",
>   "name_s": "Joe Smith",
>   "phone_s": 876876687,
>   "orgs": [
> {
>   "name1_s" : "Microsoft",
>   "city_s" : "Seattle",
>   "zip_s" : 98052},
> {
>   "name1_s" : "Apple",
>   "city_s" : "Cupertino",
>   "zip_s" : 95014}
>   ]
> }'
>
> However, I get the following error during the indexing.
>
> {
>   "responseHeader":{
>     "status":400,
>     "QTime":1},
>   "error":{
>     "metadata":[
>       "error-class","org.apache.solr.common.SolrException",
>       "root-error-class","org.apache.solr.common.SolrException"],
>     "msg":"Cannot parse provided JSON: Expected ',' or '}': char=(EOF),position=24 AFTER=''",
>     "code":400}}
> curl: (3) [globbing] bad range specification in column 39

The error here is complaining about the format of the JSON itself, not
the content.  That space should not be causing ANY problems.

I suspect that there's something bad about the curl command you're
doing, or maybe a bug in the system you're trying it on, but I cannot be
sure.  If we could see a graphical screenshot of the window/screen where
you're entering the curl command, that might be helpful.

By changing ONLY field names (because my index doesn't have any of the
field names you're using), adding a required field for my index to each
section, and making sure I did everything right on the curl command, I
was able to get this to work with JSON that's substantially similar to
yours, including an unchanged "Joe Smith" value:

===
[root@westgate ~]# curl \
'http://bigindy5.REDACTED.com:8982/solr/s0build/update/json/docs?split=/|/orgs' \
-H 'Content-type:application/json' -d '
{
  "tag_id":"1",
  "keywords": "Joe Smith",
  "did": 876876687,
  "orgs": [
    {
  "tag_id" : "2",
  "did": 176876687,
  "mood" : "Microsoft",
  "special_cats" : "Seattle",
  "byline" : 98052},
    {
  "tag_id" : "3",
  "did": 276876687,
  "mood" : "Apple",
  "special_cats" : "Cupertino",
  "byline" : 95014}
  ]
}'
{"responseHeader":{"status":0,"QTime":6}}
===

This request indexed 3 docs.

The server that I'm sending to is running version 6.6.2.

Thanks,
Shawn



Re: Error in indexing JSON with space in value

2018-03-22 Thread Chris Hostetter


I can't reproduce the problem you described -- using 7.2.1 and the
techproducts example, I can index a JSON string with whitespace just fine...

$ bin/solr -e techproducts
$ curl 'http://localhost:8983/solr/techproducts/update/json/docs' \
  -H 'Content-type:application/json' -d '
{
  "id":"1",
  "name_s": "Joe Smith",
  "phone_s": 876876687}'
{
  "responseHeader":{
    "status":0,
    "QTime":5}}

Your specific curl command had some oddities in it (".\" before "curl";
"-H" on a new line w/o "\" escaping the newline) that may have just been
artifacts of copy/paste into email...

But even when trying to run the command as I think you intended it, I did
not get a JSON parsing error regarding space -- just a nested doc error
because your "split" param wasn't compatible with the configured default
"srcField" option in the techproducts configset...

$ curl \
  'http://localhost:8983/solr/techproducts/update/json/docs?split=/|/orgs' \
  -H 'Content-type:application/json' -d '
{
  "id":"1",
  "name_s": "Joe Smith",
  "phone_s": 876876687,
  "orgs": [
{
  "name1_s" : "Microsoft",
  "city_s" : "Seattle",
  "zip_s" : 98052},
{
  "name1_s" : "Apple",
  "city_s" : "Cupertino",
  "zip_s" : 95014}
  ]
}'
{
  "responseHeader":{
    "status":400,
    "QTime":2},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Raw data can be stored only if split=/",
    "code":400}}




Are you *certain* it was a plain old space character, and that an EOF
character or NUL byte didn't somehow get in there?
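One way to check for the stray characters mentioned above is to scan the payload before sending it. This minimal sketch flags two things: control characters other than tab, CR and LF (NUL, Ctrl-Z and so on), and Unicode space separators that are not a plain U+0020, such as a non-breaking space. Which characters are worth flagging is our own assumption, not an exhaustive rule.

```java
import java.util.ArrayList;
import java.util.List;

// Flags characters that could break or confuse a JSON payload even though they
// look like ordinary text: control characters other than tab/CR/LF (NUL,
// Ctrl-Z "EOF", ...) and Unicode space separators that are not a plain U+0020.
public class StrayCharCheck {

    public static List<String> findSuspicious(String payload) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < payload.length(); i++) {
            char c = payload.charAt(i);
            boolean normalWhitespace = c == ' ' || c == '\t' || c == '\n' || c == '\r';
            boolean control = Character.isISOControl(c) && !normalWhitespace;
            boolean oddSpace = Character.getType(c) == Character.SPACE_SEPARATOR && c != ' ';
            if (control || oddSpace) {
                hits.add(String.format("index %d: U+%04X", i, (int) c));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findSuspicious("{\"name_s\": \"Joe Smith\"}"));      // []
        System.out.println(findSuspicious("{\"name_s\": \"Joe\u00A0Smith\"}")); // [index 15: U+00A0]
    }
}
```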

Can you try running your curl command with the '--trace -' option and send 
us the full output?





: Date: Thu, 22 Mar 2018 23:48:21 +0800
: From: Zheng Lin Edwin Yeo 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Error in indexing JSON with space in value
: 
: Hi,
: 
: I am trying to index the following JSON, in which there is a space in the
: name "Joe Smith".
: 
: .\curl 'http://localhost:8983/solr/collection/update/json/docs?split=/|/orgs
: '
: -H 'Content-type:application/json' -d '
: {
:   "id":"1",
:   "name_s": "Joe Smith",
:   "phone_s": 876876687,
:   "orgs": [
: {
:   "name1_s" : "Microsoft",
:   "city_s" : "Seattle",
:   "zip_s" : 98052},
: {
:   "name1_s" : "Apple",
:   "city_s" : "Cupertino",
:   "zip_s" : 95014}
:   ]
: }'
: 
: However, I get the following error during the indexing.
: 
: {
:   "responseHeader":{
:     "status":400,
:     "QTime":1},
:   "error":{
:     "metadata":[
:       "error-class","org.apache.solr.common.SolrException",
:       "root-error-class","org.apache.solr.common.SolrException"],
:     "msg":"Cannot parse provided JSON: Expected ',' or '}': char=(EOF),position=24 AFTER=''",
:     "code":400}}
: curl: (3) [globbing] bad range specification in column 39
: 
: 
: If I remove the space in "Joe Smith" to make it "JoeSmith", then the
: indexing is successful. What can we do if we want to keep the space in the
: name? Do we need to include some escape characters or something?
: 
: I'm using Solr 7.2.1.
: 
: Regards,
: Edwin
: 

-Hoss
http://www.lucidworks.com/


Re: LTR not able to upload org.apache.solr.ltr.model.MultipleAdditiveTreesModel

2018-03-22 Thread Roopa Rao
Here is the stacktrace

Caused by: org.apache.solr.common.SolrException: Failed to create new ManagedResource /schema/model-store of type org.apache.solr.ltr.store.rest.ManagedModelStore due to: org.apache.solr.common.SolrException: org.apache.solr.ltr.model.ModelException: Model type does not exist org.apache.solr.ltr.model.MultipleAdditiveTreesModel
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:965)
	at org.apache.solr.core.SolrCore.reload(SolrCore.java:641)
	at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1107)
	... 37 more
Caused by: org.apache.solr.common.SolrException: Failed to create new ManagedResource /schema/model-store of type org.apache.solr.ltr.store.rest.ManagedModelStore due to: org.apache.solr.common.SolrException: org.apache.solr.ltr.model.ModelException: Model type does not exist org.apache.solr.ltr.model.MultipleAdditiveTreesModel
	at org.apache.solr.rest.RestManager.createManagedResource(RestManager.java:700)
	at org.apache.solr.rest.RestManager.addRegisteredResource(RestManager.java:666)
	at org.apache.solr.rest.RestManager.access$300(RestManager.java:59)
	at org.apache.solr.rest.RestManager$Registry.registerManagedResource(RestManager.java:231)
	at org.apache.solr.ltr.store.rest.ManagedModelStore.registerManagedModelStore(ManagedModelStore.java:52)
	at org.apache.solr.ltr.search.LTRQParserPlugin.inform(LTRQParserPlugin.java:119)
	at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:719)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:944)
	... 39 more
Caused by: org.apache.solr.common.SolrException: org.apache.solr.ltr.model.ModelException: Model type does not exist org.apache.solr.ltr.model.MultipleAdditiveTreesModel
	at org.apache.solr.ltr.store.rest.ManagedModelStore.addModelFromMap(ManagedModelStore.java:137)
	at org.apache.solr.ltr.store.rest.ManagedModelStore.loadStoredModels(ManagedModelStore.java:127)
	at org.apache.solr.ltr.search.LTRQParserPlugin.onManagedResourceInitialized(LTRQParserPlugin.java:133)
	at org.apache.solr.rest.ManagedResource.notifyObserversDuringInit(ManagedResource.java:115)
	at org.apache.solr.rest.ManagedResource.loadManagedDataAndNotify(ManagedResource.java:91)
	at org.apache.solr.rest.RestManager.createManagedResource(RestManager.java:694)
	... 46 more
Caused by: org.apache.solr.ltr.model.ModelException: Model type does not exist org.apache.solr.ltr.model.MultipleAdditiveTreesModel
	at org.apache.solr.ltr.model.LTRScoringModel.getInstance(LTRScoringModel.java:103)
	at org.apache.solr.ltr.store.rest.ManagedModelStore.fromLTRScoringModelMap(ManagedModelStore.java:235)
	at org.apache.solr.ltr.store.rest.ManagedModelStore.initWrapperModel(ManagedModelStore.java:254)
	at org.apache.solr.ltr.store.rest.ManagedModelStore.fromLTRScoringModelMap(ManagedModelStore.java:245)
	at org.apache.solr.ltr.store.rest.ManagedModelStore.addModelFromMap(ManagedModelStore.java:134)
	... 51 more
Caused by: org.apache.solr.common.SolrException: Error instantiating class: 'org.apache.solr.ltr.model.MultipleAdditiveTreesModel'
	at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:655)
	at org.apache.solr.ltr.model.LTRScoringModel.getInstance(LTRScoringModel.java:93)
	... 55 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:637)
	... 56 more
Caused by: java.lang.NullPointerException
	at org.apache.solr.ltr.model.MultipleAdditiveTreesModel.<init>(MultipleAdditiveTreesModel.java:308)
	... 61 more
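The bottom of this chain is a NullPointerException thrown inside the MultipleAdditiveTreesModel constructor. The generic "Model type does not exist" message hides it. A small helper that walks getCause() to the innermost throwable shows one way a message could surface that root cause. The class, method names and message format are hypothetical, and this is not how Solr's ManagedModelStore currently reports the error.

```java
// Walks an exception's cause chain down to the innermost throwable and builds a
// message that names it. Class, method and message format are hypothetical.
public class RootCause {

    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // getCause() returns null once the end of the chain is reached.
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static String describe(Throwable t) {
        Throwable root = rootCause(t);
        String rootMsg = root.getMessage() == null ? "" : ": " + root.getMessage();
        return t.getMessage() + " (root cause: " + root.getClass().getName() + rootMsg + ")";
    }

    public static void main(String[] args) {
        // Simulated chain shaped like the trace: wrapper -> intermediate -> NPE.
        Exception npe = new NullPointerException("simulated");
        Exception wrapped = new RuntimeException("Model type does not exist X",
                new IllegalStateException("Error instantiating class", npe));
        System.out.println(describe(wrapped));
        // Model type does not exist X (root cause: java.lang.NullPointerException: simulated)
    }
}
```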

On Wed, Mar 14, 2018 at 10:32 AM, Alessandro Benedetti  wrote:

> This is the piece of code involved :
>
> "try {
>   // create an instance of the model
>   model = solrResourceLoader.newInstance(
>       className,
>       LTRScoringModel.class,
>       new String[0], // no sub packages
>       new Class[] { String.class, List.class, List.class, String.class,
>           List.class, Map.class },
>       new Object[] { name, features, norms, featureStoreName,
>           allFeatures, params });
>   if (params != null) {
>     SolrPluginUtils.invokeSetters(model, params.entrySet());
>   }
> } catch (final Exception e) {
>   throw new ModelException("Model type does not exist " + className, e);
> }"
>
> I admit it is generic and even contains a catch "Exception" clause, but
> wasn't it logging the stacktrace?
> Just out of curiosity, what did the entire stacktrace look like?
>
> This may help to improve it.
>
> Regards
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io

InetAddressPoint support in Solr or other IP type?

2018-03-22 Thread Mike Cooper
I have scoured the web and cannot find any discussion of having the Lucene
InetAddressPoint type exposed in Solr. Is there a reason this is omitted
from the Solr supported types? Is it on the roadmap? Is there a recommended
alternative way to index and store IPv4 and IPv6 addresses for optimal range
searches and subnet searches? Thanks for your help.

 

Michael Cooper





Error in indexing JSON with space in value

2018-03-22 Thread Zheng Lin Edwin Yeo
Hi,

I am trying to index the following JSON, in which there is a space in the
name "Joe Smith".

.\curl 'http://localhost:8983/solr/collection/update/json/docs?split=/|/orgs
'
-H 'Content-type:application/json' -d '
{
  "id":"1",
  "name_s": "Joe Smith",
  "phone_s": 876876687,
  "orgs": [
{
  "name1_s" : "Microsoft",
  "city_s" : "Seattle",
  "zip_s" : 98052},
{
  "name1_s" : "Apple",
  "city_s" : "Cupertino",
  "zip_s" : 95014}
  ]
}'

However, I get the following error during the indexing.

{
  "responseHeader":{
    "status":400,
    "QTime":1},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Cannot parse provided JSON: Expected ',' or '}': char=(EOF),position=24 AFTER=''",
    "code":400}}
curl: (3) [globbing] bad range specification in column 39


If I remove the space in "Joe Smith" to make it "JoeSmith", then the
indexing is successful. What can we do if we want to keep the space in the
name? Do we need to include some escape characters or something?

I'm using Solr 7.2.1.

Regards,
Edwin


Re: Indexing multi level Nested JSON

2018-03-22 Thread Zheng Lin Edwin Yeo
I'm trying to index the following JSON, which has two child levels, using the
following curl command:

.\curl '
http://localhost:8983/solr/collection1/update/json/docs?split=/|/orgs'
-H 'Content-type:application/json' -d '
{
  "id":"1",
  "name_s": "JoeSmith",
  "phone_s": 876876687,
  "orgs": [
    {
      "name1_s" : "Microsoft",
      "city_s" : "Seattle",
      "zip_s" : 98052,
      "orgs":[{"name2_ss":"alan","phone2_ss":"123"},{"name2_ss":"edwin","phone2_ss":"456"}]
    },
    {
      "name1_s" : "Apple",
      "city_s" : "Cupertino",
      "zip_s" : 95014,
      "orgs":[{"name2_ss":"alan","phone2_ss":"123"},{"name2_ss":"edwin","phone2_ss":"456"}]
    }
  ]
}'

However, after indexing, this is what is shown in Solr. The second-level
children have been placed together under the first-level child as
multi-valued fields, which is wrong:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":41,
    "params":{
      "q":"phone_s:876876687",
      "fl":"*,[child parentFilter=phone_s:876876687]",
      "sort":"id asc"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"1",
        "name_s":"JoeSmith",
        "phone_s":"876876687",
        "language_s":"en",
        "_version_":1595632041779527680,
        "_childDocuments_":[
        {
          "name1_s":"Microsoft",
          "city_s":"Seattle",
          "zip_s":"98052",
          "orgs.name2_ss":["alan",
            "edwin"],
          "orgs.phone2_ss":["123",
            "456"],
          "_version_":1595632041779527680},
        {
          "name1_s":"Apple",
          "city_s":"Cupertino",
          "zip_s":"95014",
          "orgs.name2_ss":["alan",
            "edwin"],
          "orgs.phone2_ss":["123",
            "456"],
          "_version_":1595632041779527680}]}]
  }}


How can we structure the curl command so that it accepts a child-of-child
relationship? We would like to achieve that without doing any pre-processing
on the JSON.
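Mikhail's suggestion (quoted below) is to add fields to children and grandchildren so the levels can be rebuilt after querying. That could look roughly like the following. It is a sketch over plain Java maps rather than SolrJ documents. The field names level_i and parent_id_s, and derived ids such as "3/0" for children that lack one, are hypothetical choices, not something Solr provides. descendants() merely simulates a flat list of returned children for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stamps each nested document with its depth and parent id before indexing, so
// the hierarchy can be rebuilt from the flat list of children a query returns.
// Field names level_i / parent_id_s and derived ids like "3/0" are hypothetical.
public class NestedLevelTagger {

    /** Returns a tagged copy of doc and all of its "_childDocuments_", recursively. */
    @SuppressWarnings("unchecked")
    public static Map<String, Object> tag(Map<String, Object> doc, String parentId, int level) {
        Map<String, Object> copy = new LinkedHashMap<>(doc);
        copy.put("level_i", level);
        if (parentId != null) {
            copy.put("parent_id_s", parentId);
        }
        String id = String.valueOf(copy.get("id"));
        Object children = copy.get("_childDocuments_");
        if (children instanceof List) {
            List<Map<String, Object>> tagged = new ArrayList<>();
            int n = 0;
            for (Object child : (List<Object>) children) {
                Map<String, Object> childDoc = new LinkedHashMap<>((Map<String, Object>) child);
                // Children without an id get one derived from the parent's id.
                childDoc.putIfAbsent("id", id + "/" + n);
                n++;
                tagged.add(tag(childDoc, id, level + 1));
            }
            copy.put("_childDocuments_", tagged);
        }
        return copy;
    }

    /** Flattens all descendants into one list, mimicking a flat child list in a response. */
    @SuppressWarnings("unchecked")
    public static List<Map<String, Object>> descendants(Map<String, Object> doc) {
        List<Map<String, Object>> out = new ArrayList<>();
        Object children = doc.get("_childDocuments_");
        if (children instanceof List) {
            for (Object child : (List<Object>) children) {
                Map<String, Object> flat = new LinkedHashMap<>((Map<String, Object>) child);
                flat.remove("_childDocuments_");
                out.add(flat);
                out.addAll(descendants((Map<String, Object>) child));
            }
        }
        return out;
    }

    /** Rebuild step: picks the documents whose parent_id_s equals parentId. */
    public static List<Map<String, Object>> childrenOf(List<Map<String, Object>> flat, String parentId) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> d : flat) {
            if (parentId.equals(d.get("parent_id_s"))) {
                result.add(d);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> grandchild = new LinkedHashMap<>();
        grandchild.put("name_s", "alan");
        Map<String, Object> child = new LinkedHashMap<>();
        child.put("id", "3");
        child.put("_childDocuments_", List.of(grandchild));
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("id", "1");
        root.put("_childDocuments_", List.of(child));

        List<Map<String, Object>> flat = descendants(tag(root, null, 0));
        System.out.println(childrenOf(flat, "1")); // first-level children
        System.out.println(childrenOf(flat, "3")); // grandchildren of doc 1
    }
}
```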

Regards,
Edwin


On 20 March 2018 at 16:44, Zheng Lin Edwin Yeo  wrote:

> Hi Mikhail,
>
> Thanks for your reply.
> So the only way to identify them is to add fields, e.g. contentType,
> during indexing?
>
> Regards,
> Edwin
>
> On 20 March 2018 at 16:34, Mikhail Khludnev  wrote:
>
>> Edwin,
>> You need to add the necessary fields to the children/grandchildren to keep
>> track of multiple levels, and reconstruct them when post-processing the results.
>> There is nothing ready-made for it.
>>
>>
>> On Tue, Mar 20, 2018 at 7:02 AM, Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com>
>> wrote:
>>
>> > I have found that we can index multi-level nested JSON with a
>> > child-of-child relationship.
>> >
>> > However, how can we identify from the output that it is a child-of-child
>> > relationship? From what I have seen, all the results are tied to and
>> > point to the parents, so it seems that all are parent-child
>> > relationships, and I can't identify which are the child-of-child
>> > relationships.
>> >
>> > Regards,
>> > Edwin
>> >
>> > On 19 March 2018 at 11:16, Zheng Lin Edwin Yeo 
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > I have this sample multi-level nested JSON, with two levels of child
>> > > documents.
>> > >
>> > > [
>> > >   {
>> > > "id": "1",
>> > > "title_s": "Solr adds block join support",
>> > > "contenttype_s": "parentDocument",
>> > > "_childDocuments_": [
>> > >   {
>> > > "id": "3",
>> > > "comments_s": "SolrCloud supports it too!",
>> > > "_childDocuments_":[{"name_s":"alan","phone_s":"123"},{"name_s":"edwin","phone_s":"456"}]
>> > >   },
>> > >   {
>> > > "id": "3a",
>> > > "comments_s": "SolrCloud supports it too 2!",
>> > > "_childDocuments_":[{"name_s":"alan","phone_s":"123"},{"name_s":"edwin","phone_s":"456"}]
>> > >   }
>> > > ]
>> > >   },
>> > >   {
>> > > "id": "2",
>> > > "title_s": "New Lucene and Solr release is out",
>> > > "contenttype_s": "parentDocument",
>> > > "_childDocuments_": [
>> > >   {
>> > > "id": "4",
>> > > "comments_s": "Lots of new features",
>> > > "_childDocuments_":[{"name_s":"alan","phone_s":"123"},{"
>> > > name_s":"edwin","phone_s":"456"}]
>> > >   }
>> > > ]
>> > >   },
>> > >   {
>> > > "id": "5",
>> > > "title_s": "Testing of Nested JSON",
>> > > "contenttype_s": "parentDocument",
>> > > "_childDocuments_": [
>> > >   {
>> > > "id": "6",
>> > > "comments_s": "See if this is a child",
>> > > "_childDocuments_":[{"name_s":"alan","phone_s":"123"},{"
>> > > name_s":"edwin","phone_s":"456"}]
>> > >   }
>> > > ]
>> > >   }
>> > > ]
>> > >
>> > >
>> > > However, when it is indexed into Solr, there is only one level, and the
>> > > output becomes like this:
>> > >
>> > > {
>> > >   "responseHeader":{
>> > > "zkConnected":true,
>> > > "status":0,
>> > > "QTime":1,
>> > > "params":{
>> > >   
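Mikhail's suggestion above (add fields to each child and grandchild so the levels can be reconstructed in post-processing) could look roughly like the following Python sketch. The field names `level_i` and `parent_id_s` are hypothetical, picked to fit the usual `*_i`/`*_s` dynamic fields used in the sample. `annotate` adds them recursively to the nested JSON before it is posted, and `rebuild` nests a flat list of descendants (which is how Edwin describes the results coming back) under their root again. It assumes every document that has children carries an `id`, as in the sample, and it does mean touching the JSON before indexing, which Mikhail's reply suggests cannot be avoided here.

```python
from collections import defaultdict


def annotate(doc, level=0, parent_id=None):
    """Copy `doc`, adding hypothetical level_i (0 = root) and parent_id_s
    (the parent's id) fields to it and, recursively, to every entry in
    _childDocuments_."""
    out = {k: v for k, v in doc.items() if k != "_childDocuments_"}
    out["level_i"] = level
    if parent_id is not None:
        out["parent_id_s"] = parent_id
    children = doc.get("_childDocuments_", [])
    if children:
        out["_childDocuments_"] = [
            annotate(child, level + 1, doc.get("id")) for child in children
        ]
    return out


def rebuild(root, descendants):
    """Nest a flat list of descendants of `root` back into a tree,
    linking each one to its parent through parent_id_s."""
    by_parent = defaultdict(list)
    for d in descendants:
        by_parent[d["parent_id_s"]].append(d)

    def attach(node):
        # documents without an id (the grandchildren here) have no children
        kids = by_parent.get(node.get("id"), [])
        if not kids:
            return dict(node)
        return dict(node, _childDocuments_=[attach(k) for k in kids])

    return attach(root)
```

With the sample above, the grandchildren of "3" would get `level_i=2` and `parent_id_s="3"`, so a filter such as `level_i:2` can tell them apart from the direct children.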

Re: querying vs. highlighting: complete freedom?

2018-03-22 Thread Erick Erickson
Basically you need to use a copyField, in one of two variants:

If you use the field _exclusively_ for highlighting, then store the raw
content there and have the field use whatever analyzer you want. You
do _not_ need to have indexed="true" set for the field if you're
highlighting on the fly (re-analyzing the stored text at query time). So
you're searching against field1 (which has indexed="true" stored="false"
set) but highlighting against field2 (which has indexed="false"
stored="true" set). Of course, any time you want to return the contents
in a doc, your fl needs to specify field2...

The above does not bloat your index at all, since the cost of one field
with stored="true" indexed="true" is the same as that of two fields,
each with only one option turned on.

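The second approach, if you want to use the FastVectorHighlighter or the
like, is simply to index both fields (the FastVectorHighlighter also needs
termVectors, termPositions and termOffsets enabled on the highlighted
field).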

Best,
Erick
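A minimal Python sketch of Erick's first variant, expressed as a Schema API request body (POSTed to `/solr/<collection>/schema`) plus the matching query parameters. `field1`/`field2` are Erick's placeholder names; `text_de` is the German field type from the example schemas, while `text_de_hl` is a hypothetical field type holding whatever analyzer you want for highlighting (for example one that keeps umlauts). The function names are illustrative only, and the sketch assumes `hl.requireFieldMatch` is left at its default of false, so terms from the query on field1 can be highlighted in field2.

```python
import json


def highlight_field_commands(search_field, highlight_field,
                             search_type, highlight_type):
    """Schema API body: a search-only field, a highlight-only field,
    and a copyField so both receive the same raw input."""
    return {
        "add-field": [
            # searched, never returned: indexed="true" stored="false"
            {"name": search_field, "type": search_type,
             "indexed": True, "stored": False},
            # returned and highlighted, never searched: indexed="false" stored="true"
            {"name": highlight_field, "type": highlight_type,
             "indexed": False, "stored": True},
        ],
        # documents send content to search_field only; Solr copies the raw value
        "add-copy-field": {"source": search_field, "dest": highlight_field},
    }


def select_params(search_field, highlight_field, user_query):
    """Search one field, highlight and return the other."""
    return {
        "q": f"{search_field}:{user_query}",
        "hl": "true",
        "hl.fl": highlight_field,
        # fl must name the stored field, since search_field is not stored
        "fl": f"id,{highlight_field}",
    }


if __name__ == "__main__":
    body = highlight_field_commands("field1", "field2", "text_de", "text_de_hl")
    print(json.dumps(body, indent=2))
    print(select_params("field1", "field2", "Tag"))
```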

On Thu, Mar 22, 2018 at 2:18 AM, Arturas Mazeika  wrote:
> Hi Solr-Users,
>
> I've been playing with a German collection of documents, where I tried to
> search for one word (q=Tag) and highlight another (hl.q=Kundigung). Is
> this a "legal" use case? My key question is: how can I tell Solr which query
> analyzer to use for highlighting? Strictly speaking, I should use
> hl.q=Kündigung to conceptually look for relevant information, but in this
> case no highlighting is returned (as all umlauts are left out in the
> index).
>
> Additional info:
>
> Solr version: 7.2
> URLs queried:
>
> http://localhost:8983/solr/trans/select?q=trans:Zeit=true=trans=Kundigung=3=xml=1
>
> http://localhost:8983/solr/trans/select?q=trans:Zeit=true=trans=K%C3%BCndigung=3=xml=1
> 
>
> Managed-schema:
>
>   
> 
>   
>   
>words="lang/stopwords_de.txt" ignoreCase="true"/>
>   
>   
> 
>   
>
>
> Other additional info:
> https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted
>
> Cheers,
> Arturas


RE: Solr or Elasticsearch

2018-03-22 Thread Liu, Daphne
I used Solr + Cassandra for document search. Solr works very well for document
indexing.
For big data visualization, I use Elasticsearch + Grafana.
As of today, Grafana does not support Solr.
Elasticsearch is very friendly and easy to use for multi-dimensional group-by
queries, and its real-time query performance is very good.
The Grafana dashboard solution can be viewed at
https://grafana.com/dashboards/5204/edit


Kind regards,

Daphne Liu
BI Architect Big Data - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com
T 904.9281448 / F 904.928.1525 / daphne@cevalogistics.com

Making business flow

-----Original Message-----
From: Steven White [mailto:swhite4...@gmail.com]
Sent: Thursday, March 22, 2018 9:14 AM
To: solr-user@lucene.apache.org
Subject: Solr or Elasticsearch

Hi everyone,

There are some good write-ups on the internet comparing the two, and the one
thing that keeps coming up about Elasticsearch being superior to Solr is its
analytic capability.  However, I cannot find what those analytic capabilities
are and why they cannot be done using Solr.  Can someone help me with this
question?

Personally, I'm a Solr user, and the thing that concerns me about Elasticsearch
is the fact that it is owned by a company that could any day decide to stop
making Elasticsearch available under the Apache license, or even completely
close free access to it.

So, this is a two-part question:

1) What are the analytic capabilities of Elasticsearch that cannot be done
using Solr?  I want to see a complete list if possible.
2) Should an Elasticsearch user be worried that Elasticsearch may abandon its
open-source policy at any time, or that outsiders have no say about its roadmap?

Thanks,

Steve



Re: Solr or Elasticsearch

2018-03-22 Thread Vincenzo D'Amore
Hi Steve,

This seems to be more recent:

https://sematext.com/blog/solr-vs-elasticsearch-differences/

On Thu, Mar 22, 2018 at 2:33 PM, Charlie Hull  wrote:

> On 22/03/2018 13:13, Steven White wrote:
>
>> Hi everyone,
>>
>> There are some good write-ups on the internet comparing the two, and the
>> one thing that keeps coming up about Elasticsearch being superior to Solr
>> is its analytic capability.  However, I cannot find what those analytic
>> capabilities are and why they cannot be done using Solr.  Can someone help
>> me with this question?
>>
>
> Hi Steve,
>
> As you've said, there are lots of write-ups, some more out of date than
> others. http://solr-vs-elasticsearch.com/ is quite good on features.
>
> The analytics in ES are based on a number of custom aggregations (which I
> always think of as facet-counting-on-steroids, but I realise it's more
> complicated than that). Here's an early doc on them:
> https://www.elastic.co/guide/en/elasticsearch/guide/current/_analytics.html
> So you need a good grasp of Elasticsearch's DSL to use these. The
> integration with Kibana is good if you want to display your results.
>
> Solr's analytic capabilities use a Solr Search Component:
> https://lucene.apache.org/solr/guide/7_2/analytics.html . As with a lot
> of Solr features, these can appear a lot more complex than Elasticsearch's
> offering. Yonik's blog is also worth reading, as he often talks about new
> and upcoming Solr features like this: http://yonik.com/solr-facet-functions/
>
> As we've always said, there are few cases where you can't build a solution
> using either engine and I believe that's also true for analytics.
>
>>
>> Personally, I'm a Solr user, and the thing that concerns me about
>> Elasticsearch is the fact that it is owned by a company that could any day
>> decide to stop making Elasticsearch available under the Apache license, or
>> even completely close free access to it.
>>
>
> Yes, but why would they? It would be suicide for a company that has such
> an established open source heritage - not least because a lot of Lucene
> developers who work for Elastic would object. I'd be a bit more annoyed
> about the fact that they announced that their commercial XPack add-ons would
> be 'open code' and everyone thinks that means 'open source' - which it
> clearly isn't.
>
>>
>> So, this is a two-part question:
>>
>> 1) What are the analytic capabilities of Elasticsearch that cannot be done
>> using Solr?  I want to see a complete list if possible.
>> 2) Should an Elasticsearch user be worried that Elasticsearch may abandon
>> its open-source policy at any time, or that outsiders have no say about
>> its roadmap?
>>
>
> That's a slightly different question about the roadmap, but you do have some
> say: Elastic's developers have always been very helpful and open to
> suggestions from outsiders (who are also users, of course!).
>
> Cheers
>
> Charlie
>
>>
>> Thanks,
>>
>> Steve
>>
>>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>



-- 
Vincenzo D'Amore


Re: Solr or Elasticsearch

2018-03-22 Thread Charlie Hull

On 22/03/2018 13:13, Steven White wrote:

Hi everyone,

There are some good write-ups on the internet comparing the two, and the one
thing that keeps coming up about Elasticsearch being superior to Solr is
its analytic capability.  However, I cannot find what those analytic
capabilities are and why they cannot be done using Solr.  Can someone help
me with this question?


Hi Steve,

As you've said, there are lots of write-ups, some more out of date than
others. http://solr-vs-elasticsearch.com/ is quite good on features.


The analytics in ES are based on a number of custom aggregations (which
I always think of as facet-counting-on-steroids, but I realise it's more
complicated than that). Here's an early doc on them:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_analytics.html
So you need a good grasp of Elasticsearch's DSL to use these. The
integration with Kibana is good if you want to display your results.


Solr's analytic capabilities use a Solr Search Component:
https://lucene.apache.org/solr/guide/7_2/analytics.html . As with a lot
of Solr features, these can appear a lot more complex than
Elasticsearch's offering. Yonik's blog is also worth reading, as he often
talks about new and upcoming Solr features like this:
http://yonik.com/solr-facet-functions/


As we've always said, there are few cases where you can't build a 
solution using either engine and I believe that's also true for analytics.
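To make the "facet-counting-on-steroids" comparison concrete, here is a hedged Python sketch that builds the same "group by category, average a metric" request for both engines: a terms facet with a nested `avg()` facet function for Solr's JSON Facet API (the feature Yonik's facet-functions post covers), and a `terms` aggregation with an `avg` sub-aggregation for Elasticsearch. The field names `category_s` and `price_f` are hypothetical, and the bodies are only built and printed, not sent; in practice the Solr body would go to a collection's `/query` handler and the Elasticsearch body to an index's `_search` endpoint.

```python
import json


def solr_json_facet(category_field, metric_field):
    """Body for Solr's JSON Request/Facet API: a terms facet with a nested
    avg() facet function per bucket."""
    return {
        "query": "*:*",
        "limit": 0,  # return no documents, only the facet buckets
        "facet": {
            "by_category": {
                "type": "terms",
                "field": category_field,
                "facet": {"avg_metric": f"avg({metric_field})"},
            }
        },
    }


def es_aggregation(category_field, metric_field):
    """Equivalent Elasticsearch search body: a terms aggregation with an
    avg sub-aggregation per bucket."""
    return {
        "size": 0,  # return no hits, only the aggregation buckets
        "aggs": {
            "by_category": {
                "terms": {"field": category_field},
                "aggs": {"avg_metric": {"avg": {"field": metric_field}}},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(solr_json_facet("category_s", "price_f"), indent=2))
    print(json.dumps(es_aggregation("category_s", "price_f"), indent=2))
```

The two shapes are close enough that, for this kind of grouping, the choice comes down to tooling (e.g. Kibana or Grafana support) more than raw capability.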


Personally, I'm a Solr user, and the thing that concerns me about
Elasticsearch is the fact that it is owned by a company that could any day
decide to stop making Elasticsearch available under the Apache license, or
even completely close free access to it.


Yes, but why would they? It would be suicide for a company that has 
such an established open source heritage - not least because a lot of 
Lucene developers who work for Elastic would object. I'd be a bit more 
annoyed about the fact that they announced that their commercial XPack 
add-ons would be 'open code' and everyone thinks that means 'open 
source' - which it clearly isn't.


So, this is a two-part question:

1) What are the analytic capabilities of Elasticsearch that cannot be done
using Solr?  I want to see a complete list if possible.
2) Should an Elasticsearch user be worried that Elasticsearch may abandon
its open-source policy at any time, or that outsiders have no say about its
roadmap?


That's a slightly different question about the roadmap, but you do have 
some say: Elastic's developers have always been very helpful and open to 
suggestions from outsiders (who are also users, of course!).


Cheers

Charlie


Thanks,

Steve




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


RE: Boosting Fields Based On The Query Provided

2018-03-22 Thread Mukhopadhyay, Aratrika
Thanks for your reply, Shawn. The query elevation worked for us. I have another
question though. Right now I handle specific queries in elevate.xml. The
concern I have is that I may have hundreds of queries that need to return
different pages first. Is elevate.xml the only way to do this, or is there a
better approach, for instance boosting fields? When I boost fields in this
fashion, it is not working for me:



  edismax
   url^50 host^30 content^20 title^10


  elevator

  


Thanks for your help.

Aratrika Mukhopadhyay
-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Tuesday, March 20, 2018 6:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Boosting Fields Based On The Query Provided

On 3/20/2018 2:25 PM, Mukhopadhyay, Aratrika wrote:
> I have a Solr query which I am having a hard time configuring the way I
> want. Suppose I have a situation where I have two fields: field1 (host
> field) and field2 (url field). I want a specific host to be bubbled to the
> top for all terms, except when I am searching for specific people, in which
> case I want the URL to their landing page returned first. I have
> configured the dismax query parser in my solrconfig, but it seems that the
> boost being applied is arbitrary.



>
> 
>   edismax
>   *:*
>   <str name="bq">host:(www.starwars.com)^10</str>
>   Carrie Fisher
>   url:(http\:\/\/www.imdb.com\/name\/nm402/)^8
> Mark Hamill
>   url:(http\:\/\/www.imdb.com\/name\/nm434/)^8
>
>   

I think there's a fundamental misunderstanding of how "defaults" works.

I have no idea what happens with multiple "q" parameters, which you have 
configured in defaults.  I do know that if your request includes a "q"
parameter, then what you've put in defaults for "q" is going to be overridden 
and ignored.
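To illustrate the point, here is a simplified Python model (not Solr code) of how a request handler combines the incoming request parameters with its `defaults`, `appends` and `invariants` sections, as described in the documentation section linked below: a default is used only when the request lacks that parameter, appends are always added, and invariants always win. Values are lists because Solr parameters can repeat; the example values in the usage lines are taken from the config quoted above.

```python
def effective_params(request, defaults=None, appends=None, invariants=None):
    """Model how request parameters combine with a handler's defaults,
    appends and invariants. Every value is a list of strings."""
    params = {name: list(values) for name, values in request.items()}
    for name, values in (defaults or {}).items():
        # a default applies only when the request did not send the parameter
        params.setdefault(name, list(values))
    for name, values in (appends or {}).items():
        # appends are added after whatever the request (or defaults) supplied
        params.setdefault(name, []).extend(values)
    for name, values in (invariants or {}).items():
        # invariants replace anything the request sent
        params[name] = list(values)
    return params


if __name__ == "__main__":
    defaults = {"defType": ["edismax"], "q": ["Carrie Fisher", "Mark Hamill"]}
    # the request's own q wins; both default q values are ignored
    print(effective_params({"q": ["luke"]}, defaults=defaults))
```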

This section of the documentation covers defaults, appends, and invariants:

https://lucene.apache.org/solr/guide/6_6/requesthandlers-and-searchcomponents-in-solrconfig.html#RequestHandlersandSearchComponentsinSolrConfig-SearchHandlers

I think the Query Elevation Component might be the kind of functionality you're 
after.  What you're trying to do with defaults is NOT going to work.

https://lucene.apache.org/solr/guide/6_6/the-query-elevation-component.html

Thanks,
Shawn



Solr or Elasticsearch

2018-03-22 Thread Steven White
Hi everyone,

There are some good write-ups on the internet comparing the two, and the one
thing that keeps coming up about Elasticsearch being superior to Solr is
its analytic capability.  However, I cannot find what those analytic
capabilities are and why they cannot be done using Solr.  Can someone help
me with this question?

Personally, I'm a Solr user, and the thing that concerns me about
Elasticsearch is the fact that it is owned by a company that could any day
decide to stop making Elasticsearch available under the Apache license, or
even completely close free access to it.

So, this is a two-part question:

1) What are the analytic capabilities of Elasticsearch that cannot be done
using Solr?  I want to see a complete list if possible.
2) Should an Elasticsearch user be worried that Elasticsearch may abandon
its open-source policy at any time, or that outsiders have no say about its
roadmap?

Thanks,

Steve


querying vs. highlighting: complete freedom?

2018-03-22 Thread Arturas Mazeika
Hi Solr-Users,

I've been playing with a German collection of documents, where I tried to
search for one word (q=Tag) and highlight another (hl.q=Kundigung). Is
this a "legal" use case? My key question is: how can I tell Solr which query
analyzer to use for highlighting? Strictly speaking, I should use
hl.q=Kündigung to conceptually look for relevant information, but in this
case no highlighting is returned (as all umlauts are left out in the
index).

Additional info:

Solr version: 7.2
URLs queried:

http://localhost:8983/solr/trans/select?q=trans:Zeit=true=trans=Kundigung=3=xml=1

http://localhost:8983/solr/trans/select?q=trans:Zeit=true=trans=K%C3%BCndigung=3=xml=1
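The second URL carries ü as `%C3%BC`, i.e. its UTF-8 bytes percent-encoded. A minimal Python sketch that builds such a request with `urllib.parse.urlencode`; the parameter names `hl`, `hl.fl`, `hl.q` and `wt` are an assumption about what was lost from the URLs above. Encoding only guarantees that Solr receives "Kündigung" intact; whether anything is then highlighted still depends on how the field is analyzed.

```python
from urllib.parse import quote, urlencode


def select_url(base, collection, params):
    """Build a /select URL; values are percent-encoded as UTF-8,
    with ':' left readable."""
    query = urlencode(params, safe=":", quote_via=quote)
    return f"{base}/solr/{collection}/select?{query}"


if __name__ == "__main__":
    url = select_url(
        "http://localhost:8983",
        "trans",
        {"q": "trans:Zeit", "hl": "true", "hl.fl": "trans",
         "hl.q": "Kündigung", "wt": "xml"},
    )
    # http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.q=K%C3%BCndigung&wt=xml
    print(url)
```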


Managed-schema:

  

  
  
  
  
  

  


Other additional info:
https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted

Cheers,
Arturas