Re: changed query parsing between 4.10.4 and 5.5.3?

2016-09-13 Thread Bernd Fehling
Hi Greg,

after several hours of trying every combination of parameters without
getting any useful search results for complex search terms with edismax,
I finally copied o.a.s.s.ExtendedDismaxQParser.java from version 4.10.4
to 5.5.3 and made a small modification in o.a.s.u.SolrPluginUtils.java.

Now it searches correctly and returns logical and valid search results
for any kind of complex search.
Problem solved.

But still, the edismax, at least of 5.5.3, has some bugs.
If I get time I will look into this but right now my problem is solved
and the customers and users are happy.

I hope that this buggy edismax version is not used in Solr 6.x; otherwise you
will have the same problems there.

Regards
Bernd


On 12.09.2016 at 05:10, Greg Pendlebury wrote:
> Hi Bernd,
> 
> "From my point of view the old parsing behavior was correct.
> If searching for a term without operator it is always OR, otherwise
> you can add "+" or "-" to modify that. Now with q.op AND it is
> modified to "+" as a MUST."
> 
> It is correct in both cases. q.op dictates (for that query) what default
> operator to use when none is provided, and it takes priority over
> the system-wide 'defaultOperator'. In either case, if you ask it to use
> OR, it uses it; if you ask it to use AND, it uses it. The behaviour from
> 4.10 that was changed (arguably fixed, although I know that is a debatable
> point) was that you asked it to use AND, and it ignored you (irrespective
> of whether you used defaultOperator or q.op). There are a few subtle
> distinctions that are being missed (like the difference between the boolean
> operators and the OCCURS flags that you are talking about), but they are
> not going to change the outcome.
> 
> SOLR-8812 relates to users who had historically been setting the q.op
> parameter to influence the downstream default selection of 'mm' (if you
> don't provide 'mm', it is set for you based on 'q.op') instead of directly
> setting the 'mm' value themselves. But again, in this case you're setting
> 'mm' anyway, so it shouldn't be relevant.
> 
> Ta,
> Greg
> 
> On 9 September 2016 at 16:44, Bernd Fehling wrote:
> 
>> Hi Greg,
>>
>> thanks a lot, that's it.
>> After setting q.op to OR it works _nearly_ as before with 4.10.4.
>>
>> But how stupid is this?
>> I have defaultOperator set to AND in my schema
>> and also had q.op set to AND to make sure my default _is_ AND,
>> meant as conjunction between terms.
>> But now I have q.op set to OR and defaultOperator in the schema set to AND
>> just to get _nearly_ my old behavior back.
>>
>> schema has following comment:
>> "... The default is OR, which is generally assumed so it is
>> not a good idea to change it globally here.  The "q.op" request
>> parameter takes precedence over this. ..."
>>
>> What I don't understand is why they change some major internals
>> and don't give any notice about how to keep old parsing behavior.
>>
>> From my point of view the old parsing behavior was correct.
>> If searching for a term without operator it is always OR, otherwise
>> you can add "+" or "-" to modify that. Now with q.op AND it is
>> modified to "+" as a MUST.
>>
>> I still get some differences in search results between 4.10.4 and 5.5.3.
>> What other side effects does this change of q.op from AND to OR have in
>> other parts of query handling, parsing and searching?
>>
>> Regards
>> Bernd
>>
>> On 09.09.2016 at 05:43, Greg Pendlebury wrote:
>>> I forgot to mention the tickets:
>>> SOLR-2649 and SOLR-8812
>>>
>>> On 9 September 2016 at 13:38, Greg Pendlebury wrote:
>>>
>>>> Under 4.10 q.op was ignored by the edismax parser and always forced to OR.
>>>> 5.5 is looking at the q.op=AND you requested.
>>>>
>>>> There are also some changes to the default values selected for mm, but I
>>>> doubt those apply here since you are setting it explicitly.
>>>>
>>>> On 8 September 2016 at 00:35, Mikhail Khludnev wrote:
>>>>
>>>>> I suppose
>>>>>   +((text:star text:trek)~2)
>>>>> and
>>>>>   +(+text:star +text:trek)
>>>>> are equal. mm=2 is equal to +foo +bar
>>>>>
>>>>> On Wed, Sep 7, 2016 at 10:52 AM, Bernd Fehling <
>>>>> bernd.fehl...@uni-bielefeld.de> wrote:
>>>>>
>>>>>> Hi list,
>>>>>>
>>>>>> while going from SOLR 4.10.4 to 5.5.3 I noticed a change in query parsing.
>>>>>> 4.10.4
>>>>>> text:star text:trek
>>>>>>   text:star text:trek
>>>>>>   (+((text:star text:trek)~2))/no_coord
>>>>>>   +((text:star text:trek)~2)
>>>>>>
>>>>>> 5.5.3
>>>>>> text:star text:trek
>>>>>>   text:star text:trek
>>>>>>   (+(+text:star +text:trek))/no_coord
>>>>>>   +(+text:star +text:trek)
>>>>>>
>>>>>> There are very many new features and changes between these two versions.
>>>>>> It looks like a change in query parsing.
>>>>>> Can someone point me to the Solr or Lucene JIRA about the changes?
>>>>>> Or even give a hint how to get my "old" query parsing back?
>>>>>>
>>>>>> Regards
>>>>>> Bernd
>>>>>
>>>>> --
>>>>> Sincerely yours
>>>>> Mikhail Khludnev
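
Mikhail's remark that +((text:star text:trek)~2) and +(+text:star +text:trek)
are equal can be checked with a small truth table. The following is a minimal
Python sketch, not Solr code: the function names are made up for illustration,
and it only compares which documents would match, not how they are scored.

```python
from itertools import product


def matches_min_should_match(clause_hits, mm):
    """Model of (a b ...)~mm: at least mm of the optional clauses must match."""
    return sum(clause_hits) >= mm


def matches_all_required(clause_hits):
    """Model of +a +b ...: every clause must match."""
    return all(clause_hits)


def same_result_set(n_clauses, mm):
    """True if both forms accept exactly the same hit patterns."""
    return all(
        matches_min_should_match(hits, mm) == matches_all_required(hits)
        for hits in product([False, True], repeat=n_clauses)
    )


if __name__ == "__main__":
    print(same_result_set(2, 2))  # two terms, mm=2
    print(same_result_set(3, 2))  # three terms, mm=2
```

In this model the two forms agree for two clauses with mm=2, but not for three
clauses with mm=2, so the equivalence depends on mm being equal to the number
of clauses.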


Re: Issues with Solr 6.2 Backup/Restore feature

2016-09-13 Thread Georg Bottenhofer
Hi Hrishikesh,

unfortunately there is no output at level WARN or higher on the live system :(

I checked solr.log, solr-8983-console.log and solr_gc.log.

Kind regards
Georg Bottenhofer

--
Do you already know WerStreamt.es?

Check the availability of films and series
on iTunes, Watchever, Maxdome and many more.

On the web and as an app: https://www.werstreamt.es/
--


x75 GmbH
Online solutions | Advertising concepts | Design services

Orleansstraße 45a
81667 München

T. 089 6244 759-63
F. 089 6244 759-860

georg.bottenho...@x75.net


--
x75 GmbH
Registered office: Orleansstraße 45a, 81667 München, Munich District Court (Amtsgericht München) HRB 178409
Managing director: Johannes Hammersen
VAT ID: DE264251090

On 14.09.2016 at 06:01, Hrishikesh Gadre <gadre.s...@gmail.com> wrote:

> Hi Georg,
>
> It looks like the "_2ag9.si" file is missing in the snapshot.shard1 folder of
> the backup. Can you try running the Lucene CheckIndex tool on the
> snapshot.shard1 folder?
>
> Also, can you post the logs from the backup creation as well?
>
> Thanks
> Hrishikesh











Re: Issues with Solr 6.2 Backup/Restore feature

2016-09-13 Thread Hrishikesh Gadre
Hi Georg,

It looks like the "_2ag9.si" file is missing in the snapshot.shard1 folder of
the backup. Can you try running the Lucene CheckIndex tool on the
snapshot.shard1 folder?

Also, can you post the logs from the backup creation as well?

Thanks
Hrishikesh
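
Before running CheckIndex, a quick look at the file names can already show
whether a segment lacks its .si file. Below is a rough Python sketch under the
assumption that segment files follow the usual Lucene naming
(_<segment>.<ext> or _<segment>_<suffix>.<ext>). The helper name and the
sample file names other than _2ag9 are hypothetical, and this is no substitute
for CheckIndex itself.

```python
import re

# Segment files look like "_2ag9.si", "_2ag9.cfs" or "_2ag9_Lucene50_0.doc";
# the segment name is the leading "_" plus the lowercase alphanumerics after it.
SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)[._]")


def segments_missing_si(file_names):
    """Return the segment names that have files but no matching .si file."""
    segments = set()
    with_si = set()
    for name in file_names:
        match = SEGMENT_FILE.match(name)
        if not match:
            continue  # e.g. "segments_5" or "write.lock"
        segment = match.group(1)
        segments.add(segment)
        if name == segment + ".si":
            with_si.add(segment)
    return sorted(segments - with_si)


if __name__ == "__main__":
    # Hypothetical directory listing; in practice pass os.listdir(path).
    listing = ["segments_5", "write.lock", "_2ag8.si", "_2ag8.cfs",
               "_2ag9.cfs", "_2ag9_Lucene50_0.doc"]
    print(segments_missing_si(listing))
```

Running the same check on the source index and on the copied backup would show
whether the file was lost during backup creation or during the copy.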








Issues with Solr 6.2 Backup/Restore feature

2016-09-13 Thread Georg Bottenhofer
Hi.

We have two systems running in different data centers. The second system is a
failover system whose data lags by about one hour, which is OK for us.

Until Solr 5.5.3 we used a hack with the "replication/snapshot" tool to copy
the indices over, and it worked quite well. A few days ago we upgraded our
staging system to Solr 6.2.0 and wanted to try the new collections
backup/restore tool.

First I backed up an index of about 50 MB and restored it multiple
times to another Solr system with success. Then I took a copy of the live system
(about 350 MB) and it failed.

I ran the backup with:

curl 'http://localhost:8983/solr/admin/collections?action=BACKUP&name=mybackup&collection=collection1&location=/backup/'

and wanted to restore it on the other system, after copying the files via tgz to
the same directory structure, with:

curl 'http://localhost:8983/solr/admin/collections?action=RESTORE&name=mybackup&location=/tmp/backup/&collection=collection1'

I pasted a copy of the error here:
http://pastebin.com/T1C2BxcC

I already tried to get help in the IRC channel, but elyograg mentioned I should
post this to the mailing list, so I hope you can give me a clue about what I am
doing wrong.

All systems are running with:
- Solr 6.2.0
- Zookeeper 3.4.9
- Java version "1.8.0_101"
- 16 GB RAM

Thanks for your help in advance.

Kind regards
Georg Bottenhofer

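
The two Collections API calls in this message can also be assembled
programmatically, which makes it easier to compare their parameters. This is a
minimal Python sketch that only builds the URLs and sends nothing;
collections_api_url and location_of are hypothetical helper names, and the
parameter values are the ones from the message.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def collections_api_url(base_url, **params):
    """Build a Collections API URL; urlencode takes care of the escaping."""
    return f"{base_url}/admin/collections?{urlencode(params)}"


def location_of(url):
    """Read the 'location' parameter back out of a Collections API URL."""
    return parse_qs(urlsplit(url).query)["location"][0]


BASE = "http://localhost:8983/solr"

backup_url = collections_api_url(
    BASE, action="BACKUP", name="mybackup",
    collection="collection1", location="/backup/")

restore_url = collections_api_url(
    BASE, action="RESTORE", name="mybackup",
    location="/tmp/backup/", collection="collection1")

if __name__ == "__main__":
    print(backup_url)
    print(restore_url)
    # Do both calls point at the same backup location?
    print(location_of(backup_url) == location_of(restore_url))
```

In the message the backup is written to /backup/ while the restore reads from
/tmp/backup/, so it may be worth double-checking that the copied files really
sit under the location passed to RESTORE.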



Re: Solr on HDFS: adding a shard replica

2016-09-13 Thread Chetas Joshi
Is this happening because I have set replicationFactor=1?
So even if I manually add a replica for the shard that's down, it will just
create a dataDir but will not copy any of the data into the dataDir?



Re: Solr Cloud: Higher search latency with two nodes vs one node

2016-09-13 Thread Brent
Thanks for the reply.

The overhead you describe is what I suspected. I was just surprised that
DSE is able to keep that overhead small enough for the overall result to be
faster with the extra hardware, while Solr doesn't benefit in the same way.

I did try with RF=2 and shards=1, and yep, it's way fast. Really nice
performance. I'm hoping I can figure out a configuration that will allow me
to get this type of performance boost over DSE with a much higher load.

Follow up question:
I'm getting periodic errors when adding documents in the Java client app:
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://:8983/solr/

I believe it's always preceded by this log message:
Request to collection  failed due to (0)
org.apache.http.NoHttpResponseException: :8983 failed to
respond, retry? 0

I'm guessing it's due to a timeout, though perhaps not my client's timeout
setting but a timeout between the two Solr Cloud servers, for example
when a query is being sent from one to the other.


Probably unrelated to those errors, because these occur at different times: in
the Solr logs I'm getting occasional warning and error messages like the
following. It always starts on server 2:
WARN  [c: s:shard1 r:core_node2 x:_shard1_replica1] o.a.s.c.RecoveryStrategy Stopping recovery for
core=[_shard1_replica1] coreNodeName=[core_node2]
followed by server 1:
ERROR [c: s:shard1 r:core_node1 x:_shard1_replica2] o.a.s.u.StreamingSolrClients error
org.apache.http.NoHttpResponseException: :8983 failed to
respond

WARN  [c: s:shard1 r:core_node1 x:_shard1_replica2] o.a.s.c.LeaderInitiatedRecoveryThread Leader is
publishing core=_shard1_replica1 coreNodeName =core_node2
state=down on behalf of un-reachable replica http://:8983/solr/_shard1_replica1/

Any idea what the cause is for these and how to avoid them?





Solr on HDFS: adding a shard replica

2016-09-13 Thread Chetas Joshi
Hi,

I just started experimenting with solr cloud.

I have a solr cloud of 20 nodes. I have one collection with 18 shards
running on 18 different nodes with replication factor=1.

When one of my shards goes down, I create a replica using the Solr UI. On
HDFS I see a core getting added. But the data (index table and tlog)
information does not get copied over to that directory. For example, on
HDFS I have

/solr/collection/core_node_1/data/index
/solr/collection/core_node_1/data/tlog

when I create a replica of a shard, it creates

/solr/collection/core_node_19/data/index
/solr/collection/core_node_19/data/tlog

(core_node_19 as I already have 18 shards for the collection). The issue is
both my folders  core_node_19/data/index and core_node_19/data/tlog are
empty. Data does not get copied over from core_node_1/data/index and
core_node_1/data/tlog.

I need to remove core_node_1 and just keep core_node_19 (the replica). Why
is the data not getting copied over? Do I need to manually move all the
data from one folder to the other?

Thank you,
Chetas.


Re: Miserable Experience Using Solr. Again.

2016-09-13 Thread Alexandre Rafalovitch
On 14 September 2016 at 06:42, Aaron Greenspan wrote:
> This is a potential solution, but not one I choose to pursue. For one thing, 
> I am not an idiot. I’ve managed Linux systems for about 18 years now and I’ve 
> been programming for 20. I have learned that I am rarely the best at 
> anything, so sure, I fully admit that there will always be others with much 
> better skills than my own. But I’m an intelligent person with experience in 
> software trying to leave constructive feedback, and being told that my 
> feedback essentially reflects on my own stupidity kind of misses the point. 
> I’m providing feedback because things need fixing.


And that's the part I am confused about. Managing Linux systems is a
real pain in the ... Config file locations differ between
distributions. Upgrades are confusing, error messages are truly
cryptic. Debugging by dmesg and truss/strace is dark arts. Reading the
logs is nearly an art.

But with that experience, Zookeeper port is an lsof away. Or a ps away
if you want to read it from the parameters. Or a netstat away. Binding
anything to a local subnet is something of a standard firewall
operation. Couple of other things are output as part of "bin/solr
start --help" as well as part of log messages of running examples. I
understand the frustration. I truly do. And my own - committer - focus
is on improving beginner's experience (no, not looking for funding).
But the problems you list definitely should be minor, not major,
pain points for a Linux system administrator.

Solr is not like WordPress. WordPress is designed for external users
and so is optimized for ease of use at the expense of everything else.
Not correctness, not internal security (passwords are plain text,
plugins have access to everything), not ease of "good" development.
And WordPress is a _small_ product. It is a wrong comparison, to the
point that apples and oranges are at least in the same category of fruit.

Solr is like a MySQL at least. And, frankly, changing a root password
in MySQL is also quite a pain. Or BEA weblogic (which I could...)

As to JIRA, the question was of a very high granularity on a use case
that is complicated and is not a bug/feature distinction. It also has
been discussed multiple times on the User list.



Anyway, I see one JIRA in this so far (Admin UI reclosing the log
message). If nobody else opens it, I will and have a go at it in the
next couple of days.

Regards,
   Alex
P.s. Aaron, perhaps you missed it with the digest mode, but I JUST
asked for feedback on an example reading group idea. I would have
expected you to jump on an opportunity as that would mean a direct
access to somebody contributing their contributor (mine!) time to
improve your understanding of Solr at whatever level of knowledge you
currently have. And - if that's not obvious - to see what kind of
things people find difficult to feed it into the next version of Solr.
Several people already showed interest, but you are not among them.
Hopefully, you will see that email and join us on your next read
through the digest. For easy reference, the survey link again is:
https://www.surveymonkey.com/r/JH8S666


Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


Re: Miserable Experience Using Solr. Again.

2016-09-13 Thread Aaron Greenspan
Hello again…

I get this on digest mode (and wasn’t even sure my initial message went through 
to the list), so please forgive the delay in responding.

I think the various reactions to my post suggest that a sizable number of users 
(and by "users" I mean those who are not affiliated with Apache and who are not 
core contributors) find Solr difficult to use. For me, this was confirmed many 
months ago when a family friend—a non-technical CEO twice my age of a company 
recently acquired for a very sizable sum—came over for dinner and without any 
prompting from anyone began complaining about this impossible program at work 
called Solr that none of his engineers could get to work. By his telling, he 
had several experienced engineers working on it.

I’m aware that issues with Java are not Solr’s fault. But most programs still 
manage to gracefully fail when they are missing a dependency, and then clearly 
report what’s missing. If you’re not actually a Java programmer, which I am 
not, "major.minor 52.0" (for example) is meaningless gibberish. "Please 
download and install JRE 1.8 to run this software" would be considerably 
clearer. How is it that Solr can search through millions of files, but it can’t 
do that?
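
For readers who hit the same message: the number in "major.minor 52.0" is the
class file format version, and 52 corresponds to Java 8 (also written 1.8).
Below is a small Python sketch of the kind of friendlier message asked for
here, assuming the usual offset of 44 between class file major version and
Java release (valid from Java 5 on). The function names are invented for
illustration and are not part of Solr.

```python
def java_release_for_class_major(major):
    """Map a class file major version to its Java release (49 -> 5, 52 -> 8)."""
    if major < 49:
        raise ValueError("this sketch only covers Java 5 (major 49) and later")
    return major - 44


def friendly_message(major_minor):
    """Turn a 'major.minor' string such as '52.0' into a readable hint."""
    major = int(major_minor.split(".")[0])
    release = java_release_for_class_major(major)
    return (f"This software needs Java {release} or newer "
            f"(class file version {major_minor}).")


if __name__ == "__main__":
    print(friendly_message("52.0"))
```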

As for the suggestions that I should (1) read the documentation; (2) file 
reports on JIRA; and (3) hire a consultant if my own skills aren’t up to snuff:

1. I did. The documentation is severely lacking, apparently having been written 
by project contributors who have vastly different goals than their users. 
Example #1: the security issue (I still can’t figure out how to 
password-protect the Solr web UI, a convenience perhaps, but a convenience I 
depend upon because I cannot spend all day handling the combination of the 
command line and Java, neither can most people, and I still can’t figure out 
how to install or use "Zookeeper") is documented here:

https://cwiki.apache.org/confluence/display/solr/Securing+Solr

Note the red section at the bottom (which originally wasn’t even there): "No 
Solr API, including the Admin UI, is designed to be exposed to non-trusted 
parties. Tune your firewall so that only trusted computers and people are 
allowed access." If one of my employees tried to pull this I would fire them. 
Admin UIs in every other product I’ve ever seen are password-protected. Always. 
Netscape Enterprise Server in 1996 had a password for its admin UI. (See 
https://docs.oracle.com/cd/E19957-01/816-5654-10/816-5654-10.pdf, page 62.) 
Cobalt RaQs in 1999 had passwords on their Admin UIs. Home routers have had 
passwords since time immemorial. It’s 2016. Solr is on major release version 6. 
Don’t tell me how to configure my firewall, and certainly don’t tell me that 
firewalls can programmatically block access to "people". (My understanding of 
firewalls is that they block access to IP addresses and/or ports—if there was a 
product that could magically always block certain people, who would bother with 
firewalls?) Even if I configure my firewall to restrict [IP]:8983 to one IP, 
many people may use that IP, especially at a large organization with enough 
data to merit a product like Solr! Fix your dangerously insecure product, give 
it an install script that handles the SSL cert generation, and if I for some 
reason need to do something to turn that fix on, please tell me how.

Note also, going back to Zookeeper, that on 
https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin, 
the documentation states, "Run the following command to upload it to Zookeeper. 
(ensure that the Zookeeper port is correct)". First, what is it talking about? 
I’ve never heard of Zookeeper outside of an actual zoo. Second, it runs on a 
port? Third, which port? Is it 9983, as is only cryptically alluded to below? 
What if it’s not running yet? How do I start it? Is it secure? Is it part of 
Java? Is it part of Solr? Is it even installed? Why is this my problem? Can you 
imagine if any other piece of software involving a password worked this way? 
(Yes, I have read the Apache Zookeeper Wikipedia article. My point about flaws 
in the documentation stands. It is confusing to new users—those who need it 
most.)

Example #2: the potential bug involving the fieldType error message. I searched 
for documentation on the fieldType. Something about the Solr API 
(https://lucene.apache.org/solr/6_2_0/solr-core/org/apache/solr/schema/FieldType.html)
 came up which was not relevant or helpful. It’s easy to say RTFM, but what if 
the product is full of bugs? Those tend not to be in manuals.

After doing this dance enough times I’ve learned that the Solr documentation is 
most often out-of-date or unhelpful. So here I am.

2. I have filed several reports on JIRA. Here’s the kind of response I have 
received in the past:

https://issues.apache.org/jira/browse/SOLR-7896?focusedCommentId=14661324&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14661324

"Please bring this k

[Result Query Solr] How to retrieve the content of pdfs

2016-09-13 Thread Alexandre Martins
Hi Guys,

I'm trying to use the latest version of Solr and I have used the post tool to
upload 28 PDF files, and it works fine. However, I don't know how to show
the content of the files in the resulting JSON. Does anybody know how to include
this field?

"responseHeader":{
  "zkConnected":true,
  "status":0,
  "QTime":43,
  "params":{
    "q":"ABC",
    "indent":"on",
    "wt":"json",
    "_":"1473804420750"}},
"response":{"numFound":40,"start":0,"maxScore":9.1066065,"docs":[
  {
    "id":"/home/alexandre/desenvolvimento/workspace/solr-6.2.0/pdfs_hack/abc.pdf",
    "date":["2016-09-13T14:44:17Z"],
    "pdf_pdfversion":[1.5],
    "xmp_creatortool":["PDFCreator Version 1.7.3"],
    "stream_content_type":["application/pdf"],
    "access_permission_modify_annotations":[false],
    "access_permission_can_print_degraded":[false],
    "dc_creator":["abc"],
    "dcterms_created":["2016-09-13T14:44:17Z"],
    "last_modified":["2016-09-13T14:44:17Z"],
    "dcterms_modified":["2016-09-13T14:44:17Z"],
    "dc_format":["application/pdf; version=1.5"],
    "title":["ABC tittle"],
    "xmpmm_documentid":["uuid:100ccff2-7c1c-11e6--ab7b62fc46ae"],
    "last_save_date":["2016-09-13T14:44:17Z"],
    "access_permission_fill_in_form":[false],
    "meta_save_date":["2016-09-13T14:44:17Z"],
    "pdf_encrypted":[false],
    "dc_title":["Tittle abc"],
    "modified":["2016-09-13T14:44:17Z"],
    "content_type":["application/pdf"],
    "stream_size":[101948],
    "x_parsed_by":[
      "org.apache.tika.parser.DefaultParser",
      "org.apache.tika.parser.pdf.PDFParser"],
    "creator":["mauricio.tostes"],
    "meta_author":["mauricio.tostes"],
    "meta_creation_date":["2016-09-13T14:44:17Z"],
    "created":["Tue Sep 13 14:44:17 UTC 2016"],
    "access_permission_extract_for_accessibility":[false],
    "access_permission_assemble_document":[false],
    "xmptpg_npages":[3],
    "creation_date":["2016-09-13T14:44:17Z"],
    "resourcename":["/home/alexandre/desenvolvimento/workspace/solr-6.2.0/pdfs_hack/abc.pdf"],
    "access_permission_extract_content":[false],
    "access_permission_can_print":[false],
    "author":["abc.add"],
    "producer":["GPL Ghostscript 9.10"],
    "access_permission_can_modify":[false],
    "_version_":1545395897488113664},

Alexandre Costa Martins
DATAPREV - IT Analyst
Software Reuse Researcher
MSc Federal University of Pernambuco
RiSE Member - http://www.rise.com.br
Sun Certified Programmer for Java 5.0 (SCPJ5.0)

MSN: xandecmart...@hotmail.com
GTalk: alexandremart...@gmail.com
Skype: xandecmartins
Mobile: +55 (85) 9626-3631


help with field definition

2016-09-13 Thread Gandham, Satya
Hi,

I need help with defining a field ‘singerName’ with the right
tokenizers and filters such that it gives me the behavior described below.

I have a few documents as given below:

Doc 1:
  singerName: Justin Bieber
Doc 2:
  singerName: Justin Timberlake
…


Below is the list of queries and the corresponding matches:

Query 1: “My fav artist Justin Bieber is very impressive”
Docs Matched: Doc1

Query 2: “I have a Justin Timberlake poster on my wall”
Docs Matched: Doc2

Query 3: “The name Bieber Justin is unique”
Docs Matched: None

Query 4: “Timberlake is a lake of timber..?”
Docs Matched: None
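
The expected behaviour can be pinned down with a small model before looking
for the right analyzers. The Python sketch below is not Solr configuration: it
assumes the whole singerName value is kept as a single lowercased token and
that the query is expanded into word shingles (roughly what the title of the
linked Stack Overflow question suggests). All function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(tokens, max_size):
    """Yield every run of 1..max_size consecutive tokens, joined by a space."""
    for size in range(1, max_size + 1):
        for start in range(len(tokens) - size + 1):
            yield " ".join(tokens[start:start + size])


def matching_docs(query, docs, max_shingle_size=3):
    """Return ids of docs whose full name appears as one query shingle."""
    query_shingles = set(shingles(tokenize(query), max_shingle_size))
    return [doc_id for doc_id, name in docs.items()
            if " ".join(tokenize(name)) in query_shingles]


DOCS = {"Doc1": "Justin Bieber", "Doc2": "Justin Timberlake"}

if __name__ == "__main__":
    for query in ["My fav artist Justin Bieber is very impressive",
                  "I have a Justin Timberlake poster on my wall",
                  "The name Bieber Justin is unique",
                  "Timberlake is a lake of timber..?"]:
        print(query, "->", matching_docs(query, DOCS))
```

Because a name only matches when it appears as one contiguous, ordered run of
words, the reversed name in Query 3 and the single word in Query 4 produce no
match in this model.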

I have described this in more detail here:
http://stackoverflow.com/questions/39399321/solr-shingle-query-matching-keyword-tokenized-field

I’d appreciate any help in addressing this problem.

Thanks !!



Re: Bug with bootstrap_confdir?

2016-09-13 Thread Jan Høydahl
Hi,

So what you are attempting is to upgrade from a single-node SolrCloud (version?)
with built-in ZooKeeper to Solr 6.1 with an external ZK. And you want to carry
over the configuration and data.

First, you must realise that all your collection config is in your old built-in
ZK, not in files in SOLR_HOME, so bootstrap_confdir will not work here.
Anyway, it is not the recommended way to bootstrap Solr anymore, so if you
had a SOLR_HOME with a collection1/conf/ folder, the preferred way to get
it into ZooKeeper would be

1. Put your collection1 folder somewhere else
2. Install Solr with external ZK and start up without collections
3. bin/solr create -c mycollection -d /path/to/old/collection1/conf
4. Then get your data in, i.e. by stopping Solr, copying the data folder and starting

If you have a copy of your config outside of ZK, you can use this approach.
If your config is only in ZK, you will first have to download it from the old
built-in ZK to disk (zkcli downconfig).
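
The steps above, including the download from the old built-in ZK, could be
scripted roughly as follows. This Python sketch only assembles the command
lines and does not run them. The ZK address localhost:9983, the config name
and the directory /tmp/old-conf are placeholders, and the zkcli.sh location
assumes a standard Solr install layout.

```python
import shlex


def migration_commands(old_zk, conf_name, conf_dir, collection):
    """Return the argv lists for 'download old config' and 'create collection'."""
    download = [
        "server/scripts/cloud-scripts/zkcli.sh",
        "-zkhost", old_zk,
        "-cmd", "downconfig",
        "-confdir", conf_dir,
        "-confname", conf_name,
    ]
    create = ["bin/solr", "create", "-c", collection, "-d", conf_dir]
    return [download, create]


if __name__ == "__main__":
    # All four values below are placeholders for illustration.
    for argv in migration_commands(
            old_zk="localhost:9983",
            conf_name="myconf",
            conf_dir="/tmp/old-conf",
            collection="mycollection"):
        print(shlex.join(argv))
```

The download has to happen while the old instance with its built-in ZK is
still running; the create command is then issued against the new Solr that
already uses the external ZK.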

If you need help in debugging any other concrete error messages, please
share the exact command line or action taken along with a copy/paste of the
relevant error logs, so we can know exactly what you were trying to do.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 13 Sep 2016 at 19:02, M Skazy wrote:
> 
> Hi Jan,
> 
> I'll try to shed some more light on the issue I ran into.
> 
> ENVIRONMENT:
> 
> Solr Version 6.1.0 (however, same code exists in current master)
> SOLR_HOME=/some/directory (contains solr.xml)
> SOLR_HOME/collection1 is an existing core that I was using previously w/
> Solr configured using internal ZK
> 
> PROBLEM:
> 
> I then setup an external ZK instance which I wanted Solr to use.  Having
> solr use the external ZK puts it into solrcloud mode:
> https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L1105
> 
> This error happened right when I was booting up Solr w/ the configured
> external ZK for the first time.  The command was bin/solr start (run via
> the installed service so 'sudo service solr start').  The error is that the
> directory trying to be used does not exist, eg:
> 
> /some/directory/collection1  DOES exist
> 
> /opt/solr/server/solr/collection1 does NOT exist
> 
> MY INTERPRETATION:
> 
> There is a disconnect between the condition being asserted and the value
> being used to bootstrap.  My main opinion is that these two things should
> be consistent.  I don't see a use case as to why one would check a
> directory as existing at one location and then use a completely unrelated
> path in the following command.  Where I'm unclear is whether the
> conditional should be looking specifically in the install directory, or if
> bootstrap_confdir should be using SOLR_HOME.
> 
> I would argue SOLR_HOME is the correct solution since bootstrap_conf is
> intended to be "If you pass Dbootstrap_conf=true on startup, each SolrCore
> you have configured will have it's configuration files automatically
> uploaded and linked to the collection that SolrCore is part of".  If the
> user changes SOLR_HOME it is usually because the core config/data is stored
> outside the installation directory.
> 
> Mike
> 
> 
> On Sat, Sep 10, 2016 at 5:39 PM, Jan Høydahl  wrote:
> 
>> Hi
>> 
>> It is not clear to me what exact command you were doing when you got an
>> error.
>> Please state your solr version, the exact command you were trying to
>> execute
>> and what error msg you got.
>> 
>> Why do you believe that resolving relative bootstrap_confdir relative to
>> SOLR_HOME
>> is a good idea? If a config resides relative to SOLR_HOME, Solr will try to
>> bootstrap that as a core at startup, which is probably not what you would
>> like.
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> 9. sep. 2016 kl. 19.04 skrev M Skazy :
>>> 
>>> Hi,
>>> 
>>> I was having an issue setting up a Solr instance w/ a external Zookeeper.
>>> My SOLR_HOME is not set to the default location.  I believe the problem
>> is
>>> related to the following line and I wanted to confirm if this is a bug:
>>> 
>>> https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L1383
>>> 
>>> It would seem that if we're checking if a file exists at some position
>>> relative to SOLR_HOME that the path supplied for bootstrap_confdir should
>>> also be rooted by SOLR_HOME instead of the working directory of the solr
>>> process.
>>> 
>>> For me this translated into errors when solr started as it was trying to
>>> load a configuration into ZK from a directory that did not exist.
>>> 
>>> The fix is easy and I can create a patch for this if it is decided that
>>> this is a bug.
>>> 
>>> Thanks,
>>> 
>>> Mike
>> 
>> 



facet.pivot on a join query, joining two different collections

2016-09-13 Thread Aswath Srinivasan (TMS)
Hello,

We are trying to do a facet pivot on two fields that exist in two 
different collections. We are joining the two collections on a common field. 
Below is the query I have right now; it doesn't seem to work. Any help 
would be much appreciated.


http://localhost:8983/solr/abc/select?q={!join from=id to=id 
fromIndex=xyz}*&wt=json&indent=true&start=0&rows=1&facet=true&facet.pivot= 
name,count&shards=localhost:8983/solr/abc,localhost:8983/solr/xyz



1.   id field exists in both collections - abc & xyz

2.   name field exists in xyz collection

3.   count field exists in abc collection
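
One thing that is easy to get wrong with a request like the one above is URL encoding: the local-params part of q contains braces, spaces and equals signs. The following is only a minimal sketch of how the same parameters could be encoded with the Python standard library. The helper name build_pivot_url is hypothetical, and the sketch makes no claim about whether the pivot itself can see fields from both collections.

```python
from urllib.parse import urlencode


def build_pivot_url(base, from_index):
    """Build the select URL from the post with every parameter URL-encoded.

    `base` and `from_index` are illustrative arguments; the parameter
    values are copied from the query in the original message.
    """
    params = [
        ("q", "{!join from=id to=id fromIndex=%s}*" % from_index),
        ("wt", "json"),
        ("indent", "true"),
        ("start", "0"),
        ("rows", "1"),
        ("facet", "true"),
        ("facet.pivot", "name,count"),
        ("shards", "localhost:8983/solr/abc,localhost:8983/solr/xyz"),
    ]
    # urlencode() percent-encodes braces, '!', '=' and ',' and turns spaces into '+'
    return base + "/select?" + urlencode(params)


url = build_pivot_url("http://localhost:8983/solr/abc", "xyz")
print(url)
```

Printing the URL makes it easy to compare what is actually sent with what was typed into the browser.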


Thank you,
Aswath NS



Solr Reference Guide for Solr 6.2 released

2016-09-13 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide
for Solr 6.2 has been released.

This 717-page PDF is the definitive guide to using Apache Solr, the
blazing fast search server built on Apache Lucene. It can be
downloaded from:

https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-6.2.pdf

- Cassandra


Re: Detecting down node with SolrJ

2016-09-13 Thread Shawn Heisey
On 9/9/2016 10:11 PM, Shawn Heisey wrote:
> On 9/9/2016 4:38 PM, Brent wrote:
>> Is there a way to tell whether or not a node at a specific address is
>> up using a SolrJ API? 
> Based on your other questions, I think you're running cloud. If that
> assumption is correct, use the Collections API with HttpSolrClient
> (instead of CloudSolrClient) to get a list of collections. Using
> HttpSolrClient will direct the request to that specific server. If it
> doesn't throw an exception, it's up. Here's some SolrJ code.

After reading a reply on a separate thread, I was reminded of the ping
handler.  That might seem like a good solution, but using the ping
handler would require that your code knows about a specific core on the
server, or that it knows the name of one of the SolrCloud collections. 
The collections admin request that I outlined previously will work
without the code needing any knowledge of what the cloud contains.

SolrCloud can almost guarantee that any collection in the cloud will be
accessible from any node in the cloud if that node is functional ... but
if you want to be extremely pedantic in your check, and you have a ping
handler defined in the config for the collection, you could use that
handler to make sure that the collection is functional as well as the
node.  To check that you can access a collection named "foo" from a
specific node, start with the code that I sent previously, and replace
the last two lines of code with these:

  SolrPing ping = new SolrPing().setActionPing();
  ping.process(client, "foo");

The SolrPing object assumes the ping handler is named "/admin/ping".  If
you have a ping handler with a different name/path, you should be able
to use the "setPath" method on the ping object to change it before doing
the "process" method.
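
For anyone who wants to run the same two checks from outside SolrJ, here is a rough sketch using only the Python standard library. It assumes the node is reachable at a base URL ending in /solr, that the Collections API LIST action is available (SolrCloud), and that the ping handler lives at the default /admin/ping path; the function names are made up for this example.

```python
import http.client
import json
from urllib.error import URLError
from urllib.request import urlopen


def collections_list_url(base_url):
    """URL of the Collections API LIST action on one specific node."""
    return base_url.rstrip("/") + "/admin/collections?action=LIST&wt=json"


def ping_url(base_url, collection, handler="/admin/ping"):
    """URL of the ping handler for one collection on one specific node."""
    return "%s/%s%s?wt=json" % (base_url.rstrip("/"), collection, handler)


def is_up(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200 and a JSON body.

    Any connection error, HTTP error status or unparsable body counts as down.
    """
    try:
        with urlopen(url, timeout=timeout) as resp:
            json.load(resp)
            return resp.status == 200
    except (URLError, OSError, ValueError, http.client.HTTPException):
        return False


# Example usage (needs a running node, so it is left commented out):
# node = "http://localhost:8983/solr"
# print(is_up(collections_list_url(node)))   # node-level check
# print(is_up(ping_url(node, "foo")))        # collection-level check
```

The first URL needs no knowledge of what the cloud contains; the second one needs a collection name, which mirrors the trade-off described above.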

Thanks,
Shawn



Re: Bug with bootstrap_confdir?

2016-09-13 Thread M Skazy
Hi Jan,

I'll try to shed some more light on the issue I ran into.

ENVIRONMENT:

Solr Version 6.1.0 (however, same code exists in current master)
SOLR_HOME=/some/directory (contains solr.xml)
SOLR_HOME/collection1 is an existing core that I was using previously w/
Solr configured using internal ZK

PROBLEM:

I then set up an external ZK instance which I wanted Solr to use.  Having
Solr use the external ZK puts it into SolrCloud mode:
https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L1105

This error happened right when I was booting up Solr w/ the configured
external ZK for the first time.  The command was bin/solr start (run via
the installed service, so 'sudo service solr start').  The error is that the
directory Solr tries to use does not exist, e.g.:

/some/directory/collection1  DOES exist

/opt/solr/server/solr/collection1 does NOT exist

MY INTERPRETATION:

There is a disconnect between the condition being asserted and the value
being used to bootstrap.  My main opinion is that these two things should
be consistent.  I don't see a use case as to why one would check a
directory as existing at one location and then use a completely unrelated
path in the following command.  Where I'm unclear is whether the
conditional should be looking specifically in the install directory, or if
bootstrap_confdir should be using SOLR_HOME.

I would argue SOLR_HOME is the correct solution since bootstrap_conf is
intended to be "If you pass -Dbootstrap_conf=true on startup, each SolrCore
you have configured will have its configuration files automatically
uploaded and linked to the collection that SolrCore is part of".  If the
user changes SOLR_HOME it is usually because the core config/data is stored
outside the installation directory.
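
To make the mismatch concrete, here is a small sketch of the two path resolutions. It is not the code from bin/solr: the working directory /opt/solr/server and the relative value ./solr/collection1/conf are assumptions chosen to reproduce the paths above, and both helper names are hypothetical.

```python
import posixpath


def checked_core_dir(solr_home, core):
    """Directory whose existence is tested: rooted at SOLR_HOME."""
    return posixpath.join(solr_home, core)


def resolve_confdir(confdir, root):
    """Resolve a relative bootstrap_confdir against the given root directory."""
    return posixpath.normpath(posixpath.join(root, confdir))


solr_home = "/some/directory"
working_dir = "/opt/solr/server"              # assumed working directory of the Solr process
relative_confdir = "./solr/collection1/conf"  # assumed relative value passed to bootstrap_confdir

checked = checked_core_dir(solr_home, "collection1")
used = resolve_confdir(relative_confdir, working_dir)
proposed = resolve_confdir("collection1/conf", solr_home)

print("checked: ", checked)    # exists in my setup
print("used:    ", used)       # lives under the install directory, which does not exist
print("proposed:", proposed)   # rooted at SOLR_HOME, consistent with the check
```

With both the check and the bootstrap path rooted at SOLR_HOME, the condition and the value it guards can no longer point at different trees.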

Mike


On Sat, Sep 10, 2016 at 5:39 PM, Jan Høydahl  wrote:

> Hi
>
> It is not clear to me what exact command you were doing when you got an
> error.
> Please state your solr version, the exact command you were trying to
> execute
> and what error msg you got.
>
> Why do you believe that resolving relative bootstrap_confdir relative to
> SOLR_HOME
> is a good idea? If a config resides relative to SOLR_HOME, Solr will try to
> bootstrap that as a core at startup, which is probably not what you would
> like.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 9. sep. 2016 kl. 19.04 skrev M Skazy :
> >
> > Hi,
> >
> > I was having an issue setting up a Solr instance w/ a external Zookeeper.
> > My SOLR_HOME is not set to the default location.  I believe the problem
> is
> > related to the following line and I wanted to confirm if this is a bug:
> >
> > https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L1383
> >
> > It would seem that if we're checking if a file exists at some position
> > relative to SOLR_HOME that the path supplied for bootstrap_confdir should
> > also be rooted by SOLR_HOME instead of the working directory of the solr
> > process.
> >
> > For me this translated into errors when solr started as it was trying to
> > load a configuration into ZK from a directory that did not exist.
> >
> > The fix is easy and I can create a patch for this if it is decided that
> > this is a bug.
> >
> > Thanks,
> >
> > Mike
>
>


(Survey/Experiment) Are you interested in a Solr example reading group?

2016-09-13 Thread Alexandre Rafalovitch
Is anybody interested in joining an example reading group for Solr
(6.2 or latest)?

Basic idea: we take one of the examples that ship with Solr and ask
each other any and all questions related to it. Basic/beginner level
questions are allowed and welcomed. We could also share
tools/tips/ideas to make the examples easier to understand, etc.

Examples of potentially interesting questions:
*) Is this text_rev actually doing anything?
*) Why does this search against the example not do anything?
*) How do I remove all comments from this example configuration?
*) Can I delete this field/type/config section and have the example still work?
*) Where is the documentation that makes "this" tick?
*) What would this example data look like if it were in XML/CSV/JSONL?
*) Is this a bug, a feature, or just me?

This would be a separate time-bound group/list/slack (I am open to
suggestions), so only people interested in and ready for
simple/narrow-focus questions would be there.

If you are interested (or even if not), I have just set up a very basic
survey where you can give your opinion: https://www.surveymonkey.com/r/JH8S666

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


Re: [Solr facet distinct count] Benchmark and implementation details

2016-09-13 Thread Alessandro Benedetti
After a bit of investigation, I can confirm that I get more than double the
QTime for a single Solr query in a distributed environment.
I will go into the details, but before I go into the code: will the unique
functionality benefit if we store docValues for the unique field?

I have 50,000,000 docs; the field I am faceting on has a
cardinality of 50 values, each bucket is around 1,000,000 docs, and the
unique field cardinality is 20,000.

I did not think these were big numbers. I will need to speed up my query,
as I am assuming something is not going as expected :)
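
For reference, this is roughly the shape of the request I am talking about, written as a small Python sketch that builds the JSON body. The field names category and user_id are placeholders, not my real schema, and the helper name is made up.

```python
import json


def distinct_count_facet(bucket_field, unique_field, limit=50):
    """Terms facet on `bucket_field` with a unique() count per bucket."""
    return {
        "by_bucket": {
            "type": "terms",
            "field": bucket_field,
            "limit": limit,
            # distinct values of `unique_field` inside each bucket
            "facet": {"distinct": "unique(%s)" % unique_field},
        }
    }


request_body = {
    "query": "*:*",
    "limit": 0,  # only the facets are of interest, not the documents
    "facet": distinct_count_facet("category", "user_id"),
}

print(json.dumps(request_body, indent=2))
```

The body would be POSTed to the collection's query endpoint; each returned bucket then carries its own distinct value next to the plain count.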

Cheers

On Mon, Sep 12, 2016 at 11:59 AM, Alessandro Benedetti <
abenede...@apache.org> wrote:

> Hi gents,
> was taking a look to the ways to calculate distinct count per facet.
>
> Reading through Yonik blogs [1] it seems quite safe to assume the "
> unique(field)" is the approach to go.
>
> Do we have any benchmark or details about the implementation ?
> Because as per Yonik blog it is faster than HyperLogLog so I assume it is
> using different data structures and algorithms.
> Worst case scenario I go through the code, but any presentation or blog
> would be useful!
> Cheers
>
>
> [1] http://yonik.com/solr-count-distinct/ , http://yonik.com/facet-
> performance/
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Unable to connect to correct port in solr 6.2.0

2016-09-13 Thread Aneesh Mon N
Hi Preet,

I prepared this doc for 5.2.1; it should be more or less similar.
Try this once.
http://amn-solr.blogspot.in/2015/08/solr-521-cloud-configuration-steps.html

Regards,
Aneesh N

On Mon, Sep 12, 2016 at 4:30 PM, Preeti Bhat 
wrote:

> HI All,
>
> I am trying to setup the solr in Redhat Linux, using the
> install_solr_service.sh script of solr.6.2.0  tgz. The script runs and
> starts the solr on port 8983 even when the port is specifically specified
> as 2016.
>
> /root/install_solr_service.sh solr-6.2.0.tgz -i /opt -d /var/solr -u root
> -s solr -p 2016
>
> Is this correct way to setup solr in linux? Also, I have observed that if
> I go to the /bin/solr and start with the port number its working as
> expected but not as service.
>
> I would like to setup the SOLR in SOLRCloud mode with external zookeepers.
>
> Could someone please advise on this?
>
>
>
> NOTICE TO RECIPIENTS: This communication may contain confidential and/or
> privileged information. If you are not the intended recipient (or have
> received this communication in error) please notify the sender and
> it-supp...@shoregrp.com immediately, and destroy this communication. Any
> unauthorized copying, disclosure or distribution of the material in this
> communication is strictly forbidden. Any views or opinions presented in
> this email are solely those of the author and do not necessarily represent
> those of the company. Finally, the recipient should check this email and
> any attachments for the presence of viruses. The company accepts no
> liability for any damage caused by any virus transmitted by this email.
>
>
>


-- 
Regards,
Aneesh Mon N
Chennai
+91-8197-188-588


Re: Solr Cloud: Higher search latency with two nodes vs one node

2016-09-13 Thread Toke Eskildsen
Brent  wrote:
> I've been testing Solr Cloud 6.1.0 with two servers, and getting somewhat
> disappointing query latency. I'm comparing the latency with the same tests,
> running DSE in place of Solr Cloud. It's surprising, because running the
> test just on my laptop (running a single instance of Solr), I get
> significantly better latency with Solr than with DSE.

I can understand why it is surprising, but fortunately there is a simple 
explanation:

> In theory, shouldn't 2 nodes running Solr be the fastest?

Depends on setup, corpus & queries. My own rule of thumb: Only shard if you are 
really sure it will help.

> When running Solr with just one node, I create the collection with 1 shard.
> When running Solr with both nodes, I create the collection with 2 shards.

The number of shards is the reason. With 1 shard, 1 request will be processed 
by 1 core.

With 2 shards, 1 request will be processed by 2 cores: If the outside request 
is directed at core A, core A will send 1 request to core B, find the top 
merged hits from both cores, then send another request to core B to resolve the 
documents for the hits from core B (there might be some optimization that does 
away with the second call in some cases, but it only mitigates the problem).

As you see there is an overhead for using 2+ shards. For smaller setups, that 
overhead can easily overshadow the gains from the extra hardware power 
introduced by sharding.

If you want to improve your query performance then use 1 shard with replication.
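
To illustrate the two-phase flow, here is a toy model in Python. It is a simplification under my own assumptions (in-memory dicts instead of cores, made-up scores and titles), not how Solr is implemented; it only counts the extra requests the receiving core has to send to the other shards.

```python
import heapq


def top_ids(docs, rows):
    """Phase 1 on one shard: the best `rows` (score, doc_id) pairs."""
    return heapq.nlargest(rows, ((doc["score"], doc_id) for doc_id, doc in docs.items()))


def distributed_search(shards, coordinator, rows):
    """Return (titles, remote_requests) for a request received by `coordinator`."""
    remote_requests = 0
    candidates = []
    for name, docs in shards.items():
        if name != coordinator:
            remote_requests += 1  # phase 1: ask the other shard for its top ids
        candidates.extend((score, doc_id, name) for score, doc_id in top_ids(docs, rows))
    winners = heapq.nlargest(rows, candidates)  # merge the per-shard results
    owners = {name for _, _, name in winners}
    remote_requests += len(owners - {coordinator})  # phase 2: fetch the winning documents
    titles = [shards[name][doc_id]["title"] for _, doc_id, name in winners]
    return titles, remote_requests


d1 = {"score": 0.9, "title": "one"}
d2 = {"score": 0.5, "title": "two"}
d3 = {"score": 0.7, "title": "three"}

one_shard = {"A": {"d1": d1, "d2": d2, "d3": d3}}
two_shards = {"A": {"d1": d1}, "B": {"d2": d2, "d3": d3}}

print(distributed_search(one_shard, "A", 2))
print(distributed_search(two_shards, "A", 2))
```

Both layouts return the same hits, but the two-shard layout needs two extra round trips to core B for this one request, which is the overhead described above.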

- Toke Eskildsen


Re: Unable to connect to correct port in solr 6.2.0

2016-09-13 Thread Shalin Shekhar Mangar
Good to know. Thank you!

On Tue, Sep 13, 2016 at 4:19 PM, Preeti Bhat 
wrote:

> Thanks Shekhar, Re install was successful. I had run on default port prior
> to running on 2016. T
>
>
> Thanks and Regards,
> Preeti Bhat
>
> -Original Message-
> From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
> Sent: Tuesday, September 13, 2016 1:01 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Unable to connect to correct port in solr 6.2.0
>
> I just tried this out on ubuntu (sorry I don't have access to a red hat
> system) and it works fine.
>
> One thing that you have to take care of is that if you install the service
> on the default 8983 port then, trying to upgrade with the same tar to a
> different port does not work. So please ensure that you hadn't already
> installed the service before already.
>
> On Tue, Sep 13, 2016 at 12:53 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > Which version of red hat? Is lsof installed on this system?
> >
> > On Mon, Sep 12, 2016 at 4:30 PM, Preeti Bhat
> > 
> > wrote:
> >
> >> HI All,
> >>
> >> I am trying to setup the solr in Redhat Linux, using the
> >> install_solr_service.sh script of solr.6.2.0  tgz. The script runs
> >> and starts the solr on port 8983 even when the port is specifically
> >> specified as 2016.
> >>
> >> /root/install_solr_service.sh solr-6.2.0.tgz -i /opt -d /var/solr -u
> >> root -s solr -p 2016
> >>
> >> Is this correct way to setup solr in linux? Also, I have observed
> >> that if I go to the /bin/solr and start with the port number its
> >> working as expected but not as service.
> >>
> >> I would like to setup the SOLR in SOLRCloud mode with external
> zookeepers.
> >>
> >> Could someone please advise on this?
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
>
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Miserable Experience Using Solr. Again.

2016-09-13 Thread Shawn Heisey
On 9/12/2016 3:48 PM, Aaron Greenspan wrote:
> I have been on this list for some time because I know that any time I
> try to do anything related to Solr I’m going to have to spend hours on
> it, wondering why everything has to be so awful, and I just want
> somewhere to provide feedback with the dim hope that the product might
> improve one day. (So far, for my purposes, it hasn’t.) Sure enough, I
> still absolutely hate using Solr, and I have more feedback. 

First, let me thank you for mentioning your experiences.  It's harsh
feedback, but I still welcome it.  I'm going to say some things that may
sound to you like excuses ... and you aren't really wrong to think that,
but we *do* take your comments seriously, and in many cases, we already
know that there are problems we need to solve.

As others have said, and as I'm sure you probably know, open source is
created by people trying to solve a problem for themselves, and is done
on a volunteer basis.  If a project is lucky, it attracts interested
volunteers and truly magical things happen for the project users.  I
think Solr is a good project, with an awesome community.

Beginner documentation is one of the hardest things to write.  It's
difficult for people who live and breathe the software to view the
system from the perspective of someone who has never touched it at all,
and to write something that explains to that novice exactly how to make
it work.

> I started with a confusing error on the web console, which I still
> can’t figure out how to password protect without going through an
> insanely process involving "ZooKeeper," which I don’t know anything
> about, or have, to the best of my knowledge: Problem accessing /solr/.
> Reason: Forbidden 

This particular part of your message involves an old fight in the Solr
project:  Security.  Those of us who have been in the industry forever
have learned that many of the security features that people expect for
Internet-facing services are not at all helpful for the security of
internal systems like Solr.  The best thing you can do to prevent
problems is to place Solr in a location where it cannot be reached by
people who cannot be trusted.  If you can trust those who have access,
then there's no need for intrinsic security features.

Any security that you layer on top of Solr (encryption, authentication,
etc) is useless if somebody compromises the system that talks to Solr,
which already has all the keys/passwords/etc required to get right in.

As evidenced by the fact that authentication has come to Solr, we *are*
listening to our users that demand security features.  The
authentication feature that you are trying to use, which involves basic
username/password authentication of the API calls that the admin UI
makes (*not* the admin UI itself), was originally developed for
SolrCloud -- which utilizes Zookeeper as a central configuration
database.  Work is underway right now to bring basic authentication to
standalone Solr, but it is not going to be available until at least 6.3,
and may take even longer to finish.  It will also require separate
configuration on every host, which is not required for SolrCloud.

> According to logs, this apparently meant that a MySQL query had failed
> due to a field name change. Since I would have to change my XML
> configuration files, I decided to use the opportunity to upgrade from
> Solr 5.1.4 to 6.2.0. It broke everything. 

For almost ANY software, but especially for an open source package,
upgrading to a new major version is a good way to cause problems, not
usually a good way to solve them.  Solr does maintain compatibility with
configs that are completely current for the later versions in the
previous major release ... but there are a LOT of configs out in the
world (even in the latest versions of their software!) that were
originally designed for Solr 3.x, 4.x, or *early* 5.x.  Solr makes zero
guarantees about configs designed for software that old.

For an example of a similar situation in a different software package: 
If you try to copy configs for the Apache webserver (httpd) from a 2.2
install to a 2.4 install, chances are excellent that you're going to
have to change those configs before Apache will even start, much less
operate as expected.  Upgrading Apache from 2.2 to 2.4 is technically a
"minor" version upgrade, but in terms of capability and configuration,
is similar to a major Solr version upgrade.

> First I was getting errors about "Unsupported major.minor version
> 52.0", so I needed to install the Linux x64 JRE 1.8.0, which I managed
> on CentOS 6 with... yum install openjdk-1.8.0 ...going to Oracle’s web
> site, downloading the latest JRE 1.8 build, and then running... yum
> localinstall jre-8u101-linux-x64.rpm So far so good. But I didn’t have
> JAVA_HOME set properly apparently, so I needed to do the
> not-exactly-intuitive… export
> JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/

Others have already covered this top

json request api and facets

2016-09-13 Thread Michael Aleythe, Sternwald
Hi everybody,

I'm currently working on using the JSON request API for Solr and hit a problem 
using facets. I'm using Solr 5.5.2 and SolrJ 5.5.2.

When querying solr by url-parameters like so:
http://.../select?wt=json&facet.range=MEDIA_TS&f.MEDIA_TS.facet.range.end=2028-02-01T0:00:00.000Z&f.MEDIA_TS.facet.range.gap=%2B1YEAR&f.MEDIA_TS.facet.range.start=1993-01-01T0:00:00.000Z&facet=true

the returned json contains an element called "facet_counts" which is the top 
element for all faceting information.


    "facet_counts":
    {
        "facet_queries":
        {
        },
        "facet_fields":
        {
        },
        "facet_dates":
        {
        },
        "facet_ranges":
        {
            "MEDIA_TS":
            {
                "counts":
                [
                    "1993-01-01T00:00:00Z",
                    0,
                    "1994-01-01T00:00:00Z",
                    1634,
                    "1995-01-01T00:00:00Z",
                    6656,
                    "1996-01-01T00:00:00Z",
                    30016,
                    "1997-01-01T00:00:00Z",
                    76819,
                    "1998-01-01T00:00:00Z",
                    152099,

The same query using the json request api like so:

{"facet":{"MEDIA_TS":{"field":"MEDIA_TS","gap":"+1YEAR","start":"1993-01-01T00:00:00Z","end":"2028-01-01T00:00:00Z","type":"range"}}}

Returns an element "facets" which is the top element for all faceting 
information. The whole structure of the response is different:

    "facets":
    {
        "count":5815481,
        "MEDIA_TS":
        {
            "buckets":
            [
                {
                    "val":"1993-01-01T00:00:00Z",
                    "count":0
                },
                {
                    "val":"1994-01-01T00:00:00Z",
                    "count":1634
                },
                {
                    "val":"1995-01-01T00:00:00Z",
                    "count":6656
                },

This inconsistency breaks the response parser of SolrJ. Am I doing something 
wrong?
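
To show the difference between the two shapes more compactly, here is a small Python sketch that converts the "buckets" form into the flat "counts" form, using the first three buckets from the responses above. The function name is made up, and this is only a client-side illustration, not something SolrJ provides.

```python
def buckets_to_counts(facet):
    """Flatten [{"val": v, "count": c}, ...] into [v, c, v, c, ...]."""
    counts = []
    for bucket in facet.get("buckets", []):
        counts.append(bucket["val"])
        counts.append(bucket["count"])
    return counts


# "facets" element as returned by the JSON request API (first three buckets only)
facets = {
    "count": 5815481,
    "MEDIA_TS": {
        "buckets": [
            {"val": "1993-01-01T00:00:00Z", "count": 0},
            {"val": "1994-01-01T00:00:00Z", "count": 1634},
            {"val": "1995-01-01T00:00:00Z", "count": 6656},
        ]
    },
}

# Same information arranged like the "facet_counts" element
facet_counts = {
    "facet_ranges": {
        "MEDIA_TS": {"counts": buckets_to_counts(facets["MEDIA_TS"])}
    }
}

print(facet_counts)
```

The information is the same in both responses; only the nesting and the key names differ.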


Best Regards
Michael


Michael Aleythe
Java Entwickler | STERNWALD SYSTEMS GMBH




Re: Re: Tagging and excluding Filters with BlockJoin Queries and BlockJoin Faceting

2016-09-13 Thread Mikhail Khludnev
I made one more attempt. It seems it works.
https://issues.apache.org/jira/browse/SOLR-8998?focusedCommentId=15487095&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15487095

On Wed, Aug 24, 2016 at 11:52 AM, Tobias Lorenz  wrote:

> I tried that too, with no effect.
>
> The excluded facet just disappears completely (even the value that is
> filtered on in the fq) when using the exclusion that has been tagged, like
> it did before.
> When using a random exclusion (e.g. foo) that facet is visible again in
> the result set, but that's obviously not helpful, I just tried to see what
> it would do.
>
> So this is my current research result:
>
> When excluding a facet which has been tagged in a filter query, this facet
> corresponding to the fq's tag disappears in the result set in solr 6.1 when
> using BlockJoin Queries and json facets (which it shouldn't).
>
> Let me know if you want me to do more research or have one more idea.
>
>
> -Ursprüngliche Nachricht-
> Von: Mikhail Khludnev [mailto:m...@apache.org]
> Gesendet: Mittwoch, 24. August 2016 09:06
> An: solr-user 
> Betreff: Re: Re: Tagging and excluding Filters with BlockJoin Queries and
> BlockJoin Faceting
>
> Sure. There are might mismatch with expectation. However, the first guess
> is to put {!tag into beginning. eg, check with fq={!tag=myTag}{!parent
> which='isparent:true'}color:blue
>
> On Tue, Aug 23, 2016 at 4:05 PM, Tobias Lorenz 
> wrote:
>
> > Hi Mikhail,
> >
> > Thanks for replying so quickly with a suggestion.
> >
> > I'm a colleague of Stefan and working with him on our project.
> >
> > We tried composing our solr query with exclusion instructions, and the
> > result was that the facet excluded by tag did not show up anymore in
> > the result, instead of showing all values.
> >
> > Your example from the last comment, completed by our exlusion
> instruction:
> >
> > json.facet={
> >   filter_by_children: {
> > type: query,
> > q: "isparent:false",
> > domain: {
> >   blockChildren: "isparent:true"
> > },
> > facet: {
> >   colors: {
> > type: terms,
> > field: color,
> > domain:{
> >   excludeTags:myTag
> > },
> > facet: {
> >   productsCount: "unique(_root_)"
> > }
> >   }
> > }
> >   }
> > }
> >
> >
> > and the corresponding filter query:
> >
> > fq={!parent which='isparent:true'}{!tag=myTag}color:blue
> >
> >
> > Either this feature is not working yet, or we are making a mistake
> > using it.
> > Of course we know it's still in development right now.
> >
> > Might you please have a look if we are doing something obviously wrong?
> >
> > Thanks,
> > Tobias
> >
> >
> >
> > >The last comment at https://issues.apache.org/jira/browse/SOLR-8998
> > >shows the current verbose json.facet syntax which provides aggregated
> > >facet counts already. It's a little bit slower that child.facet.field.
> > >Nevertheless, you can take this sample and add exclusion instructions
> > into.
> > >It should work. Let me know how does it, please.
> > >
> > >On Wed, Aug 17, 2016 at 5:35 PM, Stefan Moises 
> > wrote:
> > >
> > >> Hi Mikhail,
> > >>
> > >> thanks for the info ... what is the advantage of using the JSON
> > >> FACET
> > API
> > >> compared to the standard BlockJoinQuery features?
> > >>
> > >> Is there already anybody working on the tagging/exclusion feature
> > >> or is there any timeframe for it? There wasn't any discussion yet
> > >> in SOLR-8998 about exclusions, was there?
> > >>
> > >> Thank you very much,
> > >>
> > >> best,
> > >>
> > >> Stefan
> > >>
> > >>
> > >> Am 17.08.16 um 15:26 schrieb Mikhail Khludnev:
> > >>
> > >> Stefan,
> > >>> child.facet.field never intend to support exclusions. My
> > >>> preference is
> > to
> > >>> implement it under json.facet that's discussed under
> > >>> https://issues.apache.org/jira/browse/SOLR-8998.
> > >>>
> > >>> On Wed, Aug 17, 2016 at 3:52 PM, Stefan Moises
> > >>> 
> > >>> wrote:
> > >>>
> > >>> Hey girls and guys,
> > 
> >  for a long time we have been using our own BlockJoin
> >  Implementation, because for our Shop Systems a lot of
> >  requirements that we had were
> > not
> >  implemented in solr.
> > 
> >  As we now had a deeper look into how far the standard has come,
> >  we saw that BlockJoin and faceting on children is now part of the
> >  standard, which is pretty cool.
> >  When I tried to refactor our external code to use that now, I
> >  stumbled upon one non-working feature with BlockJoins that still
> >  keeps us from using
> >  it:
> > 
> >  It seems that tagging and excluding Filters with BlockJoin
> >  Faceting simply does not work yet.
> > 
> >  Simple query:
> > 
> >  &qt=products
> >  &q={!parent which='isparent:true'}shirt AND isparent:false
> >  &facet=true &fq={!parent
> >  which='isparent:true'}{!tag=myTag}color:grey
> >  &child.facet.fi

RE: Unable to connect to correct port in solr 6.2.0

2016-09-13 Thread Preeti Bhat
Thanks Shekhar, the reinstall was successful. I had run on the default port 
prior to running on 2016.


Thanks and Regards,
Preeti Bhat

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Tuesday, September 13, 2016 1:01 AM
To: solr-user@lucene.apache.org
Subject: Re: Unable to connect to correct port in solr 6.2.0

I just tried this out on ubuntu (sorry I don't have access to a red hat
system) and it works fine.

One thing that you have to take care of is that if you install the service on 
the default 8983 port then, trying to upgrade with the same tar to a different 
port does not work. So please ensure that you hadn't already installed the 
service before already.

On Tue, Sep 13, 2016 at 12:53 AM, Shalin Shekhar Mangar < 
shalinman...@gmail.com> wrote:

> Which version of red hat? Is lsof installed on this system?
>
> On Mon, Sep 12, 2016 at 4:30 PM, Preeti Bhat
> 
> wrote:
>
>> HI All,
>>
>> I am trying to setup the solr in Redhat Linux, using the
>> install_solr_service.sh script of solr.6.2.0  tgz. The script runs
>> and starts the solr on port 8983 even when the port is specifically
>> specified as 2016.
>>
>> /root/install_solr_service.sh solr-6.2.0.tgz -i /opt -d /var/solr -u
>> root -s solr -p 2016
>>
>> Is this correct way to setup solr in linux? Also, I have observed
>> that if I go to the /bin/solr and start with the port number its
>> working as expected but not as service.
>>
>> I would like to setup the SOLR in SOLRCloud mode with external zookeepers.
>>
>> Could someone please advise on this?
>>
>>
>>
>>
>>
>>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



--
Regards,
Shalin Shekhar Mangar.





Re: Miserable Experience Using Solr. Again.

2016-09-13 Thread Alexandre Rafalovitch
On 13 September 2016 at 16:46, Alessandro Benedetti
 wrote:
>>  It didn’t say which field type. Buried in the logs I found a reference in
>> the Java stack trace—which *disappears* (and distorts the viewing window
>> horribly) after a few seconds when you try to view it in the web log UI—to
>> the string "units="degrees"".
>>
>
> This is a bug, and a really annoying one. I'm not sure whether anyone has
> already raised it; if not, I suggest you do that :)

I don't remember seeing it opened. So this could be a great practice
for somebody who is good at finding bugs and issues. :-)

I love a good rant. I used to produce those myself. I love even more a
good rant that includes specific, granular improvements others can get
behind. The bug report suggested above would be a great example
of such a granular thing.

I would also point out that most of the contributors to Lucene/Solr
open source are able to contribute because somebody _pays_ them to
develop something on top of/with those projects and they hit
limitations they cannot solve in other easier ways. Those _usually_
are the cutting edge features such as CDCR, new performance
improvements, etc. We could always do with _more_ people who will
focus on the more user-oriented features or on making those new
cutting edge features more easily accessible.

Regards,
   Alex.
P.S. And if anybody wants to rant and will be at Lucene/Solr
Revolution, I will be more than happy to sit and listen to you during
any of the food breaks. And I'll help figure out what those granular
improvement suggestions could be. Feel free to reach out directly if
you want to have a rant scheduled too, instead of catching me
organically :-)


Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


Re: Miserable Experience Using Solr. Again.

2016-09-13 Thread Yago Riveiro
I'm stuck on 5.3.1 because if I upgrade to 5.5 or 6.x my cluster dies.

Doing a rolling upgrade, when I upgrade the second node to 5.5 both die in the
peer-sync phase. I don't know what changed in 5.5, but it demands a huge
quantity of memory to check whether the replica is in sync.

This kind of stuff and the full re-index (12T) between major releases are
indeed a pain.

Cryptic errors and a deficient system for getting metrics on what is going on
inside the cluster are another issue. I'm unable to get the throughput of a
collection as a whole, the number of HTTP connections on each node, the
utilization of the Jetty thread pool and stuff like that.

Solr is a great tool, but it's hard, too hard to get into.

--
/Yago Riveiro


  
On Sep 13 2016, at 10:46 am, Alessandro Benedetti wrote:

> First of all I second Bram. I am sorry you had a bad experience with Solr,
> but I think that:
> - without a minimum of study of the documentation
> - without trying to follow the best practices
> you are going to have a "miserable" experience with any software,
> don't you think?
>
> In addition to Bram :
>
> On Mon, Sep 12, 2016 at 10:48 PM, Aaron Greenspan <
> aaron.greens...@plainsite.org> wrote:
>>
>> It didn’t say which field type. Buried in the logs I found a reference in
>> the Java stack trace—which *disappears* (and distorts the viewing window
>> horribly) after a few seconds when you try to view it in the web log UI—to
>> the string "units="degrees"".
>>
>
> This is a bug, and a really annoying one. I'm not sure whether anyone has
> already raised it; if not, I suggest you do that :)
> But you can use the logs themselves without any problem.
>
>>
>> Apparently there is some aspect of the Thai text field type that Solr
>> 6.2.0 doesn’t like. So I disabled it. I don’t use Thai text.
>>
>
> If you were not using Thai text, why did you have the Thai text field type
> defined?
> Keep It Simple, Stupid is the way :)
> I find tons of Solr instances in production with monster solrconfig.xml and
> schema.xml files, basically the old default ones, kept for no particular
> reason. Don't do that!
>
>>
>> Now Solr was complaining about "Error loading class
>> 'solr.admin.AdminHandlers'". So I found the reference to
>> solr.admin.AdminHandlers in solrconfig.xml for each of my cores and
>> commented it out. Only then did Solr work again.
>>
>
> It seems you didn't take care to read the release notes for the upgrade,
> did you?
>
> Cheers
> --
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England



Re: Miserable Experience Using Solr. Again.

2016-09-13 Thread Alessandro Benedetti
First of all I second Bram. I am sorry you had a bad experience with Solr,
but I think that:
- without a minimum of study of the documentation
- without trying to follow the best practices
you are going to have a "miserable" experience with any software,
don't you think?

In addition to Bram :

On Mon, Sep 12, 2016 at 10:48 PM, Aaron Greenspan <
aaron.greens...@plainsite.org> wrote:
>
>  It didn’t say which field type. Buried in the logs I found a reference in
> the Java stack trace—which *disappears* (and distorts the viewing window
> horribly) after a few seconds when you try to view it in the web log UI—to
> the string "units="degrees"".
>

This is a bug, and a really annoying one. I'm not sure whether anyone has
already raised it; if not, I suggest you do that :)
But you can use the logs themselves without any problem.

>
> Apparently there is some aspect of the Thai text field type that Solr
> 6.2.0 doesn’t like. So I disabled it. I don’t use Thai text.
>

If you were not using Thai text, why did you have the Thai text field type
defined?
Keep It Simple, Stupid is the way :)
I find tons of Solr instances in production with monster solrconfig.xml and
schema.xml files, basically the old default ones, kept for no particular
reason. Don't do that!

>
> Now Solr was complaining about "Error loading class
> 'solr.admin.AdminHandlers'". So I found the reference to
> solr.admin.AdminHandlers in solrconfig.xml for each of my cores and
> commented it out. Only then did Solr work again.
>

It seems you didn't take care to read the release notes for the upgrade,
did you?


Cheers
-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Miserable Experience Using Solr. Again.

2016-09-13 Thread Bram Van Dam
I'm sorry you're having a "miserable" experience "again". That's
certainly not my experience with Solr. That being said:

> First I was getting errors about "Unsupported major.minor version 52.0", so I 
> needed to install the Linux x64 JRE 1.8.0, which I managed on CentOS 6 with...
> yum install openjdk-1.8.0

This is not a Solr problem. Solr requires Java 8. Java 7 has been
officially end-of-lifed since April 2015. This means no more patches, no
more performance improvements and no more security updates (unless
you're paying Oracle). This is clearly stated in the (very decent) Solr
documentation. To use your own words: Java 7 is an antiquated nightmare
and the rest of the world has moved on to Java 8.
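
For anyone puzzled by the "52.0" in that error: it is the class file format version, and major version 52 corresponds to Java 8 (51 is Java 7). A minimal sketch of reading that number from the first bytes of a .class file; the function name is made up for illustration and the header bytes are built by hand rather than read from a real Solr jar.

```python
import struct

JAVA_MAGIC = 0xCAFEBABE  # first four bytes of every Java class file


def class_file_java_release(header: bytes) -> int:
    """Map a .class file header to the Java release it targets.

    The first 8 bytes are: magic (4), minor version (2), major version (2),
    all big-endian. Major 52 corresponds to Java 8, 51 to Java 7.
    """
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of class file header")
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != JAVA_MAGIC:
        raise ValueError("not a Java class file")
    return major - 44


# Hand-built header for a class compiled for major.minor version 52.0.
header = struct.pack(">IHH", JAVA_MAGIC, 0, 52)
print(class_file_java_release(header))  # 8
```

A JVM older than the release printed here refuses to load the class, which is what the "Unsupported major.minor version" message reports.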

> So far so good. But I didn’t have JAVA_HOME set properly apparently, so I 
> needed to do the not-exactly-intuitive…
> export 
> JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/

You don't need to set JAVA_HOME to run Solr. But if you do have a
JAVA_HOME environment variable, and it points to a wrong Java version,
you're going to have a bad time.

> Then after stopping the old process (with kill -9, since there seems to be no 
> graceful way to shut down Solr)

There's a stop command, which is documented. It's in a non-surprising
location and has a non-surprising name. And even if there weren't, "kill"
would have sufficed.

> There was some kind of problem with StopFilterFactory and the text_general 
> field type. Thanks to Stack Overflow I was able to determine that the 
> apparent problem was that there was a parameter, previously fine, which was 
> no longer fine. So I removed all instances of 
> enablePositionIncrements="true". That helped, but then I ran into a broader 
> error: "Plugin Initializing failure for [schema.xml] fieldType". It didn’t 
> say which field type. Buried in the logs I found a reference in the Java 
> stack trace—which *disappears* (and distorts the viewing window horribly) 
> after a few seconds when you try to view it in the web log UI—to the string 
> "units="degrees"". Sure enough, this string appeared in my schema.xml for a 
> class called "solr.SpatialRecursivePrefixTreeFieldType" that I’m pretty sure 
> I never use. I removed that parameter, and moved on to the next set of errors.

Releases come with release notes and -- when required -- upgrade
instructions and admonitions. It's certainly possible that there's been
an oversight here or there and you're more than welcome to point those out.
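
Until the error messages name the offending field type, a quick scan of schema.xml before upgrading can help. Below is a rough sketch that lists which field types still carry the two attributes mentioned in this thread. The attribute list comes from this thread only and is certainly not complete, and the sample schema (including the field type name location_rpt) is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Attribute names taken from this thread only; not a complete list.
SUSPECT_ATTRIBUTES = ("enablePositionIncrements", "units")


def find_suspect_attributes(schema_xml: str):
    """Return (fieldType name, element tag, attribute) for each suspect attribute."""
    root = ET.fromstring(schema_xml)
    hits = []
    for field_type in root.iter("fieldType"):
        name = field_type.get("name", "?")
        # iter() yields the fieldType element itself and all its descendants.
        for element in field_type.iter():
            for attr in SUSPECT_ATTRIBUTES:
                if attr in element.attrib:
                    hits.append((name, element.tag, attr))
    return hits


# Hypothetical schema fragment for illustration.
sample_schema = """
<schema name="example" version="1.5">
  <fieldType name="text_general" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt"
              enablePositionIncrements="true"/>
    </analyzer>
  </fieldType>
  <fieldType name="location_rpt"
             class="solr.SpatialRecursivePrefixTreeFieldType" units="degrees"/>
</schema>
"""

for name, tag, attr in find_suspect_attributes(sample_schema):
    print(f"fieldType {name}: <{tag}> still sets {attr}")
```

The release notes remain the authoritative source for what changed; this only points at where in the file to look.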

> The user interface is still as buggy as an early alpha of most
> products, the errors are difficult to understand when they don’t
> actually specify what’s wrong (and they almost never do), and there
> should have been an automatic process to highlight and fix problems in
> old (pre-6) configuration files.

What user interface? Are you talking about the Admin UI? That's a
convenience feature which helps you manage Solr. It makes life a lot
easier, even if it's not perfect. The logs are generally quite good at
explaining what's wrong.

> Never mind the fact that the XML-based configuration process is an
> antiquated nightmare when the rest of the world has long since moved
> onto databases.

An antiquated nightmare? The rest of the world? How would this work?
What benefit would it possibly have?

You're more than welcome to report any bugs you find
(https://issues.apache.org/jira/browse/SOLR). But I feel like general
ranting on the mailing list isn't very productive. Well, I suppose
venting feels good, so there's that.

Things that would be more productive:

1. Reading the documentation.
2. Taking a basic system administration class or two.
3. Pointing out -- or contributing to -- parts of the documentation that
aren't up to par. Either on the mailing list, or on Jira. Preferably in
a constructive way instead of a "miserable experience"-way.

I feel like you're missing the part where most open source development,
documentation, release management etc is done by volunteers. Volunteers
who tend to scratch their own itch first, and are then kind enough to
donate the fruit of their labour to the rest of the world. You can
certainly make requests, and you can certainly hope for things to improve.

If you're having a "miserable" time "again", then you can always hire a
Solr consultant to do the work for you. You can't demand free stuff to
scratch your every itch. You can either invest your time and figure out
how to do things yourself, or your money and have things done for you.
But there's no such thing as a free lunch.

 - Bram