Re: Accessing Solr collections at different ports - Need help

2019-05-02 Thread Salmaan Rashid Syed
Thanks Jorn for your reply.

I say that the nodes are limited to 4 because when I launch Solr in cloud
mode, the first prompt I get asks me to choose the number of nodes [1-4]. When
I tried to enter 7, it said that this is more than 4 and asked me to choose a
smaller number.


*Thanks and Regards,*
Salmaan Rashid Syed
+91 8978353445 | www.panna.ai |
5550 Granite Pkwy, Suite #225, Plano TX-75024.
Cyber Gateways, Hi-tech City, Hyderabad, Telangana, India.



On Fri, May 3, 2019 at 12:05 PM Jörn Franke  wrote:

> BTW why do you think that SolrCloud is limited to 4 nodes? More are for
> sure possible.
>
> > On 03.05.2019 at 07:54, Salmaan Rashid Syed <
> salmaan.ras...@mroads.com> wrote:
> >
> > Hi Solr Users,
> >
> > I am using Solr 7.6 in cloud mode with external zookeeper installed at
> > ports 2181, 2182, 2183. Currently we have only one server allocated for
> > Solr. We are planning to move to multiple servers for better sharing,
> > replication etc in near future.
> >
> > Now the issue is that, our organisation has data indexed for different
> > clients as separate collections. We want to uniquely access, update and
> > index each collection separately so that each individual client has
> access
> > to their respective collections at their respective ports. Eg:—
> Collection1
> > at port 8983, Collection2 at port 8984, Collection3 at port 8985 etc.
> >
> > I have two options I guess, one is to run Solr in cloud mode with 4 nodes
> > (max as limited by Solr) at 4 different ports. I don’t know how to go
> > beyond 4 nodes/ports in this case.
> >
> > The other option is to run Solr as service and create multiple copies of
> > Solr folder within the Server folder and access each Solr at different
> port
> > with its own collection as shown by
> > https://www.youtube.com/watch?v=wmQFwK2sujE
> >
> > I am really confused as to which is the better path to choose. Please
> help
> > me out.
> >
> > Thanks.
> >
> > Regards,
> > Salmaan
> >
> >
> > *Thanks and Regards,*
> > Salmaan Rashid Syed
> > +91 8978353445 | www.panna.ai |
> > 5550 Granite Pkwy, Suite #225, Plano TX-75024.
> > Cyber Gateways, Hi-tech City, Hyderabad, Telangana, India.
>


Re: Accessing Solr collections at different ports - Need help

2019-05-02 Thread Salmaan Rashid Syed
Thanks Walter,

Since I am new to Solr, and judging by your suggestion, it looks like I am
trying to do something very complicated and outside the out-of-the-box
capabilities of Solr. I really don't want to do that.

I am not from Computer Science background and my specialisation is in
Analytics and AI.

Let me put my case scenario briefly.

We have developed a customised Solr-search engine that can search for data
(prepared, cleaned and preprocessed by us) in each individual Solr
collection.

Every client of ours is from a different vertical (like health,
engineering, public services, finance, casual works etc). They search for
data in their respective Solr collection. They also add, update and
re-index their respective data periodically.

As you suggest, if I serve all the collections from a single port, will the
latency not increase, will the burden on the single server not increase, and
will the computation not slow down, as all the clients will be talking to the
same port simultaneously?

Or do you think that running Solr as a service is the better option, where I
can create multiple Solr instances at different ports, each holding the
collections of an individual client?

To be honest, I don't really know what running Solr as a service is meant to
accomplish.

Apologies for the lengthy question, and thanks in advance.


*Thanks and Regards,*
Salmaan Rashid Syed
+91 8978353445 | www.panna.ai |
5550 Granite Pkwy, Suite #225, Plano TX-75024.
Cyber Gateways, Hi-tech City, Hyderabad, Telangana, India.



On Fri, May 3, 2019 at 11:59 AM Walter Underwood 
wrote:

> The best option is to run all the collections at the same port.
> Intra-cluster communication cannot be split over multiple ports, so this
> would require big internal changes to Solr. And what about communication
> that does not belong to a collection, like electing an overseer node?
>
> Why do you want the very non-standard configuration?
>
> If you must have it, run a webserver like nginx on each node, configure it
> to do this crazy multiple port thing for external traffic and to forward
> all traffic to Solr’s single port.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On May 3, 2019, at 7:54 AM, Salmaan Rashid Syed <
> salmaan.ras...@mroads.com> wrote:
> >
> > Hi Solr Users,
> >
> > I am using Solr 7.6 in cloud mode with external zookeeper installed at
> > ports 2181, 2182, 2183. Currently we have only one server allocated for
> > Solr. We are planning to move to multiple servers for better sharing,
> > replication etc in near future.
> >
> > Now the issue is that, our organisation has data indexed for different
> > clients as separate collections. We want to uniquely access, update and
> > index each collection separately so that each individual client has
> access
> > to their respective collections at their respective ports. Eg:—
> Collection1
> > at port 8983, Collection2 at port 8984, Collection3 at port 8985 etc.
> >
> > I have two options I guess, one is to run Solr in cloud mode with 4 nodes
> > (max as limited by Solr) at 4 different ports. I don’t know how to go
> > beyond 4 nodes/ports in this case.
> >
> > The other option is to run Solr as service and create multiple copies of
> > Solr folder within the Server folder and access each Solr at different
> port
> > with its own collection as shown by
> > https://www.youtube.com/watch?v=wmQFwK2sujE
> >
> > I am really confused as to which is the better path to choose. Please
> help
> > me out.
> >
> > Thanks.
> >
> > Regards,
> > Salmaan
> >
> >
> > *Thanks and Regards,*
> > Salmaan Rashid Syed
> > +91 8978353445 | www.panna.ai |
> > 5550 Granite Pkwy, Suite #225, Plano TX-75024.
> > Cyber Gateways, Hi-tech City, Hyderabad, Telangana, India.
>
>


Re: Accessing Solr collections at different ports - Need help

2019-05-02 Thread Jörn Franke
BTW why do you think that SolrCloud is limited to 4 nodes? More are for sure 
possible.
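
For what it is worth, the [1-4] prompt mentioned earlier comes from the
interactive bin/solr -e cloud example; outside of that example you can start as
many nodes as you like against the same ZooKeeper ensemble. A rough sketch,
assuming the three ZooKeeper instances from the original mail, with placeholder
ports and home directories:

ZK="localhost:2181,localhost:2182,localhost:2183"
# each extra node gets its own port and its own Solr home directory
bin/solr start -cloud -p 8984 -s /var/solr/node2 -z "$ZK"
bin/solr start -cloud -p 8985 -s /var/solr/node3 -z "$ZK"
bin/solr start -cloud -p 8986 -s /var/solr/node4 -z "$ZK"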

> On 03.05.2019 at 07:54, Salmaan Rashid Syed wrote:
> 
> Hi Solr Users,
> 
> I am using Solr 7.6 in cloud mode with external zookeeper installed at
> ports 2181, 2182, 2183. Currently we have only one server allocated for
> Solr. We are planning to move to multiple servers for better sharing,
> replication etc in near future.
> 
> Now the issue is that, our organisation has data indexed for different
> clients as separate collections. We want to uniquely access, update and
> index each collection separately so that each individual client has access
> to their respective collections at their respective ports. Eg:— Collection1
> at port 8983, Collection2 at port 8984, Collection3 at port 8985 etc.
> 
> I have two options I guess, one is to run Solr in cloud mode with 4 nodes
> (max as limited by Solr) at 4 different ports. I don’t know how to go
> beyond 4 nodes/ports in this case.
> 
> The other option is to run Solr as service and create multiple copies of
> Solr folder within the Server folder and access each Solr at different port
> with its own collection as shown by
> https://www.youtube.com/watch?v=wmQFwK2sujE
> 
> I am really confused as to which is the better path to choose. Please help
> me out.
> 
> Thanks.
> 
> Regards,
> Salmaan
> 
> 
> *Thanks and Regards,*
> Salmaan Rashid Syed
> +91 8978353445 | www.panna.ai |
> 5550 Granite Pkwy, Suite #225, Plano TX-75024.
> Cyber Gateways, Hi-tech City, Hyderabad, Telangana, India.


Re: Accessing Solr collections at different ports - Need help

2019-05-02 Thread Jörn Franke
You can have dedicated clusters per client, and/or you can protect it via 
Kerberos or Basic Auth, or write your own authorization plugin based on OAuth.

I am not sure why you want to offer this on different ports to different 
clients.
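
As a sketch of the Basic Auth route, assuming the BasicAuthPlugin and
RuleBasedAuthorizationPlugin are already enabled in security.json and an admin
user exists (collection, user and role names below are only illustrative), a
per-client permission could be added through the authorization API:

# attach collection1 to a dedicated role and map a client user to that role
curl -u admin:ADMIN_PASSWORD -H 'Content-Type: application/json' \
     http://localhost:8983/solr/admin/authorization -d '{
       "set-permission": {"name": "client1-collection1",
                          "collection": "collection1",
                          "role": "client1"},
       "set-user-role": {"client1_user": ["client1"]}
     }'

With something like this in place, every client talks to the same port and the
role tied to their credentials decides which collection they can reach.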

> On 03.05.2019 at 07:54, Salmaan Rashid Syed wrote:
> 
> Hi Solr Users,
> 
> I am using Solr 7.6 in cloud mode with external zookeeper installed at
> ports 2181, 2182, 2183. Currently we have only one server allocated for
> Solr. We are planning to move to multiple servers for better sharing,
> replication etc in near future.
> 
> Now the issue is that, our organisation has data indexed for different
> clients as separate collections. We want to uniquely access, update and
> index each collection separately so that each individual client has access
> to their respective collections at their respective ports. Eg:— Collection1
> at port 8983, Collection2 at port 8984, Collection3 at port 8985 etc.
> 
> I have two options I guess, one is to run Solr in cloud mode with 4 nodes
> (max as limited by Solr) at 4 different ports. I don’t know how to go
> beyond 4 nodes/ports in this case.
> 
> The other option is to run Solr as service and create multiple copies of
> Solr folder within the Server folder and access each Solr at different port
> with its own collection as shown by
> https://www.youtube.com/watch?v=wmQFwK2sujE
> 
> I am really confused as to which is the better path to choose. Please help
> me out.
> 
> Thanks.
> 
> Regards,
> Salmaan
> 
> 
> *Thanks and Regards,*
> Salmaan Rashid Syed
> +91 8978353445 | www.panna.ai |
> 5550 Granite Pkwy, Suite #225, Plano TX-75024.
> Cyber Gateways, Hi-tech City, Hyderabad, Telangana, India.


Re: Accessing Solr collections at different ports - Need help

2019-05-02 Thread Walter Underwood
The best option is to run all the collections at the same port. Intra-cluster 
communication cannot be split over multiple ports, so this would require big 
internal changes to Solr. And what about communication that does not belong to 
a collection, like electing an overseer node?

Why do you want the very non-standard configuration?

If you must have it, run a webserver like nginx on each node, configure it to 
do this crazy multiple port thing for external traffic and to forward all 
traffic to Solr’s single port.
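
A minimal sketch of that reverse-proxy setup, assuming nginx is installed on
the node and Solr itself listens on 8983; the external port to collection
mapping below is purely illustrative:

# write a small per-client proxy config (paths and ports are placeholders)
cat > /etc/nginx/conf.d/solr-clients.conf <<'EOF'
server {
    listen 8984;    # external port for client 1
    location /solr/collection1/ { proxy_pass http://127.0.0.1:8983/solr/collection1/; }
}
server {
    listen 8985;    # external port for client 2
    location /solr/collection2/ { proxy_pass http://127.0.0.1:8983/solr/collection2/; }
}
EOF
nginx -s reload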

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 3, 2019, at 7:54 AM, Salmaan Rashid Syed  
> wrote:
> 
> Hi Solr Users,
> 
> I am using Solr 7.6 in cloud mode with external zookeeper installed at
> ports 2181, 2182, 2183. Currently we have only one server allocated for
> Solr. We are planning to move to multiple servers for better sharing,
> replication etc in near future.
> 
> Now the issue is that, our organisation has data indexed for different
> clients as separate collections. We want to uniquely access, update and
> index each collection separately so that each individual client has access
> to their respective collections at their respective ports. Eg:— Collection1
> at port 8983, Collection2 at port 8984, Collection3 at port 8985 etc.
> 
> I have two options I guess, one is to run Solr in cloud mode with 4 nodes
> (max as limited by Solr) at 4 different ports. I don’t know how to go
> beyond 4 nodes/ports in this case.
> 
> The other option is to run Solr as service and create multiple copies of
> Solr folder within the Server folder and access each Solr at different port
> with its own collection as shown by
> https://www.youtube.com/watch?v=wmQFwK2sujE
> 
> I am really confused as to which is the better path to choose. Please help
> me out.
> 
> Thanks.
> 
> Regards,
> Salmaan
> 
> 
> *Thanks and Regards,*
> Salmaan Rashid Syed
> +91 8978353445 | www.panna.ai |
> 5550 Granite Pkwy, Suite #225, Plano TX-75024.
> Cyber Gateways, Hi-tech City, Hyderabad, Telangana, India.



Accessing Solr collections at different ports

2019-05-02 Thread Salmaan Rashid Syed
Hi Solr Users,

I am using Solr 7.6 in cloud mode with external zookeeper installed at ports 
2181, 2182, 2183. Currently we have only one server allocated for Solr. We are 
planning to move to multiple servers for better sharing, replication etc in 
near future.

Now the issue is that our organisation has data indexed for different clients 
as separate collections. We want to uniquely access, update and index each 
collection separately so that each individual client has access to their 
respective collection at their respective port, e.g. Collection1 at port 
8983, Collection2 at port 8984, Collection3 at port 8985, etc.

I have two options, I guess. One is to run Solr in cloud mode with 4 nodes (the 
maximum, as limited by Solr) at 4 different ports. I don’t know how to go beyond 
4 nodes/ports in this case.

The other option is to run Solr as a service, create multiple copies of the Solr 
folder within the server folder, and access each Solr at a different port with 
its own collection, as shown in https://www.youtube.com/watch?v=wmQFwK2sujE

I am really confused as to which is the better path to choose. Please help me 
out.

Thanks.

Regards,
Salmaan

Accessing Solr collections at different ports - Need help

2019-05-02 Thread Salmaan Rashid Syed
Hi Solr Users,

I am using Solr 7.6 in cloud mode with external zookeeper installed at
ports 2181, 2182, 2183. Currently we have only one server allocated for
Solr. We are planning to move to multiple servers for better sharing,
replication etc in near future.

Now the issue is that our organisation has data indexed for different
clients as separate collections. We want to uniquely access, update and
index each collection separately so that each individual client has access
to their respective collection at their respective port, e.g. Collection1
at port 8983, Collection2 at port 8984, Collection3 at port 8985, etc.

I have two options, I guess. One is to run Solr in cloud mode with 4 nodes
(the maximum, as limited by Solr) at 4 different ports. I don’t know how to
go beyond 4 nodes/ports in this case.

The other option is to run Solr as a service, create multiple copies of the
Solr folder within the server folder, and access each Solr at a different
port with its own collection, as shown in
https://www.youtube.com/watch?v=wmQFwK2sujE

I am really confused as to which is the better path to choose. Please help
me out.

Thanks.

Regards,
Salmaan


*Thanks and Regards,*
Salmaan Rashid Syed
+91 8978353445 | www.panna.ai |
5550 Granite Pkwy, Suite #225, Plano TX-75024.
Cyber Gateways, Hi-tech City, Hyderabad, Telangana, India.


Re: Reverse-engineering existing installation

2019-05-02 Thread David Smiley
Consider trying to diff configs from a default at the version it was copied
from, if possible. Even better, the configs should be in source control and
then you can browse history with commentary and sometimes links to issue
trackers and code reviews.

Also a big part that you can’t see by staring at configs is what the
queries look like. You should examine the system interacting with Solr to
observe embedded comments/docs for insights.
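
As a concrete sketch of the diff idea, with placeholder paths for the Solr
install and the inherited configset:

# compare the live configset against the shipped default it was likely copied from
diff -ru /opt/solr-7.6.0/server/solr/configsets/_default/conf \
         /var/solr/data/mycollection/conf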

On Thu, May 2, 2019 at 11:21 PM Doug Reeder  wrote:

> The documentation for SOLR is good.  However it is oriented toward setting
> up a new installation, with the data model known.
>
> I have inherited an existing installation.  Aspects of the data model I
> know, but there's a lot of ways things could have been configured in SOLR,
> and for some cases, I don't know what SOLR was supposed to do.
>
> Can you recommend any documentation on working out the configuration of an
> existing installation?
>
-- 
Sent from Gmail Mobile


Re: Reverse-engineering existing installation

2019-05-02 Thread Alexandre Rafalovitch
My presentation from 2016 may be interesting as I deconstruct a Solr
example, including the tips/commands on how to do so:
https://www.slideshare.net/arafalov/rebuilding-solr-6-examples-layer-by-layer-lucenesolrrevolution-2016

The commands start around the slide 20.

Hope this helps,
Alex.
P.S. If this (and the other ideas) is not enough, make sure to mention
the Solr version when you come back with additional questions. It may
help to know which files to recommend checking for additional hints.

On Thu, 2 May 2019 at 23:21, Doug Reeder  wrote:
>
> The documentation for SOLR is good.  However it is oriented toward setting
> up a new installation, with the data model known.
>
> I have inherited an existing installation.  Aspects of the data model I
> know, but there's a lot of ways things could have been configured in SOLR,
> and for some cases, I don't know what SOLR was supposed to do.
>
> Can you recommend any documentation on working out the configuration of an
> existing installation?


Reverse-engineering existing installation

2019-05-02 Thread Doug Reeder
The documentation for SOLR is good.  However it is oriented toward setting
up a new installation, with the data model known.

I have inherited an existing installation.  Aspects of the data model I
know, but there's a lot of ways things could have been configured in SOLR,
and for some cases, I don't know what SOLR was supposed to do.

Can you recommend any documentation on working out the configuration of an
existing installation?


Re: Why did Solr stats min/max values were returned as float number for field of type="pint"?

2019-05-02 Thread Joel Bernstein
This syntax is bringing back correct data types:

rows=0&version=2&q=stock_s:10&stats=true&NOW=1556849474583&isShard=true&wt=javabin&stats.field={!max=true
}id_i&stats.field={!max=true }response_d

This is the query that the stats Stream Expressions writes under the
covers. The Streaming Expression looks like this:

stats(testapp, q="stock_s:10", max(id_i), max(response_d))
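
The same expression can also be sent straight to the /stream handler with curl,
assuming a collection named testapp on localhost:

curl -s --data-urlencode 'expr=stats(testapp, q="stock_s:10", max(id_i), max(response_d))' \
     http://localhost:8983/solr/testapp/stream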


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, May 2, 2019 at 8:47 PM Wendy2  wrote:

> Hi Solr users,
>
> I have a pint field:
> <field name="rcsb_entry_info.disulfide_bond_count" type="pint" indexed="true" stored="true"/>
>
> But Solr stats min/max values were returned as float numbers ( "min":0.0,
> "max":1356.0) . I thought "pint" type fields should return min/max as int.
> Is there something that user can do to make sure it returns as int type
> (which matches the field definition)?   Thanks!
>
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":17,
> "params":{
>   "q":"*:*",
>   "stats":"true",
>   "fl":"",
>   "rows":"0",
>   "stats.field":"rcsb_entry_info.disulfide_bond_count"}},
>   "response":{"numFound":151364,"start":0,"docs":[]
>   },
>   "stats":{
> "stats_fields":{
>   "rcsb_entry_info.disulfide_bond_count":{
> "min":0.0,
> "max":1356.0,
> "count":151363,
> "missing":1,
> "sum":208560.0,
> "sumOfSquares":5660388.0,
> "mean":1.3778796667613618,
> "stddev":5.958002695748158
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Why did Solr stats min/max values were returned as float number for field of type="pint"?

2019-05-02 Thread Wendy2
Hi Solr users,

I have a pint field:
<field name="rcsb_entry_info.disulfide_bond_count" type="pint" indexed="true" stored="true"/>

But Solr stats min/max values were returned as float numbers ( "min":0.0, 
"max":1356.0) . I thought "pint" type fields should return min/max as int.
Is there something that user can do to make sure it returns as int type
(which matches the field definition)?   Thanks!


{
  "responseHeader":{
"status":0,
"QTime":17,
"params":{
  "q":"*:*",
  "stats":"true",
  "fl":"",
  "rows":"0",
  "stats.field":"rcsb_entry_info.disulfide_bond_count"}},
  "response":{"numFound":151364,"start":0,"docs":[]
  },
  "stats":{
"stats_fields":{
  "rcsb_entry_info.disulfide_bond_count":{
"min":0.0,
"max":1356.0,
"count":151363,
"missing":1,
"sum":208560.0,
"sumOfSquares":5660388.0,
"mean":1.3778796667613618,
"stddev":5.958002695748158



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SolrPlugin update existing documents in newSearcher()

2019-05-02 Thread Jan Høydahl
I think you cannot do that. The callback is sent AFTER a searcher is opened on 
the segment, so the index is already there.
Normally you re-index from source if you need changes in schema or processing.
If that is not possible, you must first check if ALL your fields are stored or 
docValues, if not you cannot hope to re-index from the content in the index in 
a lossless way. If you have everything stored, I'd create a new collection and 
write a script that reads all docs using cursorMark and indexes them into the 
new collection.
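
A very rough shape of such a script, assuming curl and jq are available, the
uniqueKey is id, and 'source'/'target' are placeholder collection names:

CURSOR='*'
while true; do
  RESP=$(curl -s -G 'http://localhost:8983/solr/source/select' \
              --data-urlencode 'q=*:*' --data-urlencode 'rows=500' \
              --data-urlencode 'sort=id asc' --data-urlencode 'fl=*' \
              --data-urlencode "cursorMark=$CURSOR")
  # strip _version_ and push the batch of docs into the new collection
  echo "$RESP" | jq '[.response.docs[] | del(._version_)]' | \
    curl -s -H 'Content-Type: application/json' --data-binary @- \
         'http://localhost:8983/solr/target/update'
  NEXT=$(echo "$RESP" | jq -r '.nextCursorMark')
  [ "$NEXT" = "$CURSOR" ] && break   # the cursor stops moving once everything is read
  CURSOR=$NEXT
done
curl -s 'http://localhost:8983/solr/target/update?commit=true'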

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 2 May 2019 at 19:39, Maria Muslea wrote:
> 
> Yes, I will want to also do that, but initially I need to modify the docs
> that are already in SOLR, and I thought of doing that at startup.
> I am able to get the documents that I would like to modify, but the
> operations for modifying the documents don't seem to be doing anything.
> 
> Do you see anything wrong with the way I am trying to modify the documents?
> 
> Thank you for your help,
> Maria
> 
> On Thu, May 2, 2019 at 3:34 AM Jan Høydahl  wrote:
> 
>> Hi
>> 
>> I don't see your requirement clearly. Sounds like what you really need is
>> an UpdateRequestProcessor where you CAN intercept docs being added and
>> modify them as you wish.
>> https://lucene.apache.org/solr/guide/7_7/update-request-processors.html
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> On 1 May 2019 at 22:31, Maria Muslea wrote:
>>> 
>>> Hi,
>>> 
>>> I have a plugin that extends the AbstractSolrEventListener. I override
>> the
>>> newSearcher() method and the plan is to add some extra functionality,
>>> namely updating existing documents by setting new values for existing
>>> fields as well as adding new fields to the documents.
>>> 
>>> I can see that the plugin is invoked and I can get the list of documents,
>>> but I cannot update existing fields or add new fields. I have tried
>> various
>>> approaches, but I cannot get it to work.
>>> 
>>> If you have any suggestions I would really appreciate it. The code that I
>>> am currently trying is below.
>>> 
>>> Thank you,
>>> Maria
>>> 
>>>for (DocIterator iter = docs.iterator(); iter.hasNext();) {
>>> 
>>>   int doci = iter.nextDoc();
>>> 
>>>   Document document = newSearcher.doc(doci);
>>> 
>>> 
>>> 
>>>   SolrInputDocument solrInputDocument1 = new SolrInputDocument();
>>> 
>>>   AddUpdateCommand addUpdateCommand1 = new AddUpdateCommand(req);
>>> 
>>>   addUpdateCommand1.clear();
>>> 
>>>   solrInputDocument1.setField("id", document.get("id"));
>>> 
>>>   solrInputDocument1.addField("newfield", "newvalue");
>>> 
>>>   solrInputDocument1.setField("existingfield", "value");
>>> 
>>>   addUpdateCommand1.solrDoc = solrInputDocument1;
>>> 
>>>   getCore().getUpdateHandler().addDoc(addUpdateCommand1);
>>> 
>>> 
>>>   SolrQueryResponse re = new SolrQueryResponse();
>>> 
>>>   SolrQueryRequest rq = new LocalSolrQueryRequest(getCore(), new
>>> ModifiableSolrParams());
>>> 
>>>   CommitUpdateCommand commit = new CommitUpdateCommand(rq,false);
>>> 
>>>getCore().getUpdateHandler().commit(commit);
>>> 
>>> 
>>>}
>> 
>> 



Re: SolrPlugin update existing documents in newSearcher()

2019-05-02 Thread Maria Muslea
Yes, I will want to also do that, but initially I need to modify the docs
that are already in SOLR, and I thought of doing that at startup.
I am able to get the documents that I would like to modify, but the
operations for modifying the documents don't seem to be doing anything.

Do you see anything wrong with the way I am trying to modify the documents?

Thank you for your help,
Maria

On Thu, May 2, 2019 at 3:34 AM Jan Høydahl  wrote:

> Hi
>
> I don't see your requirement clearly. Sounds like what you really need is
> an UpdateRequestProcessor where you CAN intercept docs being added and
> modify them as you wish.
> https://lucene.apache.org/solr/guide/7_7/update-request-processors.html
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 1 May 2019 at 22:31, Maria Muslea wrote:
> >
> > Hi,
> >
> > I have a plugin that extends the AbstractSolrEventListener. I override
> the
> > newSearcher() method and the plan is to add some extra functionality,
> > namely updating existing documents by setting new values for existing
> > fields as well as adding new fields to the documents.
> >
> > I can see that the plugin is invoked and I can get the list of documents,
> > but I cannot update existing fields or add new fields. I have tried
> various
> > approaches, but I cannot get it to work.
> >
> > If you have any suggestions I would really appreciate it. The code that I
> > am currently trying is below.
> >
> > Thank you,
> > Maria
> >
> > for (DocIterator iter = docs.iterator(); iter.hasNext();) {
> >
> >int doci = iter.nextDoc();
> >
> >Document document = newSearcher.doc(doci);
> >
> >
> >
> >SolrInputDocument solrInputDocument1 = new SolrInputDocument();
> >
> >AddUpdateCommand addUpdateCommand1 = new AddUpdateCommand(req);
> >
> >addUpdateCommand1.clear();
> >
> >solrInputDocument1.setField("id", document.get("id"));
> >
> >solrInputDocument1.addField("newfield", "newvalue");
> >
> >solrInputDocument1.setField("existingfield", "value");
> >
> >addUpdateCommand1.solrDoc = solrInputDocument1;
> >
> >getCore().getUpdateHandler().addDoc(addUpdateCommand1);
> >
> >
> >SolrQueryResponse re = new SolrQueryResponse();
> >
> >SolrQueryRequest rq = new LocalSolrQueryRequest(getCore(), new
> > ModifiableSolrParams());
> >
> >CommitUpdateCommand commit = new CommitUpdateCommand(rq,false);
> >
> > getCore().getUpdateHandler().commit(commit);
> >
> >
> > }
>
>


Re: problem indexing GPS metadata for video upload

2019-05-02 Thread Tim Allison
Sorry build #182: https://builds.apache.org/job/tika-branch-1x/

On Thu, May 2, 2019 at 12:01 PM Tim Allison  wrote:
>
> I just pushed a fix for TIKA-2861.  If you can either build locally or
> wait a few hours for Jenkins to build #182, let me know if that works
> with straight tika-app.jar.
>
> On Thu, May 2, 2019 at 5:00 AM Where is Where  wrote:
> >
> > Thank you Alex and Tim.
> > I have looked at the solrconfig.xml file (I am trying the techproducts demo
> > config), the only related place I can find is the extract handle
> >
> > <requestHandler name="/update/extract"
> >                 startup="lazy"
> >                 class="solr.extraction.ExtractingRequestHandler" >
> >   <lst name="defaults">
> >     <str name="lowernames">true</str>
> >
> >     <str name="captureAttr">true</str>
> >     <str name="fmap.a">links</str>
> >     <str name="fmap.div">ignored_</str>
> >   </lst>
> > </requestHandler>
> >
> > I am using this command bin/post -c techproducts example/exampledocs/1.mp4
> > -params "literal.id=mp4_1&uprefix=attr_"
> >
> > I have tried commenting out ignored_ and changing the fmap.div mapping to div,
> > but still not working. I don't quite get why image is getting gps etc
> > metadata but video is acting differently while it is using the same
> > solrconfig and the gps metadata are in the same fields. There is no
> > differentiation in solrconfig setting between image and video.
> >
> > Tim yes this is related to the TIKA link. Thank you!
> >
> > Here is the output in solr for mp4.
> >
> > {
> > "attr_meta":["stream_size",
> >   "5721559",
> >   "date",
> >   "2019-03-29T04:36:39Z",
> >   "X-Parsed-By",
> >   "org.apache.tika.parser.DefaultParser",
> >   "X-Parsed-By",
> >   "org.apache.tika.parser.mp4.MP4Parser",
> >   "stream_content_type",
> >   "application/octet-stream",
> >   "meta:creation-date",
> >   "2019-03-29T04:36:39Z",
> >   "Creation-Date",
> >   "2019-03-29T04:36:39Z",
> >   "tiff:ImageLength",
> >   "1080",
> >   "resourceName",
> >   "/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
> >   "dcterms:created",
> >   "2019-03-29T04:36:39Z",
> >   "dcterms:modified",
> >   "2019-03-29T04:36:39Z",
> >   "Last-Modified",
> >   "2019-03-29T04:36:39Z",
> >   "Last-Save-Date",
> >   "2019-03-29T04:36:39Z",
> >   "xmpDM:audioSampleRate",
> >   "1000",
> >   "meta:save-date",
> >   "2019-03-29T04:36:39Z",
> >   "modified",
> >   "2019-03-29T04:36:39Z",
> >   "tiff:ImageWidth",
> >   "1920",
> >   "xmpDM:duration",
> >   "2.64",
> >   "Content-Type",
> >   "video/mp4"],
> > "id":"mp4_4",
> > "attr_stream_size":["5721559"],
> > "attr_date":["2019-03-29T04:36:39Z"],
> > "attr_x_parsed_by":["org.apache.tika.parser.DefaultParser",
> >   "org.apache.tika.parser.mp4.MP4Parser"],
> > "attr_stream_content_type":["application/octet-stream"],
> > "attr_meta_creation_date":["2019-03-29T04:36:39Z"],
> > "attr_creation_date":["2019-03-29T04:36:39Z"],
> > "attr_tiff_imagelength":["1080"],
> > 
> > "resourcename":"/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
> > "attr_dcterms_created":["2019-03-29T04:36:39Z"],
> > "attr_dcterms_modified":["2019-03-29T04:36:39Z"],
> > "last_modified":"2019-03-29T04:36:39Z",
> > "attr_last_save_date":["2019-03-29T04:36:39Z"],
> > "attr_xmpdm_audiosamplerate":["1000"],
> > "attr_meta_save_date":["2019-03-29T04:36:39Z"],
> > "attr_modified":["2019-03-29T04:36:39Z"],
> > "attr_tiff_imagewidth":["1920"],
> > "attr_xmpdm_duration":["2.64"],
> > "content_type":["video/mp4"],
> > "content":[" \n \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
> >  \n  \n  \n  \n  \n  \n  \n  \n  \n \n   "],
> > "_version_":1632383499325407232}]
> >   }}
> >
> > JPEG is getting these:
> > "attr_meta":[
> > "GPS Latitude",
> >   "37° 47' 41.99\"",
> > 
> > "attr_gps_latitude":["37° 47' 41.99\""],
> >
> >
> > On Wed, May 1, 2019 at 2:57 PM Where is Where  wrote:
> >
> > > uploading video to solr via tika
> > > https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html
> > > The index has no video GPS metadata which is extracted and indexed for
> > > images such as jpeg. I have checked both MP4 and MOV files, the files I
> > > checked all have GPS Exif data embedded in the same fields as image. Any
> > > idea? Thanks!
> > >


Re: problem indexing GPS metadata for video upload

2019-05-02 Thread Tim Allison
I just pushed a fix for TIKA-2861.  If you can either build locally or
wait a few hours for Jenkins to build #182, let me know if that works
with straight tika-app.jar.
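
A quick way to check that, assuming a locally built tika-app jar (version and
paths below are placeholders), is to dump only the metadata for one of the
test files:

java -jar tika-app-1.21-SNAPSHOT.jar -m /path/to/1.mp4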

On Thu, May 2, 2019 at 5:00 AM Where is Where  wrote:
>
> Thank you Alex and Tim.
> I have looked at the solrconfig.xml file (I am trying the techproducts demo
> config), the only related place I can find is the extract handle
>
> <requestHandler name="/update/extract"
>                 startup="lazy"
>                 class="solr.extraction.ExtractingRequestHandler" >
>   <lst name="defaults">
>     <str name="lowernames">true</str>
>
>     <str name="captureAttr">true</str>
>     <str name="fmap.a">links</str>
>     <str name="fmap.div">ignored_</str>
>   </lst>
> </requestHandler>
>
> I am using this command bin/post -c techproducts example/exampledocs/1.mp4
> -params "literal.id=mp4_1&uprefix=attr_"
>
> I have tried commenting out ignored_ and changing the fmap.div mapping to div,
> but still not working. I don't quite get why image is getting gps etc
> metadata but video is acting differently while it is using the same
> solrconfig and the gps metadata are in the same fields. There is no
> differentiation in solrconfig setting between image and video.
>
> Tim yes this is related to the TIKA link. Thank you!
>
> Here is the output in solr for mp4.
>
> {
> "attr_meta":["stream_size",
>   "5721559",
>   "date",
>   "2019-03-29T04:36:39Z",
>   "X-Parsed-By",
>   "org.apache.tika.parser.DefaultParser",
>   "X-Parsed-By",
>   "org.apache.tika.parser.mp4.MP4Parser",
>   "stream_content_type",
>   "application/octet-stream",
>   "meta:creation-date",
>   "2019-03-29T04:36:39Z",
>   "Creation-Date",
>   "2019-03-29T04:36:39Z",
>   "tiff:ImageLength",
>   "1080",
>   "resourceName",
>   "/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
>   "dcterms:created",
>   "2019-03-29T04:36:39Z",
>   "dcterms:modified",
>   "2019-03-29T04:36:39Z",
>   "Last-Modified",
>   "2019-03-29T04:36:39Z",
>   "Last-Save-Date",
>   "2019-03-29T04:36:39Z",
>   "xmpDM:audioSampleRate",
>   "1000",
>   "meta:save-date",
>   "2019-03-29T04:36:39Z",
>   "modified",
>   "2019-03-29T04:36:39Z",
>   "tiff:ImageWidth",
>   "1920",
>   "xmpDM:duration",
>   "2.64",
>   "Content-Type",
>   "video/mp4"],
> "id":"mp4_4",
> "attr_stream_size":["5721559"],
> "attr_date":["2019-03-29T04:36:39Z"],
> "attr_x_parsed_by":["org.apache.tika.parser.DefaultParser",
>   "org.apache.tika.parser.mp4.MP4Parser"],
> "attr_stream_content_type":["application/octet-stream"],
> "attr_meta_creation_date":["2019-03-29T04:36:39Z"],
> "attr_creation_date":["2019-03-29T04:36:39Z"],
> "attr_tiff_imagelength":["1080"],
> 
> "resourcename":"/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
> "attr_dcterms_created":["2019-03-29T04:36:39Z"],
> "attr_dcterms_modified":["2019-03-29T04:36:39Z"],
> "last_modified":"2019-03-29T04:36:39Z",
> "attr_last_save_date":["2019-03-29T04:36:39Z"],
> "attr_xmpdm_audiosamplerate":["1000"],
> "attr_meta_save_date":["2019-03-29T04:36:39Z"],
> "attr_modified":["2019-03-29T04:36:39Z"],
> "attr_tiff_imagewidth":["1920"],
> "attr_xmpdm_duration":["2.64"],
> "content_type":["video/mp4"],
> "content":[" \n \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
>  \n  \n  \n  \n  \n  \n  \n  \n  \n \n   "],
> "_version_":1632383499325407232}]
>   }}
>
> JPEG is getting these:
> "attr_meta":[
> "GPS Latitude",
>   "37° 47' 41.99\"",
> 
> "attr_gps_latitude":["37° 47' 41.99\""],
>
>
> On Wed, May 1, 2019 at 2:57 PM Where is Where  wrote:
>
> > uploading video to solr via tika
> > https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html
> > The index has no video GPS metadata which is extracted and indexed for
> > images such as jpeg. I have checked both MP4 and MOV files, the files I
> > checked all have GPS Exif data embedded in the same fields as image. Any
> > idea? Thanks!
> >


Re: Potential authorization bug when making HTTP requests

2019-05-02 Thread Jan Høydahl
Fixed in 6.6.6 and 7.7, please upgrade:
https://lucene.apache.org/solr/news.html
https://issues.apache.org/jira/browse/SOLR-12514

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 2 May 2019 at 14:03, adfel70 wrote:
> 
> Authorization bug (?) when making HTTP requests
> 
> We are experiencing a problem when making HTTP requests to a cluster with
> authorization plugin enabled.
> Permissions are configured in security.json the following:
> 
> {
> ... authentication_settings ...
>  "authorization":{
>  "class":"solr.RuleBasedAuthorizationPlugin",  
>  "permissions":[
>   {
> "name": "read",
>  "role": "*"
>},
>   {
> "name": "update",
>  "role": ["indexer", "admin"]
>},
>{
> "name": "all",
>  "role": "admin"
>}
>  ],
>  "user-role": {
>   "admin_user": "admin",
>   "indexer_app": "indexer"
>  }
> }
> 
> Our goal is to give all users read-only access to the data, read+write
> permissions to indexer_app user and full permissions to admin_user.
> 
> While testing the ACLs with different users we encountered unclear results,
> where in some cases a privileged user got HTTP 403 response (unauthorized
> request):
> - in some calls the authenticated reader could not query the data.
> - in some calls the 'indexer_app' user could neither query nor update the data.
> - 'admin_user' worked as expected.
> In addition, the problems were only relevant to HTTP requests - SolrJ
> requests worked perfectly...
>
> After further investigation we realized what seems to be the problem: HTTP
> requests work correctly only when the collection has a core on the server
> that got the initial request. For example:
> Say we have a cloud made of 2 servers - 'host1' and 'host2' and collection
> 'test' with one shard - core on host1:
> 
> curl -u *reader_user*: "http://host1:8983/solr/test/select?q=*:*"; 
>   
> --> code 200 as expected
> curl -u *reader_user*: "http://host2:8983/solr/test/select?q=*:*"; 
>   
> --> code 403 (error - should return result)
> 
> curl -u *indexer_app*: "http://host1:8983/solr/test/select?q=*:*"; 
>   
> --> code 200 as expected
> curl -u *indexer_app*: "http://host1:8983/solr/test/update?commit=true"; 
> --> code 200 as expected
> curl -u *indexer_app*: "http://host2:8983/solr/test/select?q=*:*"; 
>   
> --> code 403 (error - should return result)
> curl -u *indexer_app*: "http://host2:8983/solr/test/update?commit=true"; 
> --> code 403 (error - should return result)
> 
> We guess 'admin_user' does not encounter any error due to the special
> /'all'/ permission.
> Tested both using basic and Kerberos authentication plugins and got the same
> behaviour.
> Should this be opened as a bug?
> Thanks.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Status of solR / HDFS-v3 compatibility

2019-05-02 Thread Nicolas Paris
Thanks kevin, and also thanks for your blog on hdfs topic

> https://risdenk.github.io/2018/10/23/apache-solr-running-on-apache-hadoop-hdfs.html

On Thu, May 02, 2019 at 09:37:59AM -0400, Kevin Risden wrote:
> For Apache Solr 7.x or older yes - Apache Hadoop 2.x was the dependency.
> Apache Solr 8.0+ has Hadoop 3 compatibility with SOLR-9515. I did some
> testing to make sure that Solr 8.0 worked on Hadoop 2 as well as Hadoop 3,
> but the libraries are Hadoop 3.
> 
> The reference guide for 8.0+ hasn't been released yet, but also don't think
> it was updated.
> 
> Kevin Risden
> 
> 
> On Thu, May 2, 2019 at 9:32 AM Nicolas Paris 
> wrote:
> 
> > Hi
> >
> > solr doc [1] says it's only compatible with hdfs 2.x
> > is that true ?
> >
> >
> > [1]: http://lucene.apache.org/solr/guide/7_7/running-solr-on-hdfs.html
> >
> > --
> > nicolas
> >

-- 
nicolas


Re: Status of solR / HDFS-v3 compatibility

2019-05-02 Thread Kevin Risden
For Apache Solr 7.x or older yes - Apache Hadoop 2.x was the dependency.
Apache Solr 8.0+ has Hadoop 3 compatibility with SOLR-9515. I did some
testing to make sure that Solr 8.0 worked on Hadoop 2 as well as Hadoop 3,
but the libraries are Hadoop 3.

The reference guide for 8.0+ hasn't been released yet, but also don't think
it was updated.

Kevin Risden


On Thu, May 2, 2019 at 9:32 AM Nicolas Paris 
wrote:

> Hi
>
> solr doc [1] says it's only compatible with hdfs 2.x
> is that true ?
>
>
> [1]: http://lucene.apache.org/solr/guide/7_7/running-solr-on-hdfs.html
>
> --
> nicolas
>


Status of solR / HDFS-v3 compatibility

2019-05-02 Thread Nicolas Paris
Hi

solr doc [1] says it's only compatible with hdfs 2.x
is that true ?


[1]: http://lucene.apache.org/solr/guide/7_7/running-solr-on-hdfs.html

-- 
nicolas


Re: Pagination with streaming expressions

2019-05-02 Thread Joel Bernstein
There is an open ticket which deals with this:

https://issues.apache.org/jira/browse/SOLR-12209

I've been very focused though on anything that enhances the Solr Math
Expressions or has been needed for the Fusion SQL engine, which is what I
work on at LucidWorks. SOLR-12209 doesn't fall into that category.
Eventually though I will clear that ticket if someone else doesn't resolve
it first.




Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, May 1, 2019 at 7:56 PM Erick Erickson 
wrote:

> This sounds like an XY problem. You’re asking how to paginate, but not
> explaining the problem you want to solve with paginating.
>
> I don’t immediately see what purpose paginating serves here. What
> significance does a page have for the gatherNodes? What use would the
> _user_ have for these results? Especially for two unrelated queries. IOW
> if for query1 you count something for page 13, and for query2 you also
> count something for page 13 what information is the user getting in those
> two cases? Especially if the total result set for query1 is 1,000 docs but
> for query2 is 10,000,000 docs?
>
> But in general no, streaming is orthogonal to most use-cases for
> pagination and isn’t really supported except if you read through the
> returns and throw away the first N pages, probably pretty inefficient.
>
> Erick
>
> > On May 1, 2019, at 1:28 PM, Pratik Patel  wrote:
> >
> > Hello Everyone,
> >
> > Is there a way to paginate the results of Streaming Expression?
> >
> > Let's say I have a simple gatherNodes function which has count operation
> at
> > the end of it. I can sort by the count fine but now I would like to be
> able
> > to select specific sub set of result based on pagination parameters. Is
> > there any way to do that?
> >
> > Thanks!
> > Pratik
>
>


JSON Facet Count All Information

2019-05-02 Thread Furkan KAMACI
Hi,

I have a multivalued field in which I store some metadata. I want to see the
top 4 metadata values across my documents and also the total metadata count. I
run this query:

q=metadata:[*+TO+*]&rows=0&json.facet={top_tags:{type:terms,field:metadata,limit:4,mincount:1}}

However, how can I calculate the total term count in a multivalued field
besides running a JSON facet on it?
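
If the total count means the number of distinct values in the field, one
hedged option is to ask the terms facet for numBuckets, or add a unique()
aggregation alongside, in the same request (the collection name below is a
placeholder):

curl -s -G http://localhost:8983/solr/mycollection/select \
     --data-urlencode 'q=metadata:[* TO *]' --data-urlencode 'rows=0' \
     --data-urlencode 'json.facet={
       top_tags:{type:terms, field:metadata, limit:4, mincount:1, numBuckets:true},
       distinct_tags:"unique(metadata)"
     }'

numBuckets reports how many buckets the terms facet found in total, independent
of the limit on the buckets actually returned.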

Kind Regards,
Furkan KAMACI


Re: Determing Solr heap requirments and analyzing memory usage

2019-05-02 Thread Erick Erickson
Brian: 

Many thanks for letting us know what you found. I’ll attach this to SOLR-13003 
which is about this exact issue but doesn’t contain this information. This is a 
great help.

> On May 2, 2019, at 6:15 AM, Brian Ecker  wrote:
> 
> Just to update here in order to help others that might run into similar
> issues in the future, the problem is resolved. The issue was caused by the
> queryResultCache. This was very easy to determine by analyzing a heap dump.
> In our setup we had the following config:
> 
> <queryResultCache class="solr.FastLRUCache" maxRamMB="3072" autowarmCount="0"/>
> 
> In reality this maxRamMB="3072" was not as expected, and this cache was
> using *way* more memory (about 6-8 times the amount). See the following
> screenshot from Eclipse MAT (http://oi63.tinypic.com/epn341.jpg). Notice in
> the left window that ramBytes, the internal calculation of how much memory
> Solr currently thinks this cache is using, is 1894333464B (1894MB). Now
> notice that the highlighted line, the ConcurrentLRUCache used internally by
> the FastLRUCache representing the queryResultCache, is actually using
> 12212779160B (12212MB). On further investigation, I realized that this
> cache is a map from a query with all its associated objects as the key, to
> a very simple object containing an array of document (integer) ids as the
> value.
> 
> Looking into the lucene-solr source, I found the following line for the
> calculation of ramBytesUsed
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/util/ConcurrentLRUCache.java#L605.
> Surprisingly, the query objects used as keys in the queryResultCache do not
> implement Accountable as far as I can tell, and this lines up very well
> with our observation of memory usage because in the heap dump we can also
> see that the keys in the cache are using substantially more memory than the
> values and completely account for the additional memory usage. It was quite
> surprising to me that the keys were given a default value of 192B as
> specified in LRUCache.DEFAULT_RAM_BYTES_USED because I can't actually
> imagine a case where the keys in the queryResultCache would be so small. I
> imagine that in almost all cases the keys would actually be larger than the
> values for the queryResultCache, but that's probably not true for all
> usages of a FastLRUCache.
> 
> We solved our memory usage issue by drastically reducing the maxRamMB value
> and calculating the actual max usage as maxRamMB * 8. It would be quite
> useful to have this detail at least documented somewhere.
> 
> -Brian
> 
> On Tue, Apr 23, 2019 at 9:49 PM Shawn Heisey  wrote:
> 
>> On 4/23/2019 11:48 AM, Brian Ecker wrote:
>>> I see. The other files I meant to attach were the GC log (
>>> https://pastebin.com/raw/qeuQwsyd), the heap histogram (
>>> https://pastebin.com/raw/aapKTKTU), and the screenshot from top (
>>> http://oi64.tinypic.com/21r0bk.jpg).
>> 
>> I have no idea what to do with the histogram.  I doubt it's all that
>> useful anyway, as it wouldn't have any information about what parts of
>> the system are using the most.
>> 
>> The GC log is not complete.  It only covers 2 min 47 sec 674 ms of time.
>>  To get anything useful out of a GC log, it would probably need to
>> cover hours of runtime.
>> 
>> But if you are experiencing OutOfMemoryError, then either you have run
>> into something where a memory leak exists, or there's something about
>> your index or your queries that needs more heap than you have allocated.
>>  Memory leaks are not super common in Solr, but they have happened.
>> 
>> Tuning GC will never help OOME problems.
>> 
>> The screenshot looks like it matches the info below.
>> 
>>> I'll work on getting the heap dump, but would it also be sufficient to
>> use
>>> say a 5GB dump from when it's half full and then extrapolate to the
>>> contents of the heap when it's full? That way the dump would be a bit
>>> easier to work with.
>> 
>> That might be useful.  The only way to know for sure is to take a look
>> at it to see if the part of the code using lots of heap is detectable.
>> 
>>> There are around 2,100,000 documents.
>> 
>>> The data takes around 9GB on disk.
>> 
>> Ordinarily, I would expect that level of data to not need a whole lot of
>> heap.  10GB would be more than I would think necessary, but if your
>> queries are big consumers of memory, I could be wrong.  I ran indexes
>> with 30 million documents taking up 50GB of disk space on an 8GB heap.
>> I probably could have gone lower with no problems.
>> 
>> I have absolutely no idea what kind of requirements the spellcheck
>> feature has.  I've never used that beyond a few test queries.  If the
>> query information you sent is complete, I wouldn't expect the
>> non-spellcheck parts to require a whole lot of heap.  So perhaps
>> spellcheck is the culprit here.  Somebody else will need to comment on
>> that.
>> 
>> Thanks,
>> Shawn
>> 



Potential authorization bug when making HTTP requests

2019-05-02 Thread adfel70
Authorization bug (?) when making HTTP requests

We are experiencing a problem when making HTTP requests to a cluster with
authorization plugin enabled.
Permissions are configured in security.json the following:

{
 ... authentication_settings ...
  "authorization":{
  "class":"solr.RuleBasedAuthorizationPlugin",  
  "permissions":[
{
  "name": "read",
  "role": "*"
},
{
  "name": "update",
  "role": ["indexer", "admin"]
},
{
  "name": "all",
  "role": "admin"
}
  ],
  "user-role": {
"admin_user": "admin",
"indexer_app": "indexer"
  }
}

Our goal is to give all users read-only access to the data, read+write
permissions to indexer_app user and full permissions to admin_user.

While testing the ACLs with different users we encountered unclear results,
where in some cases a privileged user got HTTP 403 response (unauthorized
request):
- in some calls the authenticated reader could not query the data.
- in some calls the 'indexer_app' user could neither query nor update the data.
- 'admin_user' worked as expected.
In addition, the problems were only relevant to HTTP requests - SolrJ
requests worked perfectly...

After further investigation we realized what seems to be the problem: HTTP
requests work correctly only when the collection has a core on the server
that got the initial request. For example:
Say we have a cloud made of 2 servers - 'host1' and 'host2' and collection
'test' with one shard - core on host1:

curl -u *reader_user*: "http://host1:8983/solr/test/select?q=*:*";   

--> code 200 as expected
curl -u *reader_user*: "http://host2:8983/solr/test/select?q=*:*";   

--> code 403 (error - should return result)

curl -u *indexer_app*: "http://host1:8983/solr/test/select?q=*:*";   

--> code 200 as expected
curl -u *indexer_app*: "http://host1:8983/solr/test/update?commit=true"; 
--> code 200 as expected
curl -u *indexer_app*: "http://host2:8983/solr/test/select?q=*:*";   

--> code 403 (error - should return result)
curl -u *indexer_app*: "http://host2:8983/solr/test/update?commit=true"; 
--> code 403 (error - should return result)

We guess 'admin_user' does not encounter any error due to the special
/'all'/ permission.
Tested both using basic and Kerberos authentication plugins and got the same
behaviour.
Should this be opened as a bug?
Thanks.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SolrPlugin update existing documents in newSearcher()

2019-05-02 Thread Jan Høydahl
Hi

I don't see your requirement clearly. Sounds like what you really need is an 
UpdateRequestProcessor where you CAN intercept docs being added and modify them 
as you wish. 
https://lucene.apache.org/solr/guide/7_7/update-request-processors.html

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 1 May 2019 at 22:31, Maria Muslea wrote:
> 
> Hi,
> 
> I have a plugin that extends the AbstractSolrEventListener. I override the
> newSearcher() method and the plan is to add some extra functionality,
> namely updating existing documents by setting new values for existing
> fields as well as adding new fields to the documents.
> 
> I can see that the plugin is invoked and I can get the list of documents,
> but I cannot update existing fields or add new fields. I have tried various
> approaches, but I cannot get it to work.
> 
> If you have any suggestions I would really appreciate it. The code that I
> am currently trying is below.
> 
> Thank you,
> Maria
> 
> for (DocIterator iter = docs.iterator(); iter.hasNext();) {
> 
>int doci = iter.nextDoc();
> 
>Document document = newSearcher.doc(doci);
> 
> 
> 
>SolrInputDocument solrInputDocument1 = new SolrInputDocument();
> 
>AddUpdateCommand addUpdateCommand1 = new AddUpdateCommand(req);
> 
>addUpdateCommand1.clear();
> 
>solrInputDocument1.setField("id", document.get("id"));
> 
>solrInputDocument1.addField("newfield", "newvalue");
> 
>solrInputDocument1.setField("existingfield", "value");
> 
>addUpdateCommand1.solrDoc = solrInputDocument1;
> 
>getCore().getUpdateHandler().addDoc(addUpdateCommand1);
> 
> 
>SolrQueryResponse re = new SolrQueryResponse();
> 
>SolrQueryRequest rq = new LocalSolrQueryRequest(getCore(), new
> ModifiableSolrParams());
> 
>CommitUpdateCommand commit = new CommitUpdateCommand(rq,false);
> 
> getCore().getUpdateHandler().commit(commit);
> 
> 
> }



Re: Update Solr 7.7 Reference Guide graceTime -> graceDuration

2019-05-02 Thread Jan Høydahl
Thanks. Actually you may edit the page and submit a GitHub PullRequest directly 
if you wish at 
https://github.com/apache/lucene-solr/blob/master/solr/solr-ref-guide/src/solrcloud-autoscaling-triggers.adoc
 


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 2 May 2019 at 00:44, bban954 wrote:
> 
> When setting up a Scheduled Trigger for Solr autoscaling I was running into
> errors that graceTime was an undefined property. I found some discussion on
> patch changes that had updated this property name to graceDuration, which
> should probably be reflected in the reference guide at
> https://lucene.apache.org/solr/guide/7_7/solrcloud-autoscaling-triggers.html#scheduled-trigger.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Determing Solr heap requirments and analyzing memory usage

2019-05-02 Thread Brian Ecker
Just to update here in order to help others that might run into similar
issues in the future, the problem is resolved. The issue was caused by the
queryResultCache. This was very easy to determine by analyzing a heap dump.
In our setup we had the following config:

<queryResultCache class="solr.FastLRUCache" maxRamMB="3072" autowarmCount="0"/>

In reality this maxRamMB="3072" was not as expected, and this cache was
using *way* more memory (about 6-8 times the amount). See the following
screenshot from Eclipse MAT (http://oi63.tinypic.com/epn341.jpg). Notice in
the left window that ramBytes, the internal calculation of how much memory
Solr currently thinks this cache is using, is 1894333464B (1894MB). Now
notice that the highlighted line, the ConcurrentLRUCache used internally by
the FastLRUCache representing the queryResultCache, is actually using
12212779160B (12212MB). On further investigation, I realized that this
cache is a map from a query with all its associated objects as the key, to
a very simple object containing an array of document (integer) ids as the
value.

Looking into the lucene-solr source, I found the following line for the
calculation of ramBytesUsed
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/util/ConcurrentLRUCache.java#L605.
Surprisingly, the query objects used as keys in the queryResultCache do not
implement Accountable as far as I can tell, and this lines up very well
with our observation of memory usage because in the heap dump we can also
see that the keys in the cache are using substantially more memory than the
values and completely account for the additional memory usage. It was quite
surprising to me that the keys were given a default value of 192B as
specified in LRUCache.DEFAULT_RAM_BYTES_USED because I can't actually
imagine a case where the keys in the queryResultCache would be so small. I
imagine that in almost all cases the keys would actually be larger than the
values for the queryResultCache, but that's probably not true for all
usages of a FastLRUCache.

We solved our memory usage issue by drastically reducing the maxRamMB value
and calculating the actual max usage as maxRamMB * 8. It would be quite
useful to have this detail at least documented somewhere.
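
As a side note, the cache's self-reported figures (the same underestimating
ramBytes described above) can at least be tracked over time through the metrics
API, assuming Solr 7.x; adjust the host and filter as needed:

curl -s "http://localhost:8983/solr/admin/metrics?group=core&prefix=CACHE.searcher.queryResultCache"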

-Brian

On Tue, Apr 23, 2019 at 9:49 PM Shawn Heisey  wrote:

> On 4/23/2019 11:48 AM, Brian Ecker wrote:
> > I see. The other files I meant to attach were the GC log (
> > https://pastebin.com/raw/qeuQwsyd), the heap histogram (
> > https://pastebin.com/raw/aapKTKTU), and the screenshot from top (
> > http://oi64.tinypic.com/21r0bk.jpg).
>
> I have no idea what to do with the histogram.  I doubt it's all that
> useful anyway, as it wouldn't have any information about what parts of
> the system are using the most.
>
> The GC log is not complete.  It only covers 2 min 47 sec 674 ms of time.
>   To get anything useful out of a GC log, it would probably need to
> cover hours of runtime.
>
> But if you are experiencing OutOfMemoryError, then either you have run
> into something where a memory leak exists, or there's something about
> your index or your queries that needs more heap than you have allocated.
>   Memory leaks are not super common in Solr, but they have happened.
>
> Tuning GC will never help OOME problems.
>
> The screenshot looks like it matches the info below.
>
> > I'll work on getting the heap dump, but would it also be sufficient to
> use
> > say a 5GB dump from when it's half full and then extrapolate to the
> > contents of the heap when it's full? That way the dump would be a bit
> > easier to work with.
>
> That might be useful.  The only way to know for sure is to take a look
> at it to see if the part of the code using lots of heap is detectable.
>
> > There are around 2,100,000 documents.
> 
> > The data takes around 9GB on disk.
>
> Ordinarily, I would expect that level of data to not need a whole lot of
> heap.  10GB would be more than I would think necessary, but if your
> queries are big consumers of memory, I could be wrong.  I ran indexes
> with 30 million documents taking up 50GB of disk space on an 8GB heap.
> I probably could have gone lower with no problems.
>
> I have absolutely no idea what kind of requirements the spellcheck
> feature has.  I've never used that beyond a few test queries.  If the
> query information you sent is complete, I wouldn't expect the
> non-spellcheck parts to require a whole lot of heap.  So perhaps
> spellcheck is the culprit here.  Somebody else will need to comment on
> that.
>
> Thanks,
> Shawn
>


Re: problem indexing GPS metadata for video upload

2019-05-02 Thread Where is Where
Thank you Alex and Tim.
I have looked at the solrconfig.xml file (I am trying the techproducts demo
config); the only related place I can find is the extract handler:

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>

    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

I am using this command bin/post -c techproducts example/exampledocs/1.mp4
-params "literal.id=mp4_1&uprefix=attr_"

I have tried commenting out ignored_ and changing the fmap.div mapping to div,
but it is still not working. I don't quite get why an image gets GPS and other
metadata while a video behaves differently, even though it goes through the
same solrconfig and the GPS metadata sits in the same fields. There is no
differentiation in the solrconfig settings between image and video.

Tim yes this is related to the TIKA link. Thank you!

Here is the output in solr for mp4.

{
"attr_meta":["stream_size",
  "5721559",
  "date",
  "2019-03-29T04:36:39Z",
  "X-Parsed-By",
  "org.apache.tika.parser.DefaultParser",
  "X-Parsed-By",
  "org.apache.tika.parser.mp4.MP4Parser",
  "stream_content_type",
  "application/octet-stream",
  "meta:creation-date",
  "2019-03-29T04:36:39Z",
  "Creation-Date",
  "2019-03-29T04:36:39Z",
  "tiff:ImageLength",
  "1080",
  "resourceName",
  "/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
  "dcterms:created",
  "2019-03-29T04:36:39Z",
  "dcterms:modified",
  "2019-03-29T04:36:39Z",
  "Last-Modified",
  "2019-03-29T04:36:39Z",
  "Last-Save-Date",
  "2019-03-29T04:36:39Z",
  "xmpDM:audioSampleRate",
  "1000",
  "meta:save-date",
  "2019-03-29T04:36:39Z",
  "modified",
  "2019-03-29T04:36:39Z",
  "tiff:ImageWidth",
  "1920",
  "xmpDM:duration",
  "2.64",
  "Content-Type",
  "video/mp4"],
"id":"mp4_4",
"attr_stream_size":["5721559"],
"attr_date":["2019-03-29T04:36:39Z"],
"attr_x_parsed_by":["org.apache.tika.parser.DefaultParser",
  "org.apache.tika.parser.mp4.MP4Parser"],
"attr_stream_content_type":["application/octet-stream"],
"attr_meta_creation_date":["2019-03-29T04:36:39Z"],
"attr_creation_date":["2019-03-29T04:36:39Z"],
"attr_tiff_imagelength":["1080"],

"resourcename":"/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
"attr_dcterms_created":["2019-03-29T04:36:39Z"],
"attr_dcterms_modified":["2019-03-29T04:36:39Z"],
"last_modified":"2019-03-29T04:36:39Z",
"attr_last_save_date":["2019-03-29T04:36:39Z"],
"attr_xmpdm_audiosamplerate":["1000"],
"attr_meta_save_date":["2019-03-29T04:36:39Z"],
"attr_modified":["2019-03-29T04:36:39Z"],
"attr_tiff_imagewidth":["1920"],
"attr_xmpdm_duration":["2.64"],
"content_type":["video/mp4"],
"content":[" \n \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
 \n  \n  \n  \n  \n  \n  \n  \n  \n \n   "],
"_version_":1632383499325407232}]
  }}

JPEG is getting these:
"attr_meta":[
"GPS Latitude",
  "37° 47' 41.99\"",

"attr_gps_latitude":["37° 47' 41.99\""],


On Wed, May 1, 2019 at 2:57 PM Where is Where  wrote:

> uploading video to solr via tika
> https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html
> The index has no video GPS metadata which is extracted and indexed for
> images such as jpeg. I have checked both MP4 and MOV files, the files I
> checked all have GPS Exif data embedded in the same fields as image. Any
> idea? Thanks!
>