Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-19 Thread Aroop Ganguly
Congrats Jan and Best Wishes!

Also many thanks to Anshum for all his efforts in the last term!
Also, a belated shout-out to Cassandra and other past chairs for their much-appreciated 
efforts for the community!

Regards
Aroop

> On Feb 18, 2021, at 10:55 AM, Anshum Gupta  wrote:
> 
> Hi everyone,
> 
> I’d like to inform everyone that the newly formed Apache Solr PMC nominated
> and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> President. This decision was approved by the board in its February 2021
> meeting.
> 
> Congratulations Jan!
> 
> -- 
> Anshum Gupta



Re: UPDATE collection's Rule-based Replica Placement

2021-02-11 Thread Aroop Ganguly
Moshe

An indirect way to do this could be to take a backup of this collection and then 
restore it with the desired placement rules.

Backup: 
Example: curl "https://solr.foo.com/solr/admin/collections?action=BACKUP&name=backup_name&collection=source_collection&repository=hdfsBackupRepository&async=b1"

Ref: https://lucene.apache.org/solr/guide/8_8/collection-management.html#backup



Restore with rules:

Example: curl "https://solr.foo.com/solr/admin/collections?action=RESTORE&name=backup_name&collection=targetcollection&repository=hdfsBackupRepository&async=restore-demo&replicationFactor=3&maxShardsPerNode=3&rule=shard:*,replica:<2,node:*&rule=replica:*,cores:<3~&rule=sysprop.PLATFORM_RACK:*,replica:<3"

Ref: https://lucene.apache.org/solr/guide/8_8/collection-management.html#restore



Hope this helps and gives you a way forward.

Thanks
Aroop


> On Feb 10, 2021, at 2:23 PM, Ilan Ginzburg  wrote:
> 
> Do you look for something that would move existing collection replicas
> to comply with a new set of rules?
> I'm afraid that doesn't exist, but you can use the Collection API to
> move replicas "manually".
> 
> Ilan
> 
> On Tue, Feb 9, 2021 at 1:10 PM mosheB  wrote:
>> 
>> Hi community,
>> Using Solr 8.3, is there any way to change the replica placment of "running"
>> collection say "from this point forward" or should I recreate the collection
>> and migrate all my data from the existing collection to the new one?
>> Tried to use the COLLECTIONPROP action which doesn't do the job, instead it
>> just update collectionprops.json file and not really affect the replica
>> placement enforcement.
>> 
>> Thanks!
>> 
>> 
>> 
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: BasicAuth help

2020-09-04 Thread Aroop Ganguly
Try looking at the simple LDAP authentication suggested here: 
https://github.com/itzmestar/ldap_solr 
You can use this for authentication and couple it with rule-based authorization.



> On Aug 28, 2020, at 12:26 PM, Vanalli, Ali A - DOT  > wrote:
> 
> Hello,
> 
> Solr is running on windows machine and wondering if it possible to setup 
> BasicAuth with the LDAP?
> 
> Also, tried the example of Basic-Authentication that is published 
> here but this did not work too.
> 
> Thanks...Ali
> 
> 



Re: Solr with HDFS configuration example running in production/dev

2020-08-20 Thread Aroop Ganguly
HDFS will still be there, just NOT on the core package, but as a plug-in or 
contrib.

> On Aug 20, 2020, at 11:07 AM, Aroop Ganguly  wrote:
> 
> HDFS will still be there, just on the core package, but as a plug-in or 
> contrib.



Re: Solr with HDFS configuration example running in production/dev

2020-08-20 Thread Aroop Ganguly
HDFS will still be there, just on the core package, but as a plug-in or contrib.


> On Aug 19, 2020, at 7:50 AM, Prashant Jyoti  wrote:
> 
> You're right Andrew. Even I read about that. But there's a use case for
> which we want to configure the said case.
> 
> Are you also aware of what feature we are moving towards instead of HDFS?
> Will you be able to help me with the error that I'm running into?
> 
> Thanks in advance!
> 
> 
> On Wed, 19 Aug, 2020, 5:24 pm Andrew MacKay, 
> wrote:
> 
>> I believe HDFS support is being deprecated in Solr.  Not sure you want to
>> continue configuration if support will disappear.
>> 
>> On Wed, Aug 19, 2020 at 7:52 AM Prashant Jyoti 
>> wrote:
>> 
>>> Hi all,
>>> Hope you are healthy and safe.
>>> 
>>> Need some help with HDFS configuration.
>>> 
>>> Could anybody of you share an example of the configuration with which you
>>> are running Solr with HDFS in any of your production/dev environments?
>>> I am interested in the parts of SolrConfig.xml / Solr.in.cmd/sh which you
>>> may have modified. Obviously with the security parts obfuscated.
>>> 
>>> I am stuck at an error and unable to move ahead. Attaching the exception
>>> log if anyone is interested to look at the error.
>>> 
>>> Thanks!
>>> 
>>> --
>>> Regards,
>>> Prashant.
>>> 
>> 



Re: SOLR indexing takes longer time

2020-08-17 Thread Aroop Ganguly
Adding on to what others have said, indexing speed in general is largely 
affected by the parallelism and isolation you can give to each node.
Is there a reason why you cannot have more than 1 shard?
If you have a 5-node cluster, why not have 5 shards? maxShardsPerNode=1 and 
replicationFactor=1 is fine, and you should see dramatic gains.
Solr's power and speed come from using it as a distributed system. By sharding 
more you will be using the benefit of that distributed capability.
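
As an illustration only (collection and host names are placeholders, not from this 
thread), such a layout can be created with the Collections API:

  curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=5&replicationFactor=1&maxShardsPerNode=1"

With one shard per node, each node indexes its own slice of the documents in parallel.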

HTH

Regards
Aroop

> On Aug 17, 2020, at 11:22 AM, Abhijit Pawar  wrote:
> 
> Hello,
> 
> We are indexing some 200K plus documents in SOLR 5.4.1 with no shards /
> replicas and just single core.
> It takes almost 3.5 hours to index that data.
> I am using a data import handler to import data from the mongo database.
> 
> Is there something we can do to reduce the time taken to index?
> Will upgrade to newer version help?
> 
> Appreciate your help!
> 
> Regards,
> Abhijit



Re: Multiple Collections in a Alias.

2020-08-12 Thread Aroop Ganguly
There may be other ways, but the easiest is to write a script that gets the cluster 
status; for each replica of each collection you will have these details:

"collections":{
  “collection1":{
"pullReplicas":"0",
"replicationFactor":"1",
"shards":{
  "shard1":{
"range":"8000-8ccb",
"state":"active",
"replicas":{"core_node33":{
"core”:"collection1_shard1_replica_n30",
"base_url":"http://host:port/solr";,
"node_name”:”host:port",
"state":"active",
"type":"NRT",
"force_set_state":"false",
"leader":"true"}}},

For each replica of each shard, make a localized call for numFound:
base_url/core/select?q=*:*&shard=shardX&distrib=false&rows=0
If you have replicas that disagree with each other on the number of records 
per shard, then you have an issue with replicas not being in sync for that collection.
This is what I meant when I said "replicas out of sync".
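
As a rough sketch (host, port and core names are placeholders to fill in from your own 
cluster status), the per-replica comparison can be scripted like this:

  #!/bin/bash
  # Compare numFound across the replicas of one shard.
  # distrib=false keeps each query local to the core it hits.
  for core in collection1_shard1_replica_n30 collection1_shard1_replica_n45; do
    count=$(curl -s "http://host:port/solr/$core/select?q=*:*&rows=0&distrib=false" | grep -o '"numFound":[0-9]*')
    echo "$core -> $count"
  done

If the counts differ between replicas of the same shard, that shard's replicas are out of sync.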


Your situation was actually very simple :) one of your collections has less data.
You seem to have a sync requirement between collections, which is interesting, 
but that's beyond Solr.
Your inter-collection sync script most likely needs some debugging :) 




> On Aug 12, 2020, at 4:29 PM, Jae Joo  wrote:
> 
> Good question. How can I validate if the replicas are all synched?
> 
> 
> On Wed, Aug 12, 2020 at 7:28 PM Jae Joo  wrote:
> 
>> numFound  is same but different score.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Wed, Aug 12, 2020 at 6:01 PM Aroop Ganguly
>>  wrote:
>> 
>>> Try a simple test of querying each collection 5 times in a row, if the
>>> numFound are different for a single collection within tase 5 calls then u
>>> have it.
>>> Please try it, what you may think is sync’d may actually not be. How do
>>> you validate correct sync ?
>>> 
>>>> On Aug 12, 2020, at 10:55 AM, Jae Joo  wrote:
>>>> 
>>>> The replications are all synched and there are no updates while I was
>>>> testing.
>>>> 
>>>> 
>>>> On Wed, Aug 12, 2020 at 1:49 PM Aroop Ganguly
>>>>  wrote:
>>>> 
>>>>> Most likely you have 1 or more collections behind the alias that have
>>>>> replicas out of sync :)
>>>>> 
>>>>> Try querying each collection to find the one out of sync.
>>>>> 
>>>>>> On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
>>>>>> 
>>>>>> I have 10 collections in single alias and having different result sets
>>>>> for
>>>>>> every time with the same query.
>>>>>> 
>>>>>> Is it as designed or do I miss something?
>>>>>> 
>>>>>> The configuration and schema for all 10 collections are identical.
>>>>>> Thanks,
>>>>>> 
>>>>>> Jae
>>>>> 
>>>>> 
>>> 
>>> 



Re: Multiple Collections in a Alias.

2020-08-12 Thread Aroop Ganguly
Glad you nailed the out-of-sync one :) 

> On Aug 12, 2020, at 4:38 PM, Jae Joo  wrote:
> 
> I found it the root cause. I have 3 collections assigned to a alias and one
> of them are NOT synched.
> By the alias.
> 
> Collection 1
> Collection 2
> Collection 3
> 
> On Wed, Aug 12, 2020 at 7:29 PM Jae Joo  wrote:
> 
>> Good question. How can I validate if the replicas are all synched?
>> 
>> 
>> On Wed, Aug 12, 2020 at 7:28 PM Jae Joo  wrote:
>> 
>>> numFound  is same but different score.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Aug 12, 2020 at 6:01 PM Aroop Ganguly
>>>  wrote:
>>> 
>>>> Try a simple test of querying each collection 5 times in a row, if the
>>>> numFound are different for a single collection within tase 5 calls then u
>>>> have it.
>>>> Please try it, what you may think is sync’d may actually not be. How do
>>>> you validate correct sync ?
>>>> 
>>>>> On Aug 12, 2020, at 10:55 AM, Jae Joo  wrote:
>>>>> 
>>>>> The replications are all synched and there are no updates while I was
>>>>> testing.
>>>>> 
>>>>> 
>>>>> On Wed, Aug 12, 2020 at 1:49 PM Aroop Ganguly
>>>>>  wrote:
>>>>> 
>>>>>> Most likely you have 1 or more collections behind the alias that have
>>>>>> replicas out of sync :)
>>>>>> 
>>>>>> Try querying each collection to find the one out of sync.
>>>>>> 
>>>>>>> On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
>>>>>>> 
>>>>>>> I have 10 collections in single alias and having different result
>>>> sets
>>>>>> for
>>>>>>> every time with the same query.
>>>>>>> 
>>>>>>> Is it as designed or do I miss something?
>>>>>>> 
>>>>>>> The configuration and schema for all 10 collections are identical.
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Jae
>>>>>> 
>>>>>> 
>>>> 
>>>> 



Re: Multiple Collections in a Alias.

2020-08-12 Thread Aroop Ganguly
Try a simple test of querying each collection 5 times in a row; if the numFound 
values differ for a single collection within those 5 calls then you have it.
Please try it, because what you may think is sync'd may actually not be. How do you 
validate correct sync?

> On Aug 12, 2020, at 10:55 AM, Jae Joo  wrote:
> 
> The replications are all synched and there are no updates while I was
> testing.
> 
> 
> On Wed, Aug 12, 2020 at 1:49 PM Aroop Ganguly
>  wrote:
> 
>> Most likely you have 1 or more collections behind the alias that have
>> replicas out of sync :)
>> 
>> Try querying each collection to find the one out of sync.
>> 
>>> On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
>>> 
>>> I have 10 collections in single alias and having different result sets
>> for
>>> every time with the same query.
>>> 
>>> Is it as designed or do I miss something?
>>> 
>>> The configuration and schema for all 10 collections are identical.
>>> Thanks,
>>> 
>>> Jae
>> 
>> 



Re: Multiple Collections in a Alias.

2020-08-12 Thread Aroop Ganguly
Most likely you have 1 or more collections behind the alias that have replicas 
out of sync :) 

Try querying each collection to find the one out of sync.

> On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
> 
> I have 10 collections in single alias and having different result sets for
> every time with the same query.
> 
> Is it as designed or do I miss something?
> 
> The configuration and schema for all 10 collections are identical.
> Thanks,
> 
> Jae



Re: Cannot add replica during backup

2020-08-10 Thread Aroop Ganguly
> We have 16 shards each approx 30GB - total is ~480GB. I'm also pretty sure
> it's a network issue. Very interesting that you can index 20x the data in
> 15 min!
Not index, but back up an index in 15 min.



>>> It would also help to ensure your overseer is on a node with a role that
> exempts it from any Solr index responsibilities.
> How would I ensure this? First I'm hearing about this!

Look up roles, snitches and tags here: 
https://lucene.apache.org/solr/guide/7_7/rule-based-replica-placement.html
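
For reference, assigning the overseer role to a specific node looks roughly like this 
(host and node name are placeholders):

  curl "http://host:port/solr/admin/collections?action=ADDROLE&role=overseer&node=192.168.1.10:8983_solr"

If that node also hosts no replicas (via your placement rules), cluster-management work 
never competes with indexing.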



> On Aug 10, 2020, at 6:54 PM, Ashwin Ramesh  <mailto:ash...@canva.com.INVALID>> wrote:
> 
> Hi Aroop,
> 
> We have 16 shards each approx 30GB - total is ~480GB. I'm also pretty sure
> it's a network issue. Very interesting that you can index 20x the data in
> 15 min!
> 
>>> It would also help to ensure your overseer is on a node with a role that
> exempts it from any Solr index responsibilities.
> How would I ensure this? First I'm hearing about this!
> 
> Thanks for all the help!!
> 
> On Tue, Aug 11, 2020 at 11:48 AM Aroop Ganguly
> mailto:aroopgang...@icloud.com.invalid>> 
> wrote:
> 
>> Hi Ashwin
>> 
>> Thanks for sharing this detail.
>> Do you mind sharing how big are each of these indices ?
>> I am almost sure this is network capacity and constraints related per your
>> aws setup.
>> 
>> Yes if you can confirm that the backup is complete, or you just want the
>> system to move on discarding the backup process, your removal of the backup
>> flag from zookeeper will help Solr in moving on to the next task in the
>> queue.
>> 
>> It would also help to ensure your overseer is on a node with a role that
>> exempts it from any Solr index responsibilities.
>> 
>> 
>>> On Aug 10, 2020, at 6:43 PM, Ashwin Ramesh >> <mailto:ash...@canva.com.INVALID>>
>> wrote:
>>> 
>>> Hey Aroop, the general process for our backup is:
>>> - Connect all machines to an EFS drive (AWS's NFS service)
>>> - Call the collections API to backup into EFS
>>> - ZIP the directory once the backup is completed
>>> - Copy the ZIP into an s3 bucket
>>> 
>>> I'll probably have to see which part of the process is the slowest.
>>> 
>>> On another note, can you simply remove the task from the ZK path to
>>> continue the execution of tasks?
>>> 
>>> Regards,
>>> 
>>> Ash
>>> 
>>> On Tue, Aug 11, 2020 at 11:40 AM Aroop Ganguly
>>> mailto:aroopgang...@icloud.com.invalid>> 
>>> wrote:
>>> 
>>>> 12 hours is extreme, we take backups of 10TB worth of indexes in 15 mins
>>>> using the collection backup api.
>>>> How are you taking the backup?
>>>> 
>>>> Do you actually see any backup progress or u are just seeing the task in
>>>> the overseer queue linger ?
>>>> I have seen restore tasks hanging in the queue forever despite process
>>>> completing in Solr 77 so wouldn’t be surprised this happens with backup
>> as
>>>> well. And also observed that unless that unless that task is removed
>> from
>>>> the overseer-collection-queue the next ones do not proceed.
>>>> 
>>>> Also adding replicas while backup seems like overkill, why don’t you
>> just
>>>> have the appropriate replication factor in the first place and have
>>>> autoAddReplicas=true for indemnity?
>>>> 
>>>>> On Aug 10, 2020, at 6:32 PM, Ashwin Ramesh >>>> <mailto:ash...@canva.com.INVALID>>
>>>> wrote:
>>>>> 
>>>>> Hi everybody,
>>>>> 
>>>>> We are using solr 7.6 (SolrCloud). We notices that when the backup is
>>>>> running, we cannot add any replicas to the collection. By the looks of
>>>> it,
>>>>> the job to add the replica is put into the Overseer queue, but it is
>> not
>>>>> being processed. Is this expected? And are there any workarounds?
>>>>> 
>>>>> Our backups take about 12 hours. Maybe we should try optimize that too.
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Ash
>>>>> 

Re: Solr + Parquets

2020-08-10 Thread Aroop Ganguly


> script to iterate and load the files via the post command.
You mean load parquet files over post? That sounds unbelievable …
Do you mean you created a Solr doc for each parquet record in a partition and used 
SolrJ or some other Java lib to post the docs to Solr?

df.mapPartitions(p => { // batch the parquet records, convert each batch to a 
solr-doc batch, then send to Solr via a Solr request })


If you are sending raw parquet to Solr I would love to learn more :) !

> On Aug 10, 2020, at 7:50 PM, Russell Jurney  wrote:
> 
> There are ways to load data directly from Spark to Solr but I didn't find
> any of them satisfactory so I just create enough Spark partitions with
> reparition() (increase partition count)/coalesce() (decrease partition
> count) that I get as many Parquet files as I want and then I use a bash
> script to iterate and load the files via the post command.
> 
> Thanks,
> Russell Jurney @rjurney 
> russell.jur...@gmail.com LI  FB
>  datasyndrome.com
> 
> 
> On Fri, Aug 7, 2020 at 9:48 AM Jörn Franke  wrote:
> 
>> DIH is deprecated and it will be removed from Solr. You may though still
>> be able to install it as a plug-in. However, AFAIK nobody maintains it. Do
>> not use it anymore
>> 
>> You can write a custom Spark data source that writes to Solr or does it in
>> a spark Map step using SolrJ .
>> In both cases do not create 100s of executors to avoid overloading.
>> 
>> 
>>> Am 07.08.2020 um 18:39 schrieb Kevin Van Lieshout <
>> kevin.vanl...@gmail.com>:
>>> 
>>> Hi,
>>> 
>>> Is there any assistance around writing parquets from spark to solr shards
>>> or is it possible to customize a DIH to import a parquet to a solr shard.
>>> Let me know if this is possible, or the best work around for this. Much
>>> appreciated, thanks
>>> 
>>> 
>>> Kevin VL
>> 



Re: Cannot add replica during backup

2020-08-10 Thread Aroop Ganguly
Hi Ashwin

Thanks for sharing this detail.
Do you mind sharing how big each of these indices is?
I am almost sure this is related to network capacity and constraints in your AWS 
setup.

Yes, if you can confirm that the backup is complete, or you just want the system 
to move on and discard the backup process, removing the backup flag from 
ZooKeeper will help Solr move on to the next task in the queue.

It would also help to ensure your overseer is on a node with a role that 
exempts it from any Solr index responsibilities. 


> On Aug 10, 2020, at 6:43 PM, Ashwin Ramesh  wrote:
> 
> Hey Aroop, the general process for our backup is:
> - Connect all machines to an EFS drive (AWS's NFS service)
> - Call the collections API to backup into EFS
> - ZIP the directory once the backup is completed
> - Copy the ZIP into an s3 bucket
> 
> I'll probably have to see which part of the process is the slowest.
> 
> On another note, can you simply remove the task from the ZK path to
> continue the execution of tasks?
> 
> Regards,
> 
> Ash
> 
> On Tue, Aug 11, 2020 at 11:40 AM Aroop Ganguly
>  wrote:
> 
>> 12 hours is extreme, we take backups of 10TB worth of indexes in 15 mins
>> using the collection backup api.
>> How are you taking the backup?
>> 
>> Do you actually see any backup progress or u are just seeing the task in
>> the overseer queue linger ?
>> I have seen restore tasks hanging in the queue forever despite process
>> completing in Solr 77 so wouldn’t be surprised this happens with backup as
>> well. And also observed that unless that unless that task is removed from
>> the overseer-collection-queue the next ones do not proceed.
>> 
>> Also adding replicas while backup seems like overkill, why don’t you just
>> have the appropriate replication factor in the first place and have
>> autoAddReplicas=true for indemnity?
>> 
>>> On Aug 10, 2020, at 6:32 PM, Ashwin Ramesh 
>> wrote:
>>> 
>>> Hi everybody,
>>> 
>>> We are using solr 7.6 (SolrCloud). We notices that when the backup is
>>> running, we cannot add any replicas to the collection. By the looks of
>> it,
>>> the job to add the replica is put into the Overseer queue, but it is not
>>> being processed. Is this expected? And are there any workarounds?
>>> 
>>> Our backups take about 12 hours. Maybe we should try optimize that too.
>>> 
>>> Regards,
>>> 
>>> Ash



Re: Cannot add replica during backup

2020-08-10 Thread Aroop Ganguly
12 hours is extreme; we take backups of 10 TB worth of indexes in 15 mins using 
the collection backup API.
How are you taking the backup?

Do you actually see any backup progress, or are you just seeing the task in the 
overseer queue linger?
I have seen restore tasks hanging in the queue forever despite the process 
completing in Solr 7.7, so I wouldn't be surprised if this happens with backup as 
well. I have also observed that unless that task is removed from the 
overseer-collection-queue, the next ones do not proceed. 

Also, adding replicas while a backup is running seems like overkill; why don't you 
just have the appropriate replication factor in the first place and set 
autoAddReplicas=true as insurance?
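
For what it's worth, an asynchronous backup plus a status check looks roughly like this 
(host, collection name and location are placeholders):

  curl "http://host:port/solr/admin/collections?action=BACKUP&name=nightly&collection=mycollection&location=/backups&async=backup-1"
  curl "http://host:port/solr/admin/collections?action=REQUESTSTATUS&requestid=backup-1"

The async id lets you poll for completion instead of holding one HTTP request open for hours.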

> On Aug 10, 2020, at 6:32 PM, Ashwin Ramesh  wrote:
> 
> Hi everybody,
> 
> We are using solr 7.6 (SolrCloud). We notices that when the backup is
> running, we cannot add any replicas to the collection. By the looks of it,
> the job to add the replica is put into the Overseer queue, but it is not
> being processed. Is this expected? And are there any workarounds?
> 
> Our backups take about 12 hours. Maybe we should try optimize that too.
> 
> Regards,
> 
> Ash
> 



Re: Solr Backup/Restore

2020-07-21 Thread Aroop Ganguly
Restore will only create the same number of shards as the original collection 
had when you took the backup.
If you are on a cluster with enough resources, you can try splitting shards to the 
desired number later on.
Shard splitting has a more efficient implementation in Solr 8.x, but if you have a 
mostly vacant Solr 7.4 cluster you can consider splitting shards,
then taking a backup, and then restoring from this new backup with relevant 
replica placement rules, or at least with maxShardsPerNode=1 or close to it.
Hope this helps.
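
For illustration (placeholder names; each call can also be run asynchronously), a single 
shard split looks like:

  curl "http://host:port/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1&async=split-shard1"

Repeating this for every shard doubles the shard count; once the new sub-shards are 
active, the inactive parent shards can be deleted with DELETESHARD.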

> On Jul 21, 2020, at 12:31 PM, Anshuman Singh  
> wrote:
> 
> Hi,
> 
> I'm using Solr-7.4.0 and I want to export 4TB of data from our current Solr
> cluster to a different cluster. The new cluster has twice the number of
> nodes than the current cluster and I want data to be distributed among all
> the nodes. Is this possible with the Backup/Restore feature considering the
> fact that I want to increase the number of shards in the new Collections?
> 
> From the official docs:
> *"Support for backups when running SolrCloud is provided with
> the Collections API
> .
> This allows the backups to be generated across multiple shards, and
> restored to the same number of shards and replicas as the original
> collection."*
> 
> I tried to create a backup of one collection using this feature but it is
> giving me this error described here
> https://issues.apache.org/jira/browse/SOLR-12523.
> 
> Can someone guide me on this and is there any other way to do this exercise
> which would take less time?
> 
> Thanks,
> Anshuman



Re: [ANNOUNCE] Apache Solr 8.6.0 released

2020-07-16 Thread Aroop Ganguly
Just to highlight the usage and importance of some of the items here.

1. HDFS Backup/Restore is integral to our system architecture, index 
distribution and Disaster Recovery (system used by 1000+ users internally)
2. The HDFS directory factory and embedded Solr are also very important for offline 
index generation at large scale (~5-10 TB of source data)

We really use these items and they are very relevant for companies that run 
Solr to augment Big Data analysis.

I just wanted to mention this in case these features’ need and value to 
customers/users were not represented as yet.

I do support the cleansing of the core as long as these items are still available 
via dedicated modules/plug-ins.

Thanks for the discussion around this. I hope the PMC community will consider 
these things and guide accordingly.

Regards
Aroop

> On Jul 16, 2020, at 1:30 AM, Anshum Gupta  wrote:
> 
> Thanks for the feedback, Colvin. We'll certainly try and do something
> around making the deprecations more visible and easier to track for all
> users. I'm not sure if 'news' is the right section, but I think it might be
> good to have a section on the website for users to look at and get a better
> idea about.
> 
> The PMC and committers are certainly aware of the importance of not
> dropping desirable features but let's not hijack the release announcement
> thread for this discussion.
> 
> On Thu, Jul 16, 2020 at 1:09 AM Colvin Cowie 
> wrote:
> 
>> Perhaps the deprecation notices should feature on
>> https://lucene.apache.org/solr/news.html ? Because right now, they're not
>> *very
>> *visible in the changes.
>> 
>> On Thu, 16 Jul 2020 at 01:18, Aroop Ganguly > .invalid>
>> wrote:
>> 
>>> May we ask what in hdfs support is being deprecated? Is Hdfs backup and
>>> restore being deprecated ?
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Jul 15, 2020, at 3:41 PM, Houston Putman 
>>> wrote:
>>>> 
>>>> To address your concern Bernd,
>>>> 
>>>> The goal of this deprecation is not to remove the functionality
>> entirely.
>>>> The primary purpose is to remove the code from Solr Core. Before
>>> removing a
>>>> feature we aim to either:
>>>> 
>>>>  - Move the code to another repository, and have it be loadable via a
>>>>  plugin
>>>>  - Replace the feature with something more stable and/or scalable.
>>>>  (Likely loadable via a plugin or run in a separate application)
>>>> 
>>>> I understand your frustration, but the ultimate goal here is to make
>> Solr
>>>> more customizable and plugable (and therefore learner by default). This
>>> way
>>>> the base Solr functionality can be as bug-free and performant as
>>> possible,
>>>> and any extra features can be added on top as needed.
>>>> 
>>>> We would appreciate feedback for how the community would prefer these
>>>> features be provided in the future, so that we make the transition
>>> smoother
>>>> and the end product better.
>>>> 
>>>> - Houston
>>>> 
>>>>> On Wed, Jul 15, 2020 at 5:51 PM Ishan Chattopadhyaya <
>>>>> ichattopadhy...@gmail.com> wrote:
>>>>> 
>>>>> On Wed, 15 Jul, 2020, 8:37 pm Bernd Fehling, <
>>>>> bernd.fehl...@uni-bielefeld.de>
>>>>> wrote:
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Am 15.07.20 um 16:07 schrieb Ishan Chattopadhyaya:
>>>>>>> Dear Solr Users,
>>>>>>> 
>>>>>>> In this release (Solr 8.6), we have deprecated the following:
>>>>>>> 
>>>>>>> 1. Data Import Handler
>>>>>>> 
>>>>>>> 2. HDFS support
>>>>>>> 
>>>>>>> 3. Cross Data Center Replication (CDCR)
>>>>>>> 
>>>>>> 
>>>>>> Seriously? :-(
>>>>>> 
>>>>> 
>>>>> Please see SOLR-14022.
>>>>> 
>>>>> 
>>>>>> So next steps will be kicking out Cloud and go back to single node or
>>>>> what?
>>>>>> 
>>>>> 
>>>>> Not something we've considered yet.
>>>>> 
>>>>> 
>>>>>> Why don't you just freeze the whole Solr development and switch to
>>>>> Elastic?

Re: [ANNOUNCE] Apache Solr 8.6.0 released

2020-07-15 Thread Aroop Ganguly
May we ask what in HDFS support is being deprecated? Is HDFS backup and restore 
being deprecated? 

Sent from my iPhone

> On Jul 15, 2020, at 3:41 PM, Houston Putman  wrote:
> 
> To address your concern Bernd,
> 
> The goal of this deprecation is not to remove the functionality entirely.
> The primary purpose is to remove the code from Solr Core. Before removing a
> feature we aim to either:
> 
>   - Move the code to another repository, and have it be loadable via a
>   plugin
>   - Replace the feature with something more stable and/or scalable.
>   (Likely loadable via a plugin or run in a separate application)
> 
> I understand your frustration, but the ultimate goal here is to make Solr
> more customizable and plugable (and therefore learner by default). This way
> the base Solr functionality can be as bug-free and performant as possible,
> and any extra features can be added on top as needed.
> 
> We would appreciate feedback for how the community would prefer these
> features be provided in the future, so that we make the transition smoother
> and the end product better.
> 
> - Houston
> 
>> On Wed, Jul 15, 2020 at 5:51 PM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>> 
>> On Wed, 15 Jul, 2020, 8:37 pm Bernd Fehling, <
>> bernd.fehl...@uni-bielefeld.de>
>> wrote:
>> 
>>> 
>>> 
>>> Am 15.07.20 um 16:07 schrieb Ishan Chattopadhyaya:
 Dear Solr Users,
 
 In this release (Solr 8.6), we have deprecated the following:
 
  1. Data Import Handler
 
  2. HDFS support
 
  3. Cross Data Center Replication (CDCR)
 
>>> 
>>> Seriously? :-(
>>> 
>> 
>> Please see SOLR-14022.
>> 
>> 
>>> So next steps will be kicking out Cloud and go back to single node or
>> what?
>>> 
>> 
>> Not something we've considered yet.
>> 
>> 
>>> Why don't you just freeze the whole Solr development and switch to
>> Elastic?
>>> 
>> 
>> Not something we've considered yet.
>> 
>> 
>>> 
 
 
 All of these are scheduled to be removed in a future 9.x release.
 
 It was decided that these components did not meet the standards of
>>> quality
 and support that we wish to ensure for all components we ship. Some of
 these also relied on design patterns that we no longer recommend for
>> use
>>> in
 critical production environments.
 
 If you rely on these features, you are encouraged to try out community
 supported versions of these, where available [0]. Where such community
 support is not available, we encourage you to participate in the
>>> migration
 of these components into community supported packages and help continue
>>> the
 development. We envision that using packages for these components via
 package manager will actually make it easier for users to use such
>>> features.
 
 Regards,
 
 Ishan Chattopadhyaya
 
 (On behalf of the Apache Lucene/Solr PMC)
 
 [0] -
 
>>> 
>> https://cwiki.apache.org/confluence/display/SOLR/Community+supported+packages+for+Solr
 
 On Wed, Jul 15, 2020 at 2:30 PM Bruno Roustant <
>> bruno.roust...@gmail.com
 
 wrote:
 
> The Lucene PMC is pleased to announce the release of Apache Solr
>> 8.6.0.
> 
> 
> Solr is the popular, blazing fast, open source NoSQL search platform
>>> from
> the Apache Lucene project. Its major features include powerful
>> full-text
> search, hit highlighting, faceted search, dynamic clustering, database
> integration, rich document handling, and geospatial search. Solr is
>>> highly
> scalable, providing fault tolerant distributed search and indexing,
>> and
> powers the search and navigation features of many of the world's
>> largest
> internet sites.
> 
> 
> Solr 8.6.0 is available for immediate download at:
> 
> 
>  
> 
> 
> ### Solr 8.6.0 Release Highlights:
> 
> 
> * Cross-Collection Join Queries: Join queries can now work
> cross-collection, even when shared or when spanning nodes.
> 
> * Search: Performance improvement for some types of queries when
>> exact
> hit count isn't needed by using BlockMax WAND algorithm.
> 
> * Streaming Expression: Percentiles and standard deviation
>> aggregations
> added to stats, facet and time series.  Streaming expressions added to
> /export handler.  Drill Streaming Expression for efficient and
>> accurate
> high cardinality aggregation.
> 
> * Package manager: Support for cluster (CoreContainer) level plugins.
> 
> * Health Check: HealthCheckHandler can now require that all cores are
> healthy before returning OK.
> 
> * Zookeeper read API: A read API at /api/cluster/zk/* to fetch raw ZK
> data and view contents of a ZK directory.
> 
> * Admin UI: New panel with security info in admin UI's dashboard.
> 
> * Query DSL: Support for {param:ref} and {bool: {excl

Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-14 Thread Aroop Ganguly
Isabelle, sometimes 401s are a red herring for other issues unrelated to auth.
We have had issues on 7.7 with transient replica recovery and/or leader-down 
situations where the only message we got back from Solr was a 401.
Please see if you have any down replicas or other issues where certain nodes may 
have trouble getting current information from ZooKeeper.
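
A quick way to spot down or recovering replicas (placeholder host; assumes jq is 
available) is to scan the cluster status:

  curl -s "http://host:port/solr/admin/collections?action=CLUSTERSTATUS" \
    | jq '.cluster.collections[].shards[].replicas[] | select(.state != "active") | {core, node_name, state}'

Any replica not in the "active" state is a candidate explanation for the spurious 401s.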


> On Jun 14, 2020, at 2:13 PM, Isabelle Giguere  > wrote:
> 
> I have created https://issues.apache.org/jira/browse/SOLR-14569 
> 
> It includes a patch with the unit test to reproduce the issue, and a 
> simplification of our product-specific configuration, with instructions.
> 
> Let's catch up on Jira.
> 
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
> 
> 
> 
> De : Jan Høydahl mailto:jan@cominvent.com>>
> Envoyé : 13 juin 2020 17:50
> À : solr-user  >
> Objet : Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
> 
> I did not manage to reproduce. Feel free to open the JIRA and attach the 
> failing test. In the issue description, it is great if you manage to describe 
> the reproduction steps in a clean way, so anyone can reproduce with a minimal 
> neccessary config.
> 
> Jan
> 
>> 13. jun. 2020 kl. 00:41 skrev Isabelle Giguere 
>> mailto:igigu...@opentext.com.INVALID>>:
>> 
>> Hello again;
>> 
>> I have managed to reproduce the issue in a unit test.  I should probably add 
>> a Jira ticket with a patch for the unit test On Solr 8.5.0, not master.
>> 
>> Meanwhile, for your suggested queries:
>> 
>> 1.  Query on the collection:
>> 
>> curl -i -u admin:admin http://10.5.106.115:8985/solr/test1/select?q=*:*&wt=xml
>> 
>> HTTP/1.1 200 OK
>> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
>> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
>> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
>> 'self'; worker-src 'self';
>> X-Content-Type-Options: nosniff
>> X-Frame-Options: SAMEORIGIN
>> X-XSS-Protection: 1; mode=block
>> Content-Type: application/xml; charset=UTF-8
>> Content-Length: 8214
>> 
>> 
>> 
>> 
>> 
>> true
>> 0
>> 2
>> 
>>   *:*
>> 
>> 
>> 
>> Response contains the Solr document, of course
>> 
>> 
>> 2. Query on the alias
>> 
>> curl -i -u admin:admin http://10.5.106.115:8985/solr/test/select?q=*:*&wt=xml
>> 
>> HTTP/1.1 401 Unauthorized
>> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
>> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
>> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
>> 'self'; worker-src 'self';
>> X-Content-Type-Options: nosniff
>> X-Frame-Options: SAMEORIGIN
>> X-XSS-Protection: 1; mode=block
>> Cache-Control: no-cache, no-store
>> Pragma: no-cache
>> Expires: Sat, 01 Jan 2000 01:00:00 GMT
>> Last-Modified: Fri, 12 Jun 2020 22:30:20 GMT
>> ETag: "172aaa7c1eb"
>> Content-Type: application/xml; charset=UTF-8
>> Content-Length: 1332
>> 
>> 
>> 
>> 
>> 
>> true
>> 401
>> 16
>> 
>>   *:*
>> 
>> 
>> 
>> Error contains the full html HTTP 401 message (with escaped characters, of 
>> course)
>> Gist of it : HTTP ERROR 401 require authentication
>> 
>> Thanks;
>> 
>> 
>> Isabelle Giguère
>> Computational Linguist & Java Developer
>> Linguiste informaticienne & développeur java
>> 
>> 
>> 
>> De : Jan Høydahl mailto:jan@cominvent.com>>
>> Envoyé : 12 juin 2020 17:30
>> À : solr-user@lucene.apache.org  
>> mailto:solr-user@lucene.apache.org>>
>> Objet : Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
>> 
>> I’d say, try the query with curl and enable http headers
>> 
>> curl -i --user admin:admin http://localhost:8983/solr/mycollection/select?q=*:*
>> 

Re: Solr Streaming Expression failures

2020-03-26 Thread Aroop Ganguly
I have personally not used streaming expressions to commit data to a collection 
(I have used them a lot for querying), and would not recommend them for bulk 
indexing unless Joel recommends it :) 

On the other hand we have had decent success in indexing at scale, and 12 
million is not a big number.
You would need a decently sized cluster and a commensurate number of shards. 
Indexing speed correlates with the number of shards, and correlates inversely 
with the number of replicas and maxShardsPerNode.
You can use traditional SolrJ APIs to index in parallel using multiple threads 
concurrently.
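
As an alternative sketch using plain HTTP updates rather than SolrJ (collection name, 
host and file layout are placeholders), batches can be pushed in parallel like this:

  # Split the input into JSON batch files beforehand (batch-*.json),
  # then post them with 8 parallel workers.
  ls batch-*.json | xargs -P 8 -I {} curl -s "http://host:port/solr/collection1/update?commitWithin=60000" -H 'Content-Type: application/json' --data-binary @{}

The same idea applies with SolrJ: several threads, each sending reasonably sized batches, 
with commits left to commitWithin or autoCommit.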


> On Mar 26, 2020, at 2:59 PM, Mohamed Sirajudeen Mayitti Ahamed Pillai 
>  wrote:
> 
> Hi Everyone,
> 
> We are using Solr 7.4 with 3 external ZKs and 7 Solr node in a cloud setup. 
> We are using Streaming expression to pull 12million records from a different 
> Solr Cloud using below expression.
> 
> http://solrTarget:8983/solr/collection1/stream?expr=commit(collection1,batchSize=1,update(collection1,batchSize=1,search(collection1,zkHost="zkhost_source:9983",sort="timestamp_tdt
>  asc, id asc", rows=12114606, q=" aggr_type_s:click@doc_id,filters* AND 
> timestamp_tdt:[2020-03-25T18:58:33.337Z TO 2020-03-26T18:58:33.336Z]", 
> fl="id,timestamp_tdt,*",TZ="CST"))).
> 
> Collection 1 in SolrTarget has 2 shards and 2 replicas. Collection 1 in 
> solrSource has 1 shard and 2 replicas
> 
> The streaming expression reads documents from collection1 in 
> zkhost_source:9983 and indexes into collection1 in solrTarget environment.
> Similar streaming expression with less number of documents (less than 
> 5million) working without any failures.
> This streaming expression is not been successful as it grow bigger and 
> bigger, as we have been noticing that streaming expression is getting failed 
> response with different kind of errors.
> 
> Few error messages are below,
> 
> 
>  1.  Error trying to proxy request for url: http:// 
> solrTarget:8983/solr/collection1/stream, metadata=[error-class, 
> org.apache.solr.common.SolrException, root-error-class, 
> java.net.SocketTimeoutException], trace=org.apache.solr.common.SolrException: 
> Error trying to proxy request for url: http:// 
> solrTarget:8983/solr/collection1/stream
>  2.  {result-set={docs=[{EXCEPTION=java.util.concurrent.ExecutionException: 
> java.io.IOException: params 
> sort=timestamp_tdt+asc,+id+asc&rows=12114606&q=aggr_type_s:click@doc_id,filters*+AND+timestamp_tdt:[2020-03-25T18:58:33.337Z+TO+2020-03-26T18:58:33.336Z]&fl=id,timestamp_tdt,*&TZ=CST&distrib=false,
>  RESPONSE_TIME=121125, EOF=true}]}}
>  3.  {result-set={docs=[{EXCEPTION=org.apache.solr.common.SolrException: 
> Could not load collection from ZK: collection10, RESPONSE_TIME=139300, 
> EOF=true}]}}
> 
> 
> Is it a known issue with Streaming expression when it comes to bulk indexing 
> using update and commit expression? Is there any work-around to this issue?
> 
> Is there a better option available in Solr to index 12million records (with 
> only 12 fields per document) at a faster speed?
> 
> Thanks,
> Mohamed



Re: How do *you* restrict access to Solr?

2020-03-16 Thread Aroop Ganguly
Hi Ryan

You should consider a simple rule-based authorization scheme.
Your staff user can be given read-only privileges to everything you want, except the 
admin UI.

Depending on which version of Solr you are on, this can be trivial.

- Aroop
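
As a sketch of that idea (user and role names here are made up for illustration, and 
this assumes BasicAuth plus the RuleBasedAuthorizationPlugin are already enabled in 
security.json), the built-in "read" permission can be mapped to a role and the 
catch-all "all" permission kept for admins:

  curl -u admin:adminpassword -H 'Content-Type: application/json' "http://host:port/solr/admin/authorization" -d '{"set-permission": {"name": "read", "role": ["staff", "admin"]}}'
  curl -u admin:adminpassword -H 'Content-Type: application/json' "http://host:port/solr/admin/authorization" -d '{"set-permission": {"name": "all", "role": "admin"}}'
  curl -u admin:adminpassword -H 'Content-Type: application/json' "http://host:port/solr/admin/authorization" -d '{"set-user-role": {"staffuser": ["staff"]}}'

The order matters: the specific "read" permission is added before the "all" catch-all, 
so staff queries match it first while everything else (admin and update APIs) falls 
through to the admin-only rule.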

> On Mar 16, 2020, at 8:46 AM, Ryan W  wrote:
> 
> On Mon, Mar 16, 2020 at 10:51 AM Susheel Kumar 
> wrote:
> 
>> Basic auth should help you to start
>> 
>> https://lucene.apache.org/solr/guide/8_1/basic-authentication-plugin.html
> 
> 
> 
> Thanks.  I think I will give up on the plugin system.  I haven't been able
> to get the plugin system to work, and it creates too many opportunities for
> human error.  Even if I can get it working this week, what about 6 months
> from now or a year from now when something goes wrong and I have to debug
> it.  It seems like far too much overhead to provide the desired security
> benefit, except perhaps in situations where an organization has Solr
> specialists who can maintain the system.



Re: Overseer & Backups - Questions

2020-03-10 Thread Aroop Ganguly
Backups on HDFS?
These should not be blocking if invoked asynchronously; are you doing them async 
by passing the async flag?

> On Mar 10, 2020, at 3:19 PM, Ashwin Ramesh  wrote:
> 
> We use the collection API to invoke backups. The tasks we noticed that
> stalled are ADDREPLICA. As expected when the backup completed a few hours
> ago, the task then got completed. Is there some concurrency setting with
> these tasks? Or is a backup a blocking task? We noticed that the index was
> still being flushed to segments though.
> 
> Regards,
> 
> Ash
> 
> On Wed, Mar 11, 2020 at 3:18 AM Aroop Ganguly
>  wrote:
> 
>> May we know how you are invoking backups ?
>> 
>>> On Mar 9, 2020, at 11:53 PM, Ashwin Ramesh 
>> wrote:
>>> 
>>> Hi everybody,
>>> 
>>> Quick Specs:
>>> - Solr 7.4 Solr Cloud
>>> - 30gb index on 8 shards Tlog/Pull
>>> 
>>> We run daily backups on our 30gb index and noticed that the overseer does
>>> not process other jobs on it's task list while the backup is being taken.
>>> They remain on the pending list (in ZK). Is this expected?
>>> 
>>> Also I was wondering if there was a safe way to cancel a currently
>> running
>>> task or deleing pending tasks?
>>> 
>>> Regards,
>>> 
>>> Ash
>>> 



Re: Overseer & Backups - Questions

2020-03-10 Thread Aroop Ganguly
May we know how you are invoking backups?

> On Mar 9, 2020, at 11:53 PM, Ashwin Ramesh  wrote:
> 
> Hi everybody,
> 
> Quick Specs:
> - Solr 7.4 Solr Cloud
> - 30gb index on 8 shards Tlog/Pull
> 
> We run daily backups on our 30gb index and noticed that the overseer does
> not process other jobs on it's task list while the backup is being taken.
> They remain on the pending list (in ZK). Is this expected?
> 
> Also I was wondering if there was a safe way to cancel a currently running
> task or deleing pending tasks?
> 
> Regards,
> 
> Ash
> 



Re: Backups with only 1 machine having access to remote storage?

2020-02-20 Thread Aroop Ganguly
Hi Koen

Which backup mechanism are you using?
The HDFS backup setup is a lot more sophisticated, and the backup repository settings 
made in solr.xml manage a lot of these things.
The node from which you issue the command has no bearing on the target collection's 
data that you are trying to back up.
The backup will reach the designated destination with all the data from your 
collection.

That's why knowing your setup and settings for backup would help in advising you 
better.

Thanks
Aroop

> On Feb 20, 2020, at 8:25 AM, Koen De Groote  
> wrote:
> 
> Hello all,
> 
> I've recently set up backups, using solr 7.6
> 
> My setup has 3 replicas per collection and several collections. Not all
> collections or replicas are present on all hosts.
> 
> That being said, I run the backup command from 1 particular host and only
> that host has access to the mount on which the backup data will be written.
> 
> This means that the host writing the backup data doesn't have all the data
> on its local filesystem.
> 
> Is this a problem?
> 
> By which I mean: will data not present on that host be retrieved over the
> network?
> 
> What happens in this case?
> 
> Kind regards,
> Koen De Groote



Restore from HDFS slow

2019-09-05 Thread Aroop Ganguly
Hey Solr Experts

Does anyone have an idea how restoring collections from HDFS can be made faster?
Are there any tuning parameters, like the number of threads or the memory to use, 
that can be configured someplace to enhance/manage the restore process?

I am on Solr 7.7.2 btw and the api we use is: 
https://lucene.apache.org/solr/guide/7_6/making-and-restoring-backups.html#restore-api
 


Thanks
Aroop




Re: alias read access impossible for anyone other than admin?

2019-05-30 Thread Aroop Ganguly
Thanks Jason. 
We are awaiting the 7.7.2 release. 

I will send out a note describing how the documentation is easy to mess up.
Maybe this is worth a blog post from folks like yourselves who are experts in 
this :) 


> On May 28, 2019, at 4:31 AM, Jason Gerlowski  wrote:
> 
> Hey Aroop,
> 
> The fix in SOLR-13355 is available starting in 8.1.  It will also be
> available in 7.7.2 once that is released.  (Jan Hoydahl started the
> release process for 7.7.2, but held off for a number of other ongoing
> releases.  He's recently resumed work on the release though, and I
> expect we'll see 7.7.2 in a week or two.)
> 
> RuleBasedAuthorizationPlugin does have some coverage in the ref-guide,
> as you've likely seen:
> https://lucene.apache.org/solr/guide/7_7/rule-based-authorization-plugin.html.
> I don't think SOLR-13355 involved any changes to that documentation:
> it fixed a bug that deviated from what was described in the ref-guide,
> so there were no changes required when that bug was fixed.  That said,
> if you see something I've missed, or think that page could be improved
> more generally, it's definitely worth raising a JIRA for.  RBAP
> permission matching/processing can be subtle for those using it for
> the first time, so any improvement to the docs will go a long way.
> 
> Jason
> 
> On Sat, May 25, 2019 at 3:12 AM Aroop Ganguly  wrote:
>> 
>> hi jason
>> 
>> which version of solr has the definitive fix for the rbap again ?
>> also is there a jira to fix or create a documentation for the same that 
>> works :) ?
>> 
>> aroop
>> 
>> 
>>> On May 24, 2019, at 9:55 AM, Jason Gerlowski  wrote:
>>> 
>>> Hi Sotiris,
>>> 
>>> First, what version of Solr are you running?  We've made some fixes
>>> recently (esp. SOLR-13355) to RBAP, and they might affect the behavior
>>> you're seeing or any fixes we can recommend.
>>> 
>>> Second, the order of permissions in security.json has a huge effect on
>>> how .  Solr always uses the first permission rule that matches a given
>>> API...later rules are ignored if a match is found in earlier ones.
>>> The first rule in your permissions block ({"name": "all", "role":
>>> "admin"}) will match all APIs and will only allow requests through if
>>> the requesting user has the "admin" role.  So "user" being unable to
>>> query an alias makes sense.  Usually "all" and other catchall
>>> permissions are best used at the very bottom of your permissions list.
>>> That way the catchall is the last rule to be checked, giving other
>>> rules a chance to match first.
>>> 
>>> Hope that helps.
>>> 
>>> Jason
>>> 
>>> On Wed, May 22, 2019 at 6:21 AM Sotiris Fragkiskos  
>>> wrote:
>>>> 
>>>> Hi everyone!
>>>> I've been trying unsuccessfully to read an alias to a collection with a
>>>> curl command.
>>>> The command only works when I put in the admin credentials, although the
>>>> user I want access for also has the required role for accessing.
>>>> Is this perhaps built-in, or should anyone be able to access an alias from
>>>> the API?
>>>> 
>>>> The command I'm using is:
>>>> curl http://
>>>> :@/solr//select?q=:
>>>> This fails for the user but succeeds for the admin
>>>> 
>>>> My minimum working example of security.json follows.
>>>> Many thanks!
>>>> 
>>>> {
>>>> "authentication":{
>>>>   "blockUnknown":true,
>>>>   "class":"solr.BasicAuthPlugin",
>>>>   "credentials":{
>>>> "admin":"blahblahblah",
>>>> "user":"blahblah"},
>>>>   "":{"v":13}},
>>>> "authorization":{
>>>>   "class":"solr.RuleBasedAuthorizationPlugin",
>>>>   "permissions":[
>>>> {
>>>>   "name":"all",
>>>>   "role":"admin",
>>>>   "index":1},
>>>> {
>>>>   "name":"readColl",
>>>>   "collection":"Coll",
>>>>   "path":"/select/*",
>>>>   "role":"readColl",
>>>>   "index":2},
>>>> {
>>>>   "name":"readSCollAlias",
>>>>   "collection":"sCollAlias",
>>>>   "path":"/select/*",
>>>>   "role":"readSCollAlias",
>>>>   "index":3}],
>>>>   "user-role":{
>>>> "admin":[
>>>>   "admin",
>>>>   "readSCollAlias"],
>>>> "user":["readSCollAlias"]},
>>>>   "":{"v":21}}}
>> 



Re: alias read access impossible for anyone other than admin?

2019-05-25 Thread Aroop Ganguly
hi jason

Which version of Solr has the definitive fix for RBAP again?
Also, is there a JIRA to fix or create documentation for the same that works 
:) ?

aroop


> On May 24, 2019, at 9:55 AM, Jason Gerlowski  wrote:
> 
> Hi Sotiris,
> 
> First, what version of Solr are you running?  We've made some fixes
> recently (esp. SOLR-13355) to RBAP, and they might affect the behavior
> you're seeing or any fixes we can recommend.
> 
> Second, the order of permissions in security.json has a huge effect on
> how .  Solr always uses the first permission rule that matches a given
> API...later rules are ignored if a match is found in earlier ones.
> The first rule in your permissions block ({"name": "all", "role":
> "admin"}) will match all APIs and will only allow requests through if
> the requesting user has the "admin" role.  So "user" being unable to
> query an alias makes sense.  Usually "all" and other catchall
> permissions are best used at the very bottom of your permissions list.
> That way the catchall is the last rule to be checked, giving other
> rules a chance to match first.
> 
> Hope that helps.
> 
> Jason
> 
> On Wed, May 22, 2019 at 6:21 AM Sotiris Fragkiskos  wrote:
>> 
>> Hi everyone!
>> I've been trying unsuccessfully to read an alias to a collection with a
>> curl command.
>> The command only works when I put in the admin credentials, although the
>> user I want access for also has the required role for accessing.
>> Is this perhaps built-in, or should anyone be able to access an alias from
>> the API?
>> 
>> The command I'm using is:
>> curl http://
>> :@/solr//select?q=:
>> This fails for the user but succeeds for the admin
>> 
>> My minimum working example of security.json follows.
>> Many thanks!
>> 
>> {
>>  "authentication":{
>>"blockUnknown":true,
>>"class":"solr.BasicAuthPlugin",
>>"credentials":{
>>  "admin":"blahblahblah",
>>  "user":"blahblah"},
>>"":{"v":13}},
>>  "authorization":{
>>"class":"solr.RuleBasedAuthorizationPlugin",
>>"permissions":[
>>  {
>>"name":"all",
>>"role":"admin",
>>"index":1},
>>  {
>>"name":"readColl",
>>"collection":"Coll",
>>"path":"/select/*",
>>"role":"readColl",
>>"index":2},
>>  {
>>"name":"readSCollAlias",
>>"collection":"sCollAlias",
>>"path":"/select/*",
>>"role":"readSCollAlias",
>>"index":3}],
>>"user-role":{
>>  "admin":[
>>"admin",
>>"readSCollAlias"],
>>  "user":["readSCollAlias"]},
>>"":{"v":21}}}



collection exists but delete by query fails

2019-05-08 Thread Aroop Ganguly


Hi 

I am on Solr 7.5 and I am issuing a delete-by-query using CloudSolrClient.
The collection exists, but issuing a delete-by-query fails every single time.
I am wondering what is happening, and how to debug this.

org.apache.solr.client.solrj.SolrServerException: 
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:995)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:816)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at java.util.Collections$UnmodifiableList.get(Collections.java:1309)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:486)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1012)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:883)
... 6 more


Re: Solr 7.5 - Indexing Failing due to "IndexWriter is Closed"

2019-04-02 Thread Aroop Ganguly
That's an interesting scaling scheme you mention.
I have been trying to devise a good scheme for our own scale.

I will try to see how this works out for us.

> On Apr 2, 2019, at 9:15 PM, Walter Underwood  wrote:
> 
> Yeah, that would overload it. To get good indexing speed, I configure two 
> clients per CPU on the indexing machine. With one shard on a 16 processor 
> machine, that would be 32 threads. With four shards on four 16 processor 
> machines, 128 clients. Basically, one thread is waiting while the CPU 
> processes a batch and the other is sending the next batch.
> 
> That should get the cluster to about 80% CPU. If the cluster is handling 
> queries at the same time, I cut that way back, like one client thread for 
> every two CPUs.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Apr 2, 2019, at 8:13 PM, Aroop Ganguly  wrote:
>> 
>> Mutliple threads to the same index ? And how many concurrent threads?
>> 
>> Our case is not merely multiple threads but actually large scale spark 
>> indexer jobs that index 1B records at a time with a concurrency of 400.
>> In this case multiple such jobs were indexing into the same index. 
>> 
>> 
>>> On Apr 2, 2019, at 7:25 AM, Walter Underwood  wrote:
>>> 
>>> We run multiple threads indexing to Solr all the time and have been doing 
>>> so for years.
>>> 
>>> How big are your documents and how big are your batches?
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>>> On Apr 1, 2019, at 10:51 PM, Aroop Ganguly  wrote:
>>>> 
>>>> Turns out the cause was multiple indexing jobs indexing into the index 
>>>> simultaneously, which one can imagine can cause jvm loads on certain 
>>>> replicas for sure.
>>>> Once this was found and only one job ran at a time, things were back to 
>>>> normal.
>>>> 
>>>> Your comments seem right on no correlation to the stack trace! 
>>>> 
>>>>> On Apr 1, 2019, at 5:32 PM, Shawn Heisey  wrote:
>>>>> 
>>>>> 4/1/2019 5:40 PM, Aroop Ganguly wrote:
>>>>>> Thanks Shawn, for the initial response.
>>>>>> Digging into a bit, I was wondering if we’d care to read the inner most 
>>>>>> stack.
>>>>>> From the inner most stack it seems to be telling us something about what 
>>>>>> trigger it ?
>>>>>> Ofcourse, the system could have been overloaded as well, but is the 
>>>>>> exception telling us something or its of no use to consider this stack
>>>>> 
>>>>> The stacktrace on OOME is rarely useful.  The memory allocation where the 
>>>>> error is thrown probably has absolutely no connection to the part of the 
>>>>> program where major amounts of memory are being used.  It could be ANY 
>>>>> memory allocation that actually causes the error.
>>>>> 
>>>>> Thanks,
>>>>> Shawn
>>>> 
>>> 
>> 
> 
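
A rough, hedged sketch of the client-side arrangement Walter describes above
(two indexing threads per CPU, each sending batches, one commit at the end),
assuming SolrJ and a CloudSolrClient; the ZooKeeper address, collection name,
batch size and synthetic documents are all placeholders.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class TwoThreadsPerCpuIndexer {
  // Synthetic workload standing in for a real data source.
  private static final AtomicInteger remainingBatches = new AtomicInteger(100);

  private static List<SolrInputDocument> nextBatch() {
    if (remainingBatches.getAndDecrement() <= 0) return null;
    List<SolrInputDocument> batch = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {                    // batch size is a tunable; 1000 is arbitrary
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", UUID.randomUUID().toString());
      batch.add(doc);
    }
    return batch;
  }

  public static void main(String[] args) throws Exception {
    int threads = Runtime.getRuntime().availableProcessors() * 2;   // "two clients per CPU"
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("zk1.example.com:2181"), Optional.empty()).build()) {
      ExecutorService pool = Executors.newFixedThreadPool(threads);
      for (int t = 0; t < threads; t++) {
        pool.submit(() -> {
          List<SolrInputDocument> batch;
          while ((batch = nextBatch()) != null) {
            client.add("my_collection", batch);         // one batch in flight per thread
          }
          return null;
        });
      }
      pool.shutdown();
      pool.awaitTermination(2, TimeUnit.HOURS);
      client.commit("my_collection");                   // single explicit commit when the run is done
    }
  }
}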



Re: Slower indexing speed in Solr 8.0.0

2019-04-02 Thread Aroop Ganguly
Indexing speeds are a function of a lot of variables, in my experience.

What is your setup like?
What kind of cluster do you have, how many shards did you create, how many
machines, etc.?
Where is your input data coming from? What technology do you use to index
(simple Java threads or something more robust like Flink/Spark)?
How many documents do you index at a time?

How many times have you run the indexer job on the new 8.0 setup before
concluding it's slower?
Make a matrix of all these variables and test over at least 5 runs before
forming an opinion.

I'd love to hear more.

> On Apr 2, 2019, at 7:41 PM, Zheng Lin Edwin Yeo  wrote:
> 
> For additional info, I am still using the same version of the major
> components like ZooKeeper, Tika, Carrot2 and Jetty.
> 
> Regards,
> Edwin
> 
> On Wed, 3 Apr 2019 at 10:17, Zheng Lin Edwin Yeo 
> wrote:
> 
>> Hi,
>> 
>> I am setting up the latest Solr 8.0.0, and I am re-indexing the data from
>> scratch in Solr 8.0.0
>> 
>> However, I found that the indexing speed is slower in Solr 8.0.0, as
>> compared to the earlier version like Solr 7.7.1. I have not changed the
>> schema.xml and solrconfig.xml yet, just did a change of the
>> luceneMatchVersion in solrconfig.xml to 8.0.0
>> <luceneMatchVersion>8.0.0</luceneMatchVersion>
>> 
>> On average, the speed is about 40% to 50% slower. For example, the
>> indexing speed was about 17 mins in Solr 7.7.1, but now it takes about 25
>> mins to index the same set of data.
>> 
>> What could be the reason that causes the indexing to be slower in Solr
>> 8.0.0?
>> 
>> Regards,
>> Edwin
>> 



Re: Solr 7.5 - Indexing Failing due to "IndexWriter is Closed"

2019-04-02 Thread Aroop Ganguly
Multiple threads to the same index? And how many concurrent threads?

Our case is not merely multiple threads but actually large scale spark indexer 
jobs that index 1B records at a time with a concurrency of 400.
In this case multiple such jobs were indexing into the same index. 


> On Apr 2, 2019, at 7:25 AM, Walter Underwood  wrote:
> 
> We run multiple threads indexing to Solr all the time and have been doing so 
> for years.
> 
> How big are your documents and how big are your batches?
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Apr 1, 2019, at 10:51 PM, Aroop Ganguly  wrote:
>> 
>> Turns out the cause was multiple indexing jobs indexing into the index 
>> simultaneously, which one can imagine can cause jvm loads on certain 
>> replicas for sure.
>> Once this was found and only one job ran at a time, things were back to 
>> normal.
>> 
>> Your comments seem right on no correlation to the stack trace! 
>> 
>>> On Apr 1, 2019, at 5:32 PM, Shawn Heisey  wrote:
>>> 
>>> 4/1/2019 5:40 PM, Aroop Ganguly wrote:
>>>> Thanks Shawn, for the initial response.
>>>> Digging into a bit, I was wondering if we’d care to read the inner most 
>>>> stack.
>>>> From the inner most stack it seems to be telling us something about what 
>>>> trigger it ?
>>>> Ofcourse, the system could have been overloaded as well, but is the 
>>>> exception telling us something or its of no use to consider this stack
>>> 
>>> The stacktrace on OOME is rarely useful.  The memory allocation where the 
>>> error is thrown probably has absolutely no connection to the part of the 
>>> program where major amounts of memory are being used.  It could be ANY 
>>> memory allocation that actually causes the error.
>>> 
>>> Thanks,
>>> Shawn
>> 
> 



Re: IndexWriter has closed

2019-04-01 Thread Aroop Ganguly
Hi Edwin

Yes, we did not seem to have hit any filesystem upper-bounds.
I have not been able to reproduce this since this date.

> On Apr 1, 2019, at 7:28 PM, Zheng Lin Edwin Yeo  wrote:
> 
> Have you check if there are enough space to index all the documents on your
> disk?
> 
> Regards,
> Edwin
> 
> On Fri, 29 Mar 2019 at 15:16, Aroop Ganguly  wrote:
> 
>> Trying again .. Any idea why this might happen?
>> 
>> 
>>> On Mar 27, 2019, at 10:43 PM, Aroop Ganguly 
>> wrote:
>>> 
>>> Hi Everyone
>>> 
>>> My indexing jobs are failing with “this IndexWriter has closed” errors..
>>> This is a solr 7.5 setup, with an NRT index.
>>> 
>>> In deeper logs I see, some of these exceptions,
>>> Any idea what could have caused this ?
>>> 
>>> o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException:
>> java.io.IOException: Input/output error
>>>  at
>> org.apache.solr.update.TransactionLog.writeCommit(TransactionLog.java:477)
>>>  at org.apache.solr.update.UpdateLog.postCommit(UpdateLog.java:833)
>>>  at org.apache.solr.update.UpdateLog.preCommit(UpdateLog.java:817)
>>>  at
>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:669)
>>>  at
>> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:93)
>>>  at
>> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
>>>  at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1959)
>>>  at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1935)
>>>  at
>> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:160)
>>>  at
>> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
>>>  at
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:62)
>>>  at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>>>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
>>>  at
>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
>>>  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
>>>  at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
>>>  at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
>>>  at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
>>>  at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
>>>  at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>>>  at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>>>  at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>>>  at
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>>>  at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
>>>  at
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>>>  at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
>>>  at
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>>>  at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
>>>  at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
>>>  at
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>>>  at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
>>>  at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>>>  at
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
>>>  at
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
>>>  at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>>>  at
>> org.eclipse.je

Re: Solr 7.5 - Indexing Failing due to "IndexWriter is Closed"

2019-04-01 Thread Aroop Ganguly
Turns out the cause was multiple indexing jobs writing into the same index
simultaneously, which one can imagine putting heavy JVM load on certain
replicas.
Once this was found and only one job ran at a time, things were back to normal.

Your comments seem spot on: no correlation to the stack trace!

> On Apr 1, 2019, at 5:32 PM, Shawn Heisey  wrote:
> 
> 4/1/2019 5:40 PM, Aroop Ganguly wrote:
>> Thanks Shawn, for the initial response.
>> Digging into a bit, I was wondering if we’d care to read the inner most 
>> stack.
>> From the inner most stack it seems to be telling us something about what 
>> trigger it ?
>> Ofcourse, the system could have been overloaded as well, but is the 
>> exception telling us something or its of no use to consider this stack
> 
> The stacktrace on OOME is rarely useful.  The memory allocation where the 
> error is thrown probably has absolutely no connection to the part of the 
> program where major amounts of memory are being used.  It could be ANY memory 
> allocation that actually causes the error.
> 
> Thanks,
> Shawn



Re: Solr 7.5 - Indexing Failing due to "IndexWriter is Closed"

2019-04-01 Thread Aroop Ganguly
Thanks Shawn, for the initial response.
Digging into it a bit, I was wondering if we'd care to read the innermost stack.

From the innermost stack it seems to be telling us something about what
triggered it?
Of course, the system could have been overloaded as well, but is the exception
telling us something, or is it of no use to consider this stack?


Caused by: java.lang.OutOfMemoryError: Java heap space
at 
org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:413)
at 
org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650)
at 
org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
at 
org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
at 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1609)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1601)
at 
org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:964)
at 
org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:341)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:288)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:235)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:970)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1186)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:653)
at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at 
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:188)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:144)
at 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:311)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:130)
at 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:276)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:178)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:195)
at 
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:109)
at 
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)



> On Apr 1, 2019, at 4:06 PM, Shawn Heisey  wrote:
> 
> On 4/1/2019 4:44 PM, Aroop Ganguly wrote:
>> I am facing this issue again.The stack mentions Heap space issue.
>> Are the document sizes too big ?
>> Not sure what I should be doing here; As on the solr admin ui I do not see 
>> jvm being anywhere close to being full.
>> Any advise on this is greatly welcome.
> 
> 
> 
>> Caused by: java.lang.OutOfMemoryError: Java heap space
> 
> Java ran out of heap space.  This means that for what that process is being 
> asked to do, its heap is too small.  Solr needs more memory than it is 
> allowed to use.
> 
> There are exactly two things you can do.
> 
> 1) Increase the heap size.
> 2) Change something so that less heap is required.
> 
> The second option is not always possible.
> 
> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
> 
> Program operation is completely unpredictable when OOME strikes.  This is why 
> Solr is configured to self-destruct on OutOfMemoryError when it is running on 
> a non-Windows operating system.  We'd like the same thing to happen for 
> Windows, but don't have that capability yet.
> 
> Thanks,
> Shawn
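
One client-side change that sometimes lowers the heap needed per request
(without saying anything about where this particular OOME came from, since as
noted above the allocation site is not meaningful) is sending smaller update
batches, so each request carries fewer documents. A hedged sketch; the batch
size, collection name and synthetic documents are placeholders.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SmallBatchIndexer {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("zk1.example.com:2181"), Optional.empty()).build()) {
      List<SolrInputDocument> buffer = new ArrayList<>();
      for (int i = 0; i < 1_000_000; i++) {             // stands in for the real document stream
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        buffer.add(doc);
        if (buffer.size() == 500) {                     // smaller batches = smaller individual requests
          client.add("my_collection", buffer);
          buffer.clear();
        }
      }
      if (!buffer.isEmpty()) {
        client.add("my_collection", buffer);
      }
      client.commit("my_collection");
    }
  }
}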



Solr 7.5 - Indexing Failing due to "IndexWriter is Closed"

2019-04-01 Thread Aroop Ganguly



Hi Group 

I am facing this issue again. The stack mentions a heap space issue.

Are the document sizes too big?

Not sure what I should be doing here; on the Solr admin UI I do not see the JVM
heap anywhere close to being full.
Any advice on this is greatly welcome.


Full Stack trace:

2019-04-01 22:13:54.833 ERROR (qtp484199463-773) 
 o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Server error 
writing document id C9C280C4-B3B7-4BEE-9EA5-C4925F5092D9 to the index
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:970)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1186)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:653)
at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at 
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:188)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:144)
at 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:311)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:130)
at 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:276)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:178)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:195)
at 
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:109)
at 
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler

Re: IndexWriter has closed

2019-03-29 Thread Aroop Ganguly
Trying again .. Any idea why this might happen?


> On Mar 27, 2019, at 10:43 PM, Aroop Ganguly  wrote:
> 
> Hi Everyone
> 
> My indexing jobs are failing with “this IndexWriter has closed” errors..
> This is a solr 7.5 setup, with an NRT index.
> 
> In deeper logs I see, some of these exceptions,
> Any idea what could have caused this ?
> 
> o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: 
> java.io.IOException: Input/output error
>   at 
> org.apache.solr.update.TransactionLog.writeCommit(TransactionLog.java:477)
>   at org.apache.solr.update.UpdateLog.postCommit(UpdateLog.java:833)
>   at org.apache.solr.update.UpdateLog.preCommit(UpdateLog.java:817)
>   at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:669)
>   at 
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:93)
>   at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
>   at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1959)
>   at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1935)
>   at 
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:160)
>   at 
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
>   at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:62)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at org.eclipse.jetty.server.Server.handle(Server.java:531)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
>   at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>   at 
> org.eclipse.jetty.uti

IndexWriter has closed

2019-03-27 Thread Aroop Ganguly
Hi Everyone

My indexing jobs are failing with “this IndexWriter has closed” errors..
This is a solr 7.5 setup, with an NRT index.

In deeper logs I see, some of these exceptions,
Any idea what could have caused this ?

o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: 
java.io.IOException: Input/output error
at 
org.apache.solr.update.TransactionLog.writeCommit(TransactionLog.java:477)
at org.apache.solr.update.UpdateLog.postCommit(UpdateLog.java:833)
at org.apache.solr.update.UpdateLog.preCommit(UpdateLog.java:817)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:669)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:93)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1959)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1935)
at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:160)
at 
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:62)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:531)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
at 
org.eclipse.jetty.util.thr

Re: Solr 7.5 DeleteShard not working when all cores are down

2019-03-14 Thread Aroop Ganguly
Thanks Shalin, Shawn.

I ended up getting guidance from Anshum on this, and we did indeed use the
delete-replica API to delete all but one of the replicas, and then bounced the
last replica to let it lead.

I will let Anshum share a post on the details of how to recover leader shards.

> On Mar 14, 2019, at 7:09 PM, Shalin Shekhar Mangar  
> wrote:
> 
> What Shawn said.
> 
> DeleteShard API is supposed to be used either when using implicit routing
> or when you have compositeId router but the shard has already been split
> and therefore in an inactive state.
> 
> Delete Replica API is what you need if you want to delete an individual
> replica.
> 
> On Fri, Mar 15, 2019 at 12:39 AM Shawn Heisey  wrote:
> 
>> On 3/14/2019 12:47 PM, Aroop Ganguly wrote:
>>> I am trying to delete a shard from a collection using the collections
>>> api for the same.
>>> On the solr ui,  all the replicas are in “downed” state.
>>> 
>>> However, when I run the delete shard
>>> 
>> command: 
>> /solr/admin/collections?action=DELETESHARD&collection=x&shard=shard84
>>> I get this exception:
>>> {
>>>   "responseHeader":{
>>> "status":400,
>>> "QTime":14},
>>>   "Operation deleteshard caused
>>> 
>> exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>> 
>>> The slice: shard35 is currently active. Only non-active
>>> (or custom-hashed) slices can be deleted.",
>> 
>> 
>> 
>>> Why is this api thinking this slice is active ? When the Solr UI shows
>>> all replicas down ?
>> 
>> Active means the shard is considered part of the whole collection --
>> included when you run a query, etc.
>> 
>> Even though all replicas are down, the shard is still an active part of
>> the index.  So you can't delete it.
>> 
>> If your collection is typical and has compositeId routing, deleting a
>> shard is really only possible after you have run SPLITSHARD and then you
>> will only be able to delete the original shard that gets split.
>> 
>> Aside from SPLITSHARD, I really have no idea how to mark a shard as
>> inactive, but that will be required before you can delete it.
>> 
>> Thanks,
>> Shawn
>> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
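
For anyone who needs to script the workaround described at the top of this
message (drop every replica of the stuck shard except one, then bounce the
survivor so it can take leadership), the DELETEREPLICA call is also available
through SolrJ. A hedged sketch; the ZooKeeper address, collection, shard and
replica names are placeholders and would normally be read from CLUSTERSTATUS.

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class DropExtraReplicas {
  public static void main(String[] args) throws Exception {
    List<String> replicasToDrop = Arrays.asList("core_node3", "core_node5");  // keep exactly one
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("zk1.example.com:2181"), Optional.empty()).build()) {
      for (String replica : replicasToDrop) {
        // Same as /admin/collections?action=DELETEREPLICA&collection=...&shard=...&replica=...
        CollectionAdminRequest.deleteReplica("my_collection", "shard84", replica)
            .process(client);
      }
      // The one remaining replica is then bounced (restart its node) so it can become leader.
    }
  }
}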



Re: Solr 7.5 DeleteShard not working when all cores are down

2019-03-14 Thread Aroop Ganguly
correction:

Thanks Shalin, Shawn.

I ended up getting guidance from Anshum on this, and we did indeed use the
delete-replica API to delete all but one of the replicas, and then bounced the
last replica to let it lead.

I will let Anshum share a post on the details of how to recover leaderless
shards with all replicas in an inactive state.

> On Mar 14, 2019, at 8:01 PM, Aroop Ganguly  wrote:
> 
> Thanks Shalin, Shawn.
> 
> I ended up getting guidance from Anshum on this and we did indeed use the 
> delete-replica api to delete all but one of the replicas, and bouncing the 
> last replica  to let it lead.
> 
> I will let anshum share a post on the details of how to recover leader shards.



Solr 7.5 DeleteShard not working when all cores are down

2019-03-14 Thread Aroop Ganguly
Hi All

I am trying to delete a shard from a collection using the collections api for 
the same.
On the solr ui,  all the replicas are in “downed” state. 

However, when I run the delete shard command: 
/solr/admin/collections?action=DELETESHARD&collection=x&shard=shard84
I get this exception:
{
  "responseHeader":{
"status":400,
"QTime":14},
  "Operation deleteshard caused 
exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
 The slice: shard35 is currently active. Only non-active (or custom-hashed) 
slices can be deleted.",
  "exception":{
"msg":"The slice: shard35 is currently active. Only non-active (or 
custom-hashed) slices can be deleted.",
"rspCode":400},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"The slice: shard84 is currently active. Only non-active (or 
custom-hashed) slices can be deleted.",
"code":400}}


Why is this api thinking this slice is active ? When the Solr UI shows all 
replicas down ?



Thanks
Aroop



Re: solr 7 optimize with Tlog/Pull replicas

2019-03-13 Thread Aroop Ganguly
Thanks Erick ! Great details as always :)

> On Mar 13, 2019, at 8:48 AM, Erick Erickson  wrote:
> 
> Wei:
> 
> Right. You should count on the _entire_ index being replicated from the 
> leader, but only after the optimize is done. Pre 7.5, this would be a single 
> segment, 7.5+ it would be a bunch of 5G flies unless you specified that the 
> optimize create some number of segments.
> 
> But unless you
> 1> have an unreasonable number of deleted docs in your index
> or
> 2> can demonstrate improved speed after optimize (and are willing to do it 
> regularly)
> 
> I wouldn’t bother.
> 
> Aroop:
> 
> Well, optimizing is really never recommended if you can help it ;). By “help 
> it” here I mean the number of deleted documents is a “reasonable” percentage 
> of your index, where _you_ define what “reasonable” means. Another bit that 
> came along with Solr 7.5 is that the percentage of deleted documents should 
> be smaller than pre 7.5 in some cases.
> 
> It was relatively easy, for instance, to have indexes approaching 50% deleted 
> documents pre 7.5. Things had to happen “just right” for that case, but it 
> was possible.
> 
> When bulk indexing for instance, if what you’re doing is replacing all the 
> docs you should have a minuscule number of deleted docs and I wouldn’t bother.
> 
> As always, if you can demonstrate that an optimized index returns searches 
> enough faster to matter in your particular situation, then the cost may be 
> worth it. And the situation where it makes the most sense is situations where 
> you can optimize regularly.
> 
> Best,
> Erick
> 
>> On Mar 12, 2019, at 10:51 PM, Aroop Ganguly 
>>  wrote:
>> 
>> Hi Erick
>> 
>> A related question: 
>> 
>> Is optimize then ill advised for bulk indexer post solr 7.5 ? 
>>>> Especially in a situation where an index is being modified over many days ?
>> 
>> Thanks
>> Aroop
>> 
>>> On Mar 12, 2019, at 9:30 PM, Wei  wrote:
>>> 
>>> Thanks Erick, it's very helpful.  So for bulking indexing in a Tlog or
>>> Tlog/Pull cloud,  when we optimize at the end of updates, segments on the
>>> leader replica will change rapidly and the follower replicas will be
>>> continuously pulling from the leader, effectively downloading the whole
>>> index.  Is there a more efficient way?
>>> 
>>> On Mon, Mar 11, 2019 at 9:59 AM Erick Erickson 
>>> wrote:
>>> 
>>>> do _not_ turn off hard commits, even when bulk indexing. Set the
>>>> openSearcher option to false in your config. This is for two reasons:
>>>> 1> the only time the transaction log is rolled over is when a hard commit
>>>> happens. If you turn off commits it’ll grow to a very large size.
>>>> 2> If, for any reason, the node restarts, it’ll replay the transaction log
>>>> from the last hard commit point, potentially taking hours if you haven’t
>>>> committed.
>>>> 
>>>> And you should probably open  a new searcher occasionally, even while bulk
>>>> indexing. For Real Time Get there are some internal structures that grow in
>>>> proportion to the docs indexed since the last searcher was opened.
>>>> 
>>>> And for your other questions:
>>>> <1> I believe so, try it and look at your solr log.
>>>> 
>>>> <2> Yes. Have you looked at Mike’s video (the third one down) here:
>>>> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html?
>>>> TieredMergePolicy is the third video. The merge policy combines like-sized
>>>> segments. It’s wasteful to rewrite, say, a 19G segment just to add a 1G so
>>>> having multiple segments < 20G is perfectly normal.
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>>> On Mar 10, 2019, at 10:36 PM, Wei  wrote:
>>>>> 
>>>>> A side question, for heavy bulk indexing, what's the recommended setting
>>>>> for auto commit? As there is no query needed during the bulking indexing
>>>>> process, I have auto soft commit disabled. Is there any side effect if I
>>>>> also disable auto commit?
>>>>> 
>>>>> On Sun, Mar 10, 2019 at 10:22 PM Wei  wrote:
>>>>> 
>>>>>> Thanks Erick.
>>>>>> 
>>>>>> 1> TLOG replicas shouldn’t optimize on the follower. They should
>>>> optimize
>>>>>> on the leader then replicate the entire index to the follower.
>>

Re: solr 7 optimize with Tlog/Pull replicas

2019-03-13 Thread Aroop Ganguly
Hi Erick

A related question: 

Is optimize then ill-advised for bulk indexing post Solr 7.5?
>> Especially in a situation where an index is being modified over many days ?

Thanks
Aroop
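
For reference, the explicit-segment-count form discussed further down this
thread (/update?optimize=true&waitSearcher=false&maxSegments=1) is also exposed
through SolrJ. A hedged sketch; the ZooKeeper address and collection name are
placeholders, and as of 7.5 an optimize without maxSegments may legitimately
leave several segments up to the configured maximum segment size.

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class OptimizeSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("zk1.example.com:2181"), Optional.empty()).build()) {
      // waitFlush=true, waitSearcher=false, maxSegments=1,
      // i.e. the same thing as /update?optimize=true&waitSearcher=false&maxSegments=1
      client.optimize("my_collection", true, false, 1);
    }
  }
}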

> On Mar 12, 2019, at 9:30 PM, Wei  wrote:
> 
> Thanks Erick, it's very helpful.  So for bulking indexing in a Tlog or
> Tlog/Pull cloud,  when we optimize at the end of updates, segments on the
> leader replica will change rapidly and the follower replicas will be
> continuously pulling from the leader, effectively downloading the whole
> index.  Is there a more efficient way?
> 
> On Mon, Mar 11, 2019 at 9:59 AM Erick Erickson 
> wrote:
> 
>> do _not_ turn off hard commits, even when bulk indexing. Set the
>> openSearcher option to false in your config. This is for two reasons:
>> 1> the only time the transaction log is rolled over is when a hard commit
>> happens. If you turn off commits it’ll grow to a very large size.
>> 2> If, for any reason, the node restarts, it’ll replay the transaction log
>> from the last hard commit point, potentially taking hours if you haven’t
>> committed.
>> 
>> And you should probably open  a new searcher occasionally, even while bulk
>> indexing. For Real Time Get there are some internal structures that grow in
>> proportion to the docs indexed since the last searcher was opened.
>> 
>> And for your other questions:
>> <1> I believe so, try it and look at your solr log.
>> 
>> <2> Yes. Have you looked at Mike’s video (the third one down) here:
>> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html?
>> TieredMergePolicy is the third video. The merge policy combines like-sized
>> segments. It’s wasteful to rewrite, say, a 19G segment just to add a 1G so
>> having multiple segments < 20G is perfectly normal.
>> 
>> Best,
>> Erick
>> 
>>> On Mar 10, 2019, at 10:36 PM, Wei  wrote:
>>> 
>>> A side question, for heavy bulk indexing, what's the recommended setting
>>> for auto commit? As there is no query needed during the bulking indexing
>>> process, I have auto soft commit disabled. Is there any side effect if I
>>> also disable auto commit?
>>> 
>>> On Sun, Mar 10, 2019 at 10:22 PM Wei  wrote:
>>> 
 Thanks Erick.
 
 1> TLOG replicas shouldn’t optimize on the follower. They should
>> optimize
 on the leader then replicate the entire index to the follower.
 
 Does that mean the follower will ignore the optimize request? Or shall I
 send the optimize request only to one of the leaders?
 
 2> As of Solr 7.5, optimize should not optimize to a single segment
 _unless_ that segment is < 5G. See LUCENE-7976. Or you explicitly set
 numSegments on the optimize command.
 
 -- Is the 5G limit controlled by maxMegedSegmentMB setting? In
 solrconfig.xml I used these settings:
 
 > class="org.apache.solr.index.TieredMergePolicyFactory">
  100
  10
  10
  20480
 
 
 But in the end I see multiple segments much smaller than the 20GB limit.
 In 7.6 is it required to explicitly set the number of segments to 1? e.g
 shall I use
 
 /update?optimize=true&waitSearcher=false&maxSegments=1
 
 Best,
 Wei
 
 
 On Fri, Mar 8, 2019 at 12:29 PM Erick Erickson >> 
 wrote:
 
> This is very odd for at least two reasons:
> 
> 1> TLOG replicas shouldn’t optimize on the follower. They should
>> optimize
> on the leader then replicate the entire index to the follower.
> 
> 2> As of Solr 7.5, optimize should not optimize to a single segment
> _unless_ that segment is < 5G. See LUCENE-7976. Or you explicitly set
> numSegments on the optimize command.
> 
> So if you can reliably reproduce this, it’s probably worth a JIRA…...
> 
>> On Mar 8, 2019, at 11:21 AM, Wei  wrote:
>> 
>> Hi,
>> 
>> RecentIy I encountered a strange issue with optimize in Solr 7.6. The
> cloud
>> is created with 4 shards with 2 Tlog replicas per shard. After batch
> index
>> update I issue an optimize command to a randomly picked replica in the
>> cloud.  After a while when I check,  all the non-leader Tlog replicas
>> finished optimization to a single segment, however all the leader
> replicas
>> still have multiple segments.  Previously inn the all NRT replica
> cloud, I
>> see optimization is triggered on all nodes.  Is the optimization
>> process
>> different with Tlog/Pull replicas?
>> 
>> Best,
>> Wei
> 
> 
>> 
>> 



Re: Hide BasicAuth JVM param on SOLR admin UI

2019-03-06 Thread Aroop Ganguly
Try changing the password using the auth API:
https://lucene.apache.org/solr/guide/6_6/basic-authentication-plugin.html#BasicAuthenticationPlugin-AddaUserorEditaPassword

From that point onwards the credential Solr stores is a salted, base64-encoded
hash rather than the clear-text password.
I do not think the -Dbasicauth value shown in the JVM args will change by
itself, but the user's actual password would then be different from what is
displayed there.
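
A hedged sketch of driving that set-user call from plain Java, following the
Basic Authentication plugin documentation linked above; the host, the current
credentials and the new password are placeholders.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SetSolrPassword {
  public static void main(String[] args) throws Exception {
    String body = "{\"set-user\": {\"solr\": \"aNewStrongPassword\"}}";   // placeholder new password
    URL url = new URL("http://localhost:8983/solr/admin/authentication");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/json");
    String basic = Base64.getEncoder()
        .encodeToString("solr:SolrRocks".getBytes(StandardCharsets.UTF_8));  // current credentials
    conn.setRequestProperty("Authorization", "Basic " + basic);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}

If the -Dbasicauth system property is being used for client or internal
requests, it would need to be updated to the new password as well.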


> On Mar 6, 2019, at 12:22 AM, el mas capo  wrote:
> 
> Hi everyone,
> I am trying to configure Cloud Solr(7.7.0) with basic Authentification. All  
> seems to work nicely, but when I enter on the Web UI I can see the basic Auth 
> Password configured in solr.in.sh in clear format:
> -Dbasicauth=solr:SolrRocks
> Can this behaviour be avoided?
> Thank you by your attention.
> 



Re: Rule Based Auth - Permission to run select queries

2019-03-05 Thread Aroop Ganguly
I guess just mentioning path=“/select” is sufficient…

The documentation does not explain this .. 

> On Mar 5, 2019, at 7:36 PM, Aroop Ganguly  wrote:
> 
> Turns out I had to specify the path param to /select, while setting the 
> permission.
> 
> But this is random..I created a new permission and assigned it to the same 
> user, and now the user with this role is able to get data.
> 
> "set-permission": {"name": "read-collections", 
> "role":"readonly","path":"/select"}
> 
> How does this work ? Is there actually a permission called read-collections? 
> 
> 
>> On Mar 5, 2019, at 7:08 PM, Aroop Ganguly > <mailto:aroopgang...@icloud.com>> wrote:
>> 
>> Hi Team
>> 
>> I am playing around with rule based auth and I wanted to create a role which 
>> is readonly.
>> I gave the “read” permission to the role, but I am not able to get data from 
>> the /select handler.
>> A simple select request results in this response:
>> 
>> 
>> 
>> 
>> Error 403 Unauthorized request, Response code: 403
>> 
>> HTTP ERROR 403
>> Problem accessing /solr/my_collection/select. Reason:
>> Unauthorized request, Response code: 403
>> 
>> 
>> 
>> What permission must I give to this role to let it have non-update read 
>> access on solr endpoints?
>> 
>> I went through the list of permissions listed here: 
>> https://lucene.apache.org/solr/guide/6_6/rule-based-authorization-plugin.html#Rule-BasedAuthorizationPlugin-PredefinedPermissions
>>  
>> <https://lucene.apache.org/solr/guide/6_6/rule-based-authorization-plugin.html#Rule-BasedAuthorizationPlugin-PredefinedPermissions>
>>  
>> <https://lucene.apache.org/solr/guide/6_6/rule-based-authorization-plugin.html#Rule-BasedAuthorizationPlugin-PredefinedPermissions
>>  
>> <https://lucene.apache.org/solr/guide/6_6/rule-based-authorization-plugin.html#Rule-BasedAuthorizationPlugin-PredefinedPermissions>>,
>>  
>> but I cannot imagine this being an exhaustive list; be that as it may I 
>> thought “read” seemed to be the right permission.
>> 
>> Please advise.
>> 
>> Thanks
>> Aroop
> 



Re: Rule Based Auth - Permission to run select queries

2019-03-05 Thread Aroop Ganguly
Turns out I had to specify the path param to /select, while setting the 
permission.

But this seems arbitrary: I created a new permission and assigned it to the
same user, and now the user with this role is able to get data.

"set-permission": {"name": "read-collections", 
"role":"readonly","path":"/select"}

How does this work ? Is there actually a permission called read-collections? 


> On Mar 5, 2019, at 7:08 PM, Aroop Ganguly  wrote:
> 
> Hi Team
> 
> I am playing around with rule based auth and I wanted to create a role which 
> is readonly.
> I gave the “read” permission to the role, but I am not able to get data from 
> the /select handler.
> A simple select request results in this response:
> 
> 
> 
> 
> Error 403 Unauthorized request, Response code: 403
> 
> HTTP ERROR 403
> Problem accessing /solr/my_collection/select. Reason:
> Unauthorized request, Response code: 403
> 
> 
> 
> What permission must I give to this role to let it have non-update read 
> access on solr endpoints?
> 
> I went through the list of permissions listed here: 
> https://lucene.apache.org/solr/guide/6_6/rule-based-authorization-plugin.html#Rule-BasedAuthorizationPlugin-PredefinedPermissions
>  
> <https://lucene.apache.org/solr/guide/6_6/rule-based-authorization-plugin.html#Rule-BasedAuthorizationPlugin-PredefinedPermissions>,
>  
> but I cannot imagine this being an exhaustive list; be that as it may I 
> thought “read” seemed to be the right permission.
> 
> Please advise.
> 
> Thanks
> Aroop
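
For reference, a hedged sketch of verifying a read-only role end to end once
the permission is in place: run a /select query as that user via SolrJ with
per-request basic-auth credentials. The ZooKeeper address, collection, user and
password are placeholders.

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ReadOnlyUserCheck {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("zk1.example.com:2181"), Optional.empty()).build()) {
      QueryRequest req = new QueryRequest(new SolrQuery("*:*"));
      req.setBasicAuthCredentials("readonlyUser", "readonlyPassword");  // placeholder credentials
      QueryResponse rsp = req.process(client, "my_collection");         // a 403 here means the role lacks read access
      System.out.println("numFound: " + rsp.getResults().getNumFound());
    }
  }
}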



Re: RuleBasedAuthorizationPlugin configuration

2019-03-05 Thread Aroop Ganguly
Hi Dominique

Were you able to resolve this ?
I am also stuck with understanding a minimal permission-set to give to a 
readonly user to read from the /select endpoint.

Regards
Aroop


> On Jan 1, 2019, at 11:23 PM, Dominique Bejean  
> wrote:
> 
> Hi,
> 
> I created a Jira issue
> https://issues.apache.org/jira/browse/SOLR-13097
> 
> Regards.
> 
> Dominique
> 
> 
> Le lun. 31 déc. 2018 à 11:26, Dominique Bejean 
> a écrit :
> 
>> Hi,
>> 
>> In debugging mode, I discovered that only in SolrCloud mode the collection
>> name is extract from the request path in the init() method of
>> HttpSolrCall.java
>> 
>>   if (cores.isZooKeeperAware()) {
>>  // init collectionList (usually one name but not when there are
>> aliases)
>>  ...
>>}
>> 
>> So in Solr standalone mode, only authentication is fully functional, not
>> authorization!
>> 
>> Regards.
>> 
>> Dominique
>> 
>> 
>> 
>> 
>> 
>> Le dim. 30 déc. 2018 à 13:40, Dominique Bejean 
>> a écrit :
>> 
>>> Hi,
>>> 
>>> After reading more carefully the log file, here is my understanding.
>>> 
>>> The request
>>> 
>>> http://2:xx@localhost:8983/solr/biblio/select?indent=on&q=*:*&wt=json
>>> 
>>> 
>>> report this in log
>>> 
>>> 2018-12-30 12:24:52.102 INFO  (qtp1731656333-20) [   x:biblio]
>>> o.a.s.s.HttpSolrCall USER_REQUIRED auth header Basic Mjox context :
>>> userPrincipal: [[principal: 2]] type: [READ], collections: [], Path:
>>> [/select] path : /select params :q=*:*&indent=on&wt=json
>>> 
>>> collections is empty, so it looks like "/select" is not collection
>>> specific and so it is not possible to define read access by collection.
>>> 
>>> Can someone confirm ?
>>> 
>>> Regards
>>> 
>>> Dominique
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Le ven. 21 déc. 2018 à 10:46, Dominique Bejean 
>>> a écrit :
>>> 
 Hi,
 
 I am trying to configure security.json file, in order to define the
 following users and permissions :
 
   - user "admin" with all permissions on all collections
   - user "read" with read  permissions  on all collections
   - user "1" with only read  permissions  on biblio collection
   - user "2" with only read  permissions  on personnes collection
 
 Here is my security.json file
 
 {
  "authentication":{
"blockUnknown":true,
"class":"solr.BasicAuthPlugin",
"credentials":{
  "admin":"4uwfcjV7bCqOdLF/Qn2wiTyC7zIWN6lyA1Bgp1yqZj0=
 7PCh68vhIlZXg1l45kSlvGKowMg1bm/L3eSfgT5dzjs=",
  "read":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
 gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
  "1":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
 gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
  "2":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
 gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo="},
"":{"v":0}},
  "authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[
  {
"name":"all",
"role":"admin",
"index":1},
  {
"name":"read-biblio",
"path":"/select",
"role":["admin","read","r1"],
"collection":"biblio",
"index":2},
  {
"name":"read-personnes",
"path":"/select",
"role":["admin","read","r2"],
"collection":"personnes",
"index":3},
 {
"name":"read",
"collection":"*",
"role":["admin","read"],
"index":4}],
"user-role":{
  "admin":"admin",
  "read":"read",
  "1":"r1",
  "2":"r2"}
  }
 }
 
 
 I have 403 errors for user 1 on biblio and user 2 on personnes while
 using the "/select" requestHandler. However, according to the r1 and r2 roles
 and permissions order, the access should be allowed.
 
 I have duplicated the TestRuleBasedAuthorizationPlugin.java class in
 order to test these exact same permissions and roles. checkRules reports
 access is allowed !!!
 
 I don't understand where is the problem. Any ideas ?
 
 Regards
 
 Dominique
 
 
 
 
 
 
 
 



Rule Based Auth - Permission to run select queries

2019-03-05 Thread Aroop Ganguly
Hi Team

I am playing around with rule based auth and I wanted to create a role which is 
readonly.
I gave the “read” permission to the role, but I am not able to get data from 
the /select handler.
A simple select request results in this response:




Error 403 Unauthorized request, Response code: 403

HTTP ERROR 403
Problem accessing /solr/my_collection/select. Reason:
Unauthorized request, Response code: 403



What permission must I give to this role to let it have non-update read access 
on solr endpoints?

I went through the list of permissions listed here: 
https://lucene.apache.org/solr/guide/6_6/rule-based-authorization-plugin.html#Rule-BasedAuthorizationPlugin-PredefinedPermissions
 
,
 
but I cannot imagine this being an exhaustive list; be that as it may I thought 
“read” seemed to be the right permission.

Please advise.

Thanks
Aroop 

Re: Unable to create collection with custom queryParser Plugin

2019-02-11 Thread Aroop Ganguly
Thanks Erick, Jörn danke sehr.

tldr;
gradle trickery and thriftiness helped here.

detail:
To make things easier for our deployment systems, I created a plugin Gradle
task which is economical and yet brings in the right number of jars.
In my case the scala-lang jars were required; everything else was compile-only.
I used the compileOnly Gradle dependency directive for all dependencies except
scala-lang.
I also had to remove shading.
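
For anyone landing on this thread later, a hedged sketch of the kind of minimal
pass-through QParserPlugin being described below (class, package and parser
name are illustrative, not the actual plugin from this thread), with the
solrconfig.xml wiring noted in comments.

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.LuceneQParser;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Registered in solrconfig.xml roughly as:
//   <lib dir="/path/to/plugin/jars" regex=".*\.jar"/>
//   <queryParser name="sample" class="com.example.plugins.SampleQParserPlugin"/>
// and selected per request with defType=sample (or the {!sample} local-params syntax).
public class SampleQParserPlugin extends QParserPlugin {

  private static final Logger log = LoggerFactory.getLogger(SampleQParserPlugin.class);

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    log.info("SampleQParserPlugin invoked for q={}", qstr);
    // Pass-through: delegate to the standard Lucene parser unchanged.
    return new LuceneQParser(qstr, localParams, params, req);
  }
}

The jar containing this class should hold only the compiled plugin code, with
solr-core and solrj treated as compile-only/provided, which matches the Gradle
arrangement described above.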



> On Feb 10, 2019, at 11:37 PM, Erick Erickson  wrote:
> 
> What Jörn said.
> 
> Your jar should be nothing but your custom code. Usually I cheat in
> IntelliJ and check the box for artifacts that says something like
> "only compiled output"
> 
> On Sun, Feb 10, 2019 at 10:37 PM Jörn Franke  wrote:
>> 
>> You can put all solr dependencies as provided. They are already on the class 
>> path - no need to put them in the fat jar.
>> 
>>> On Feb 11, 2019, at 05:59, Aroop Ganguly  wrote:
>>> 
>>> Thanks Erick!
>>> 
>>> I see. Yes it is a fat jar post shadowJar process (in the order of MBs).
>>> It contains solrj and solr-core dependencies plus a few more scala related 
>>> ones.
>>> I guess the solr-core dependencies are unavoidable (right ?), let me try to 
>>> trim the others.
>>> 
>>> Regards
>>> Aroop
>>> 
>>>> On Feb 10, 2019, at 8:44 PM, Erick Erickson  
>>>> wrote:
>>>> 
>>>> Aroop:
>>>> 
>>>> How big is your custom jar file? The name "test-plugins-aroop-all.jar"
>>>> makes me suspicious. It should be very small and should _not_ contain
>>>> any of the Solr distribution jar files, just your compiled custom
>>>> code. I'm grasping at straws a bit, but it may be that you have the
>>>> same jar files from the Solr distro and also included in your custom
>>>> jar and it's confusing the classloader. "Very small" here is on the
>>>> order of 10K given it does very little. If it's much bigger than, say,
>>>> 15K it's a red flag. If you do a "jar -dvf your_custom_jar" there
>>>> should be _very_ few classes in it.
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>> On Sun, Feb 10, 2019 at 8:33 PM Aroop Ganguly
>>>>  wrote:
>>>>> 
>>>>> [resending due to bounce warning from the other email]
>>>>> 
>>>>> 
>>>>> Hi Team
>>>>> 
>>>>> I thought this was simple, but I am just missing something here. Any 
>>>>> guidance would be very appreciated.
>>>>> 
>>>>> What have I done so far:
>>>>>  1. I have created a custom querParser (class SamplePluggin extends 
>>>>> QParserPlugin { ), which right now does nothing but logs an info message, 
>>>>> and returns a new LuceneQParser() instance with the same parameters.
>>>>>  2. I am on solr 7.5 and I have added the path to the jar and 
>>>>> referenced the plugin in the following ways in my solrconfig.xml:
>>>>> 
>>>>>  <lib ... />
>>>>>  <queryParser name="..." class="com.aroop.plugins.SamplePluggin"/>
>>>>> 
>>>>> Now when I create a collection with this solrconfig, I keep getting this 
>>>>> exception stack:
>>>>> I have tried debugging the live solr instance and for the life of me, I 
>>>>> cannot understand why am I getting this cast exception
>>>>> 2019-02-11 03:57:10.410 ERROR (qtp1594873248-62) [c:cvp2 s:shard1 
>>>>> r:core_node2 x:testCollection_shard1_replica_n1] 
>>>>> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error 
>>>>> CREATEing SolrCore 'testCollection_shard1_replica_n1': Unable to create 
>>>>> core [testCollection_shard1_replica_n1] Caused by: class 
>>>>> com.aroop.plugins.SamplePluggin
>>>>>  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1087)
>>>>>  at 
>>>>> org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$247(CoreAdminOperation.java:92)
>>>>>  at 
>>>>> org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:360)
>>>>>  at 
>>>>> org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:395)
>>>>>  at 
>>>>> org.apache.solr.

Re: Unable to create collection with custom queryParser Plugin

2019-02-10 Thread Aroop Ganguly
Thanks Erick!

I see. Yes, it is a fat jar produced by the shadowJar process (on the order of MBs).
It contains solrj and solr-core dependencies plus a few more scala-related ones.
I guess the solr-core dependencies are unavoidable (right?); let me try to 
trim the others.

Regards
Aroop

> On Feb 10, 2019, at 8:44 PM, Erick Erickson  wrote:
> 
> Aroop:
> 
> How big is your custom jar file? The name "test-plugins-aroop-all.jar"
> makes me suspicious. It should be very small and should _not_ contain
> any of the Solr distribution jar files, just your compiled custom
> code. I'm grasping at straws a bit, but it may be that you have the
> same jar files from the Solr distro and also included in your custom
> jar and it's confusing the classloader. "Very small" here is on the
> order of 10K given it does very little. If it's much bigger than, say,
> 15K it's a red flag. If you do a "jar -dvf your_custom_jar" there
> should be _very_ few classes in it.
> 
> Best,
> Erick
> 
> On Sun, Feb 10, 2019 at 8:33 PM Aroop Ganguly
>  wrote:
>> 
>> [resending due to bounce warning from the other email]
>> 
>> 
>> Hi Team
>> 
>> I thought this was simple, but I am just missing something here. Any 
>> guidance would be very appreciated.
>> 
>> What have I done so far:
>>1. I have created a custom querParser (class SamplePluggin extends 
>> QParserPlugin { ), which right now does nothing but logs an info message, 
>> and returns a new LuceneQParser() instance with the same parameters.
>>2. I am on solr 7.5 and I have added the path to the jar and 
>> referenced the plugin in the following ways in my solrconfig.xml:
>> 
>> <lib ... />
>> <queryParser name="..." class="com.aroop.plugins.SamplePluggin"/>
>> 
>> Now when I create a collection with this solrconfig, I keep getting this 
>> exception stack:
>> I have tried debugging the live solr instance and for the life of me, I 
>> cannot understand why am I getting this cast exception
>> 2019-02-11 03:57:10.410 ERROR (qtp1594873248-62) [c:cvp2 s:shard1 
>> r:core_node2 x:testCollection_shard1_replica_n1] o.a.s.h.RequestHandlerBase 
>> org.apache.solr.common.SolrException: Error CREATEing SolrCore 
>> 'testCollection_shard1_replica_n1': Unable to create core 
>> [testCollection_shard1_replica_n1] Caused by: class 
>> com.aroop.plugins.SamplePluggin
>>at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1087)
>>at 
>> org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$247(CoreAdminOperation.java:92)
>>at 
>> org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:360)
>>at 
>> org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:395)
>>at 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:180)
>>at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>>at 
>> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)
>>at 
>> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)
>>at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
>>at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
>>at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
>>at 
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
>>at 
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
>>at 
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>>at 
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>>at 
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>>at 
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>>at 
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
>>at 
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>>at 
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
>>at 
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>>at 
>> org.eclipse.jetty.servlet.ServletHa

Unable to create collection with custom queryParser Plugin

2019-02-10 Thread Aroop Ganguly
adExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Unable to create core 
[testCollection_shard1_replica_n1]
at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1159)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1048)
... 44 more
Caused by: org.apache.solr.common.SolrException: Error Instantiating 
queryParser, com.aroop.plugins.SamplePluggin failed to instantiate 
org.apache.solr.search.QParserPlugin
at org.apache.solr.core.SolrCore.(SolrCore.java:1014)
at org.apache.solr.core.SolrCore.(SolrCore.java:869)
at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1138)
... 45 more
Caused by: org.apache.solr.common.SolrException: Error Instantiating 
queryParser, com.aroop.plugins.SamplePluggin failed to instantiate 
org.apache.solr.search.QParserPlugin
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:813)
at org.apache.solr.core.PluginBag.createPlugin(PluginBag.java:141)
at org.apache.solr.core.PluginBag.init(PluginBag.java:277)
at org.apache.solr.core.PluginBag.init(PluginBag.java:266)
at org.apache.solr.core.SolrCore.(SolrCore.java:963)
... 47 more
Caused by: java.lang.ClassCastException: class com.aroop.plugins.SamplePluggin
at java.lang.Class.asSubclass(Class.java:3404)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:541)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:488)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:792)
... 51 more

 Aroop Ganguly
Siri Data | Metrics Platform



Unable to create collection with custom queryParser Plugin

2019-02-10 Thread Aroop Ganguly
Hi Team

I thought this was simple, but I am just missing something here. Any guidance 
would be much appreciated.

What have I done so far:
1. I have created a custom query parser (class SamplePluggin extends 
QParserPlugin), which right now does nothing but log an info message and 
return a new LuceneQParser() instance with the same parameters (a minimal 
sketch of the class is shown below the config snippet).
2. I am on Solr 7.5 and I have added the path to the jar and referenced 
the plugin in the following ways in my solrconfig.xml:

    <lib ... />
    <queryParser name="..." class="com.aroop.plugins.SamplePluggin"/>
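
For completeness, the plugin class itself is essentially the following. This is a
minimal sketch against the Solr 7.5 API; the slf4j logger wiring is assumed, and
the parser just delegates to the stock LuceneQParser as described above:

package com.aroop.plugins;

import java.lang.invoke.MethodHandles;

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.LuceneQParser;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SamplePluggin extends QParserPlugin {
  private static final Logger log =
      LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    // Nothing clever yet: log the query and fall through to the default Lucene parser
    log.info("SamplePluggin invoked for query: {}", qstr);
    return new LuceneQParser(qstr, localParams, params, req);
  }
}
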
Now when I create a collection with this solrconfig, I keep getting the 
exception stack below.
I have tried debugging the live Solr instance and, for the life of me, I cannot 
understand why I am getting this cast exception:
2019-02-11 03:57:10.410 ERROR (qtp1594873248-62) [c:cvp2 s:shard1 r:core_node2 
x:testCollection_shard1_replica_n1] o.a.s.h.RequestHandlerBase 
org.apache.solr.common.SolrException: Error CREATEing SolrCore 
'testCollection_shard1_replica_n1': Unable to create core 
[testCollection_shard1_replica_n1] Caused by: class 
com.aroop.plugins.SamplePluggin
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1087)
at 
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$247(CoreAdminOperation.java:92)
at 
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:360)
at 
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:395)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:180)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:531)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at

Re: streaming expressions substring-evaluator

2018-10-31 Thread Aroop Ganguly
Thanks for the note Joel.


> On Oct 31, 2018, at 5:55 AM, Joel Bernstein  wrote:
> 
> The replace operator is going to be "replaced" :)
> 
> Let's create an umbrella ticket for string operations and list out what
> would be nice to have. They can probably be added very quickly.
> 
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> 
> On Wed, Oct 31, 2018 at 8:49 AM Gus Heck  wrote:
> 
>> Probably ReplaceWithSubstringOperation (similar to
>> ReplaceWithFieldOperation thought that would probably add another class be
>> subject to https://issues.apache.org/jira/browse/SOLR-9661)
>> 
>> On Wed, Oct 31, 2018 at 8:32 AM Joel Bernstein  wrote:
>> 
>>> I don't think there is a substring or similar function. This would be
>> quite
>>> nice to add along with other string manipulations.
>>> 
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>> 
>>> 
>>> On Wed, Oct 31, 2018 at 2:37 AM Aroop Ganguly 
>>> wrote:
>>> 
>>>> Hey Team
>>>> 
>>>> 
>>>> Is there a way to extract a part of a string field and group by on it
>> and
>>>> obtain a histogram ?
>>>> 
>>>> for example the filed value is DateTime of the form: 20180911T00 and
>>>> I want to do a substring like substring(field1,0,7), and then do a
>>>> streaming expression of the form :
>>>> 
>>>> rollup(
>>>>select(
>>>> search(col1,fl=“field1”,sort=“field1 asc”), substring(field1,0,7)
>> as
>>>> date)
>>>>   ,on= date, count(*)
>>>> )
>>>> 
>>>> Is there a substring operator available or an alternate in streaming
>>>> expressions?
>>>> 
>>>> Thanks
>>>> Aroop
>>> 
>> 
>> 
>> --
>> http://www.the111shift.com
>> 



streaming expressions substring-evaluator

2018-10-30 Thread Aroop Ganguly
Hey Team


Is there a way to extract a part of a string field, group by on it, and 
obtain a histogram?

For example, the field value is a DateTime of the form 20180911T00, and 
I want to do a substring like substring(field1,0,7) and then run a streaming 
expression of the form:

rollup(
  select(
    search(col1, fl="field1", sort="field1 asc"),
    substring(field1,0,7) as date),
  on=date, count(*)
)

Is there a substring operator available or an alternate in streaming 
expressions?

Thanks
Aroop

Re: Silk from LucidWorks

2018-07-15 Thread Aroop Ganguly
How do you use Grafana with Solr? Did you build an HTTP communication interface, 
or is there some open-source project that you leveraged?


> On Jul 15, 2018, at 2:54 PM, Rahul Singh  wrote:
> 
> Their commercial offering still has something like it. You can always try 
> Grafana
> 
> Rahul
> On Jul 13, 2018, 9:59 AM -0400, rgummadi , wrote:
>> Is SiLK from LucidWorks still an acitve project. I looked at their github and
>> it does not seem to be active. If so are there any alternative solutions.
>> 
>> 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Text Similarity

2018-07-15 Thread Aroop Ganguly
Thanks for your answer, Rahul. I think I have explained similarity with the 
example, assuming the natural order.
I would assume this is a common task for people who use Solr and build 
search-based systems.
I am basically looking for any design patterns that people use to achieve the 
results explained in the example below.

Please do not take "join" too literally. It has to be a smart join, and I think 
your approach seems like a step towards vectorizing each name. Thanks.

Are there any other ways that people have tackled such problems?


> On Jul 15, 2018, at 2:51 PM, Rahul Singh  wrote:
> 
> How do you define similarity? There are various different methods that work 
> for different methods. In solr depending on which index time analyzer / 
> tokenizer you are using, it will treat one company name as similar in one 
> scenario and not in another.
> 
> This seems like a case of data deduplication — the join I’m pretty sure works 
> on exact matches.
> 
> Consider creating a “identity” collection where you map the different names 
> to a unique identity key. This could then be technically be joined on two 
> datasets and then those could be joined again.
> 
> Rahul
> On Jul 11, 2018, 4:42 PM -0400, Aroop Ganguly 
> , wrote:
>> Hi Team
>> 
>> This is what I want to do:
>> 1. I have 2 datasets of the schema id-number and company-name
>> 2. I want to ultimately be able to link (join or any other means) the 2 data 
>> sets based on the similarity between the company-name fields of the 2 data 
>> set.
>> 
>> Example:
>> 
>> Dataset 1
>> 
>> Id | Company Name
>> —| —
>> 1 | Aroop Inc
>> 2 | Ganguly & Ganguly Corp
>> 
>> 
>> Dataset 2
>> 
>> Yo Revenue | Company Name
>> — — |
>> 1K | aroop and sons
>> 2K | Ganguly Corp
>> 3K | Ganguly and Ganguly
>> 2K | Aroop Inc.
>> 6K | Ganguly Corporation
>> 
>> 
>> 
>> I want to be able to get a join in the end, based on a smart similarity 
>> score between the company names in the 2 data sets.
>> 
>> Final Dataset
>> —--- | —| |— |
>> Id | Company Name | Revenue | Matched Company Name from Dataset2 | 
>> Similarity Score
>> —--- | —---—| — 
>> |———
>> 1 | Aroop Inc | 2K | Aroop Inc. | 99%
>> 2 | Ganguly & Ganguly Corp | 3K | Ganguly and Ganguly | 75%
>> —--- | —| |—--- |
>> 
>> How should I proceed? (I have preprocessed the data sets to lowercase it and 
>> remove non essential words like pronouns and acronyms like LTD or Co. )
>> 
>> Thanks
>> Aroop



Text Similarity

2018-07-11 Thread Aroop Ganguly
Hi Team

This is what I want to do:
1. I have 2 datasets with the schema id-number and company-name.
2. I want to ultimately be able to link (by a join or any other means) the 2 data 
sets based on the similarity between the company-name fields of the 2 data sets.

Example:

Dataset 1

Id | Company Name
---+------------------------
1  | Aroop Inc
2  | Ganguly & Ganguly Corp


Dataset 2

Yo Revenue | Company Name
-----------+---------------------
1K         | aroop and sons
2K         | Ganguly Corp
3K         | Ganguly and Ganguly
2K         | Aroop Inc.
6K         | Ganguly Corporation



I want to be able to get a join in the end, based on a smart similarity score 
between the company names in the 2 data sets.

Final Dataset

Id | Company Name           | Revenue | Matched Company Name from Dataset2 | Similarity Score
---+------------------------+---------+------------------------------------+-----------------
1  | Aroop Inc              | 2K      | Aroop Inc.                         | 99%
2  | Ganguly & Ganguly Corp | 3K      | Ganguly and Ganguly                | 75%

How should I proceed? (I have preprocessed the data sets to lowercase them and 
remove non-essential words like pronouns and acronyms like LTD or Co.)

Thanks
Aroop

Re: String concatenation in Streaming Expressions

2018-06-27 Thread Aroop Ganguly
I think there is a bug here in your query; the syntax is:
concat(fields="fieldA,fieldB", as="fieldABConcat", delim="-")

Try this:

select(
  search(collection1, q="*:*", fl="conceptid,storeid", sort="conceptid asc",
         fq=storeid:"59c03d21d997b97bf47b3eeb", fq=schematype:"Article", fq=tags:"genetics",
         qt="/export"),
  conceptid as conceptid,
  storeid as "test_",
  concat(fields="conceptid,storeid", as="blah", delim="-")
)


> On Jun 27, 2018, at 2:59 PM, Pratik Patel  wrote:
> 
> Thanks Aroop,
> 
> I tired following Streaming Expression but it doesn't work for me.
> 
> select(
> search(collection1,q="*:*",fl="conceptid",sort="conceptid
> asc",fq=storeid:"59c03d21d997b97bf47b3eeb",fq=schematype:"Article",fq=tags:"genetics",
> qt="/export"),
> conceptid as conceptid,
> storeid as "test_",
> concat([conceptid,storeid], conceptid, "-")
> )
> 
> It generates an exception,  "Invalid expression
> concat([conceptid,storeid],conceptid,\"-\") - unknown operands found"
> 
> Is this correct syntax?
> 
> On Wed, Jun 27, 2018 at 4:30 PM, Aroop Ganguly 
> wrote:
> 
>> It seems like append is not available on 6.4, but concat is …
>> Check this out on the 6.4 branch:
>> https://github.com/apache/lucene-solr/blob/branch_6_4/
>> solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/ops/
>> ConcatOperationTest.java <https://github.com/apache/
>> lucene-solr/blob/branch_6_4/solr/solrj/src/test/org/
>> apache/solr/client/solrj/io/stream/ops/ConcatOperationTest.java>
>> 
>> 
>>> On Jun 27, 2018, at 1:27 PM, Aroop Ganguly 
>> wrote:
>>> 
>>> It should, but 6.6.* has some issues of things not working per
>> documentation.
>>> Try using 7+.
>>> 
>>>> On Jun 27, 2018, at 1:24 PM, Pratik Patel  wrote:
>>>> 
>>>> Thanks a lot for help!
>>>> 
>>>> Looks like this is a recent addition? It doesn't work for me in version
>>>> 6.6.4
>>>> 
>>>> 
>>>> 
>>>> On Wed, Jun 27, 2018 at 4:18 PM, Aroop Ganguly >> 
>>>> wrote:
>>>> 
>>>>> So it will become:
>>>>> select(
>>>>> search(..),
>>>>> conceptid as foo,
>>>>> storeid as bar
>>>>>append(conceptid, storeid) as id
>>>>> )
>>>>> 
>>>>> Or
>>>>> select
>>>>> select(
>>>>> search(..),
>>>>> conceptid as foo,
>>>>> storeid as bar
>>>>> ),
>>>>> foo,
>>>>> bar,
>>>>> append(foo,bar) as id
>>>>> )
>>>>> 
>>>>>> On Jun 27, 2018, at 1:12 PM, Aroop Ganguly 
>>>>> wrote:
>>>>>> 
>>>>>> this test case here will help in understanding the usage:
>>>>>> https://github.com/apache/lucene-solr/blob/branch_7_2/
>>>>> solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/eval/
>>>>> AppendEvaluatorTest.java <https://github.com/apache/
>>>>> lucene-solr/blob/branch_7_2/solr/solrj/src/test/org/
>>>>> apache/solr/client/solrj/io/stream/eval/AppendEvaluatorTest.java>
>>>>>> 
>>>>>>> On Jun 27, 2018, at 1:07 PM, Aroop Ganguly 
>>>>> wrote:
>>>>>>> 
>>>>>>> I think u can use the append evaluator
>>>>>>> https://github.com/apache/lucene-solr/blob/master/solr/
>>>>> solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java
>> <
>>>>> https://github.com/apache/lucene-solr/blob/master/solr/
>>>>> solrj/src/java/org/apache/solr/client/solrj/io/eval/
>> AppendEvaluator.java>
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jun 27, 2018, at 12:58 PM, Pratik Patel 
>>>>> wrote:
>>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> Is there a function which can be used in Streaming Expressions to
>>>>>>>> concatenate two strings? I want to use it just like add(1,2) in a
>>>>> Streaming
>>>>>>>> Expression. Essentially, I want to achieve something as follows.
>>>>>>>> 
>>>>>>>> select(
>>>>>>>> search(..),
>>>>>>>> conceptid as foo,
>>>>>>>>   storeid as bar
>>>>>>>>   concat(foo,bar) as id
>>>>>>>> )
>>>>>>>> 
>>>>>>>> I can use merge() function but my streaming expression is quite
>>>>> complex and
>>>>>>>> that will make it even more complex as that would be a round about
>> way
>>>>> of
>>>>>>>> doing it. Any idea how this can be achieved?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Pratik
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>> 
>> 
>> 



Re: String concatenation in Streaming Expressions

2018-06-27 Thread Aroop Ganguly
It seems like append is not available on 6.4, but concat is …
Check this out on the 6.4 branch: 
https://github.com/apache/lucene-solr/blob/branch_6_4/solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/ops/ConcatOperationTest.java


> On Jun 27, 2018, at 1:27 PM, Aroop Ganguly  wrote:
> 
> It should, but 6.6.* has some issues of things not working per documentation.
> Try using 7+.
> 
>> On Jun 27, 2018, at 1:24 PM, Pratik Patel  wrote:
>> 
>> Thanks a lot for help!
>> 
>> Looks like this is a recent addition? It doesn't work for me in version
>> 6.6.4
>> 
>> 
>> 
>> On Wed, Jun 27, 2018 at 4:18 PM, Aroop Ganguly 
>> wrote:
>> 
>>> So it will become:
>>> select(
>>> search(..),
>>> conceptid as foo,
>>>  storeid as bar
>>> append(conceptid, storeid) as id
>>> )
>>> 
>>> Or
>>> select
>>> select(
>>> search(..),
>>> conceptid as foo,
>>>  storeid as bar
>>> ),
>>> foo,
>>> bar,
>>> append(foo,bar) as id
>>> )
>>> 
>>>> On Jun 27, 2018, at 1:12 PM, Aroop Ganguly 
>>> wrote:
>>>> 
>>>> this test case here will help in understanding the usage:
>>>> https://github.com/apache/lucene-solr/blob/branch_7_2/
>>> solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/eval/
>>> AppendEvaluatorTest.java <https://github.com/apache/
>>> lucene-solr/blob/branch_7_2/solr/solrj/src/test/org/
>>> apache/solr/client/solrj/io/stream/eval/AppendEvaluatorTest.java>
>>>> 
>>>>> On Jun 27, 2018, at 1:07 PM, Aroop Ganguly 
>>> wrote:
>>>>> 
>>>>> I think u can use the append evaluator
>>>>> https://github.com/apache/lucene-solr/blob/master/solr/
>>> solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java <
>>> https://github.com/apache/lucene-solr/blob/master/solr/
>>> solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java>
>>>>> 
>>>>> 
>>>>>> On Jun 27, 2018, at 12:58 PM, Pratik Patel 
>>> wrote:
>>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> Is there a function which can be used in Streaming Expressions to
>>>>>> concatenate two strings? I want to use it just like add(1,2) in a
>>> Streaming
>>>>>> Expression. Essentially, I want to achieve something as follows.
>>>>>> 
>>>>>> select(
>>>>>> search(..),
>>>>>> conceptid as foo,
>>>>>>storeid as bar
>>>>>>concat(foo,bar) as id
>>>>>> )
>>>>>> 
>>>>>> I can use merge() function but my streaming expression is quite
>>> complex and
>>>>>> that will make it even more complex as that would be a round about way
>>> of
>>>>>> doing it. Any idea how this can be achieved?
>>>>>> 
>>>>>> Thanks,
>>>>>> Pratik
>>>>> 
>>>> 
>>> 
>>> 
> 



Re: String concatenation in Streaming Expressions

2018-06-27 Thread Aroop Ganguly
It should, but 6.6.* has some issues of things not working per documentation.
Try using 7+.

> On Jun 27, 2018, at 1:24 PM, Pratik Patel  wrote:
> 
> Thanks a lot for help!
> 
> Looks like this is a recent addition? It doesn't work for me in version
> 6.6.4
> 
> 
> 
> On Wed, Jun 27, 2018 at 4:18 PM, Aroop Ganguly 
> wrote:
> 
>> So it will become:
>> select(
>> search(..),
>> conceptid as foo,
>>   storeid as bar
>>  append(conceptid, storeid) as id
>> )
>> 
>> Or
>> select
>> select(
>> search(..),
>> conceptid as foo,
>>   storeid as bar
>> ),
>> foo,
>> bar,
>> append(foo,bar) as id
>> )
>> 
>>> On Jun 27, 2018, at 1:12 PM, Aroop Ganguly 
>> wrote:
>>> 
>>> this test case here will help in understanding the usage:
>>> https://github.com/apache/lucene-solr/blob/branch_7_2/
>> solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/eval/
>> AppendEvaluatorTest.java <https://github.com/apache/
>> lucene-solr/blob/branch_7_2/solr/solrj/src/test/org/
>> apache/solr/client/solrj/io/stream/eval/AppendEvaluatorTest.java>
>>> 
>>>> On Jun 27, 2018, at 1:07 PM, Aroop Ganguly 
>> wrote:
>>>> 
>>>> I think u can use the append evaluator
>>>> https://github.com/apache/lucene-solr/blob/master/solr/
>> solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java <
>> https://github.com/apache/lucene-solr/blob/master/solr/
>> solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java>
>>>> 
>>>> 
>>>>> On Jun 27, 2018, at 12:58 PM, Pratik Patel 
>> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> Is there a function which can be used in Streaming Expressions to
>>>>> concatenate two strings? I want to use it just like add(1,2) in a
>> Streaming
>>>>> Expression. Essentially, I want to achieve something as follows.
>>>>> 
>>>>> select(
>>>>> search(..),
>>>>> conceptid as foo,
>>>>> storeid as bar
>>>>> concat(foo,bar) as id
>>>>> )
>>>>> 
>>>>> I can use merge() function but my streaming expression is quite
>> complex and
>>>>> that will make it even more complex as that would be a round about way
>> of
>>>>> doing it. Any idea how this can be achieved?
>>>>> 
>>>>> Thanks,
>>>>> Pratik
>>>> 
>>> 
>> 
>> 



Re: String concatenation in Streaming Expressions

2018-06-27 Thread Aroop Ganguly
So it will become:
select(
  search(..),
  conceptid as foo,
  storeid as bar,
  append(conceptid, storeid) as id
)

Or 
select(
  select(
    search(..),
    conceptid as foo,
    storeid as bar
  ),
  foo,
  bar,
  append(foo,bar) as id
)

> On Jun 27, 2018, at 1:12 PM, Aroop Ganguly  wrote:
> 
> this test case here will help in understanding the usage:
> https://github.com/apache/lucene-solr/blob/branch_7_2/solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/eval/AppendEvaluatorTest.java
>  
> <https://github.com/apache/lucene-solr/blob/branch_7_2/solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/eval/AppendEvaluatorTest.java>
> 
>> On Jun 27, 2018, at 1:07 PM, Aroop Ganguly  wrote:
>> 
>> I think u can use the append evaluator
>> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java
>>  
>> <https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java>
>> 
>> 
>>> On Jun 27, 2018, at 12:58 PM, Pratik Patel  wrote:
>>> 
>>> Hello,
>>> 
>>> Is there a function which can be used in Streaming Expressions to
>>> concatenate two strings? I want to use it just like add(1,2) in a Streaming
>>> Expression. Essentially, I want to achieve something as follows.
>>> 
>>> select(
>>> search(..),
>>> conceptid as foo,
>>>  storeid as bar
>>>  concat(foo,bar) as id
>>> )
>>> 
>>> I can use merge() function but my streaming expression is quite complex and
>>> that will make it even more complex as that would be a round about way of
>>> doing it. Any idea how this can be achieved?
>>> 
>>> Thanks,
>>> Pratik
>> 
> 



Re: String concatenation in Streaming Expressions

2018-06-27 Thread Aroop Ganguly
this test case here will help in understanding the usage:
https://github.com/apache/lucene-solr/blob/branch_7_2/solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/eval/AppendEvaluatorTest.java

> On Jun 27, 2018, at 1:07 PM, Aroop Ganguly  wrote:
> 
> I think u can use the append evaluator
> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java
>  
> <https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java>
> 
> 
>> On Jun 27, 2018, at 12:58 PM, Pratik Patel  wrote:
>> 
>> Hello,
>> 
>> Is there a function which can be used in Streaming Expressions to
>> concatenate two strings? I want to use it just like add(1,2) in a Streaming
>> Expression. Essentially, I want to achieve something as follows.
>> 
>> select(
>> search(..),
>> conceptid as foo,
>>   storeid as bar
>>   concat(foo,bar) as id
>> )
>> 
>> I can use merge() function but my streaming expression is quite complex and
>> that will make it even more complex as that would be a round about way of
>> doing it. Any idea how this can be achieved?
>> 
>> Thanks,
>> Pratik
> 



Re: String concatenation in Streaming Expressions

2018-06-27 Thread Aroop Ganguly
I think u can use the append evaluator
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java
 



> On Jun 27, 2018, at 12:58 PM, Pratik Patel  wrote:
> 
> Hello,
> 
> Is there a function which can be used in Streaming Expressions to
> concatenate two strings? I want to use it just like add(1,2) in a Streaming
> Expression. Essentially, I want to achieve something as follows.
> 
> select(
> search(..),
> conceptid as foo,
>storeid as bar
>concat(foo,bar) as id
> )
> 
> I can use merge() function but my streaming expression is quite complex and
> that will make it even more complex as that would be a round about way of
> doing it. Any idea how this can be achieved?
> 
> Thanks,
> Pratik



Re: Java library for building Streaming Expressions

2018-06-27 Thread Aroop Ganguly
From my experience this is still limited, and a lot of things are broken if you 
still want to do it in a strongly typed manner.
The only reliable way to go right now is the string-expression building 
route for the streaming expression.

That being said, if your expressions are simple enough you could build them in a 
strongly typed manner.
@Joel may be able to advise here.
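
To illustrate the string-expression route, here is a rough sketch. The collection
name, zkHost and field names are placeholders, the classes are all under
org.apache.solr.client.solrj.io, and the open/read/close calls throw IOException:

StreamFactory factory = new StreamFactory()
    .withCollectionZkHost("collection1", zkHost)
    .withFunctionName("search", CloudSolrStream.class);

String expr = "search(collection1, q=\"*:*\", fl=\"conceptid\", sort=\"conceptid asc\", qt=\"/export\")";

TupleStream stream = factory.constructStream(expr);
StreamContext context = new StreamContext();
context.setSolrClientCache(new SolrClientCache());
stream.setStreamContext(context);
try {
  stream.open();
  for (Tuple tuple = stream.read(); !tuple.EOF; tuple = stream.read()) {
    // consume each tuple here
  }
} finally {
  stream.close();
}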

> On Jun 27, 2018, at 9:41 AM, Pratik Patel  wrote:
> 
> Hello Everyone,
> 
> Is there any java library for building Streaming Expressions? Currently, I
> am using solr's java client and building Streaming Expressions as follows.
> 
> StreamFactory factory = new StreamFactory().withCollectionZkHost( collName,
> zkHost )
>.withFunctionName("gatherNodes",
> GatherNodesStream.class)
>.withFunctionName("search", CloudSolrStream.class)
>.withFunctionName("count", CountMetric.class)
>.withFunctionName("having", HavingStream.class)
>.withFunctionName("gt", GreaterThanOperation.class)
>.withFunctionName("eq", EqualsOperation.class);
>HavingStream cs = (HavingStream) factory.constructStream(
>  );
> 
> In this approach, I still have to build  streaming_expression_str in code.
> Is there any better approach for this or is there any java library to do
> this? My search for it didn't yield anything so I was wondering if anyone
> here has an idea.
> 
> Thanks,
> Pratik



Re: Total Collection Size in Solr 7

2018-06-27 Thread Aroop Ganguly
Ah ok ! 
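
In other words, on each node something along these lines, with the path being a
placeholder for wherever the cores actually live:

du -sh /path/to/solr/home/mycollection_shard*_replica*/data/index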

> On Jun 27, 2018, at 8:53 AM, Erick Erickson  wrote:
> 
> Just sum up the sizes of all the files in your index directory. Clumsy
> to be sure
> 
> On Tue, Jun 26, 2018 at 3:12 PM, Aroop Ganguly  
> wrote:
>> Hi Eric
>> 
>> Thanks for the advice.
>> One open question still, about point 1 below: how to get that magic number 
>> of size in GBs :) ?
>> As I am mostly using streaming expressions, most of my fields are DocValues 
>> and not stored.
>> 
>> I will look at the health endpoint to see what it gives me in connection 
>> with size.
>> 
>> Thanks
>> Aroop
>> 
>> 
>>> On Jun 26, 2018, at 10:49 AM, Erick Erickson  
>>> wrote:
>>> 
>>> Aroop:
>>> 
>>> Not that I know of. You could do a reasonable approximation by
>>> 1> check the index size (manually) with, say, 10M docs
>>> 2> check it again with 20M docs
>>> 3> use a match all docs query and do the math.
>>> 
>>> That's clumsy but do-able. The reason I start with 10M and 20M is that
>>> index size does not go up linearly so I like to seed the index first.
>>> 
>>> That said, though, it's hard to generalize index size as meaning much.
>>> Is it 90% stored? 10% stored data? Those ratios have huge implications
>>> on whether you're straining anything except disk space.
>>> 
>>> There are a lot of metrics, starting with Solr 6.4 that are available
>>> that give you a much better view of Solr's health.
>>> 
>>> Best,
>>> Erick
>>> 
>>> On Tue, Jun 26, 2018 at 9:21 AM, Aroop Ganguly  
>>> wrote:
>>>> Hi Erick
>>>> 
>>>> Sure I will look those jiras up.
>>>> In the interim, is what Susmit suggested the only way to get the size 
>>>> info? Or is there something else you can recommend?
>>>> 
>>>> Thanks
>>>> Aroop
>>>> 
>>>> 
>>>> 
>>>>> On Jun 26, 2018, at 6:53 AM, Erick Erickson  
>>>>> wrote:
>>>>> 
>>>>> Some work is being done on the admin UI, there are several JIRAs.
>>>>> Perhaps you'd like to join that conversation? We need to have input,
>>>>> especially in terms of what kinds of information would be useful from
>>>>> a practitioner's standpoint.
>>>>> 
>>>>> Best,
>>>>> Erick
>>>>> 
>>>>>> On Mon, Jun 25, 2018 at 11:26 PM, Aroop Ganguly 
>>>>>>  wrote:
>>>>>> I see, Thanks Susmit.
>>>>>> I hoped there was something simpler, that could just be part of the 
>>>>>> collections view we now have in solr 7 admin ui. Or a at least a one 
>>>>>> stop api call.
>>>>>> I guess this will be added in a later release.
>>>>>> 
>>>>>>> On Jun 25, 2018, at 11:20 PM, Susmit  wrote:
>>>>>>> 
>>>>>>> Hi Aroop,
>>>>>>> i created a utility using solrzkclient api to read state.json, 
>>>>>>> enumerated (one) replica for each shard and used /replication handler 
>>>>>>> for size and added them up..
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>>> On Jun 25, 2018, at 7:24 PM, Aroop Ganguly  
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Hi Team
>>>>>>>> 
>>>>>>>> I am not sure how to ascertain the total size of a collection via the 
>>>>>>>> Solr UI on a Solr7+ installation.
>>>>>>>> The collection is shared and replicated heavily so its tedious to have 
>>>>>>>> to look at each core and figure out the size of the entire collection 
>>>>>>>> from this in an additive way.
>>>>>>>> 
>>>>>>>> Is there an api or ui section from where this info can be obtained ?
>>>>>>>> 
>>>>>>>> On the flip side, it would be great to have a consolidated view of the 
>>>>>>>> collection size in GBs along with the individual shard sizes. (Should 
>>>>>>>> this be a Jira :) ?)
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> Aroop
>>>>>> 
>> 



Re: Total Collection Size in Solr 7

2018-06-26 Thread Aroop Ganguly
Hi Eric

Thanks for the advice. 
One open question remains, about point 1 below: how do I get that magic number of 
size in GBs? :)
As I am mostly using streaming expressions, most of my fields are DocValues and 
not stored.

I will look at the health endpoint to see what it gives me in connection with 
size.
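
If the metrics API is the right place to look, I believe something along these
lines should surface a per-core index size gauge (host, port and the exact
parameter names are unverified on my side):

curl "http://localhost:8983/solr/admin/metrics?group=core&prefix=INDEX.sizeInBytes"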

Thanks
Aroop


> On Jun 26, 2018, at 10:49 AM, Erick Erickson  wrote:
> 
> Aroop:
> 
> Not that I know of. You could do a reasonable approximation by
> 1> check the index size (manually) with, say, 10M docs
> 2> check it again with 20M docs
> 3> use a match all docs query and do the math.
> 
> That's clumsy but do-able. The reason I start with 10M and 20M is that
> index size does not go up linearly so I like to seed the index first.
> 
> That said, though, it's hard to generalize index size as meaning much.
> Is it 90% stored? 10% stored data? Those ratios have huge implications
> on whether you're straining anything except disk space.
> 
> There are a lot of metrics, starting with Solr 6.4 that are available
> that give you a much better view of Solr's health.
> 
> Best,
> Erick
> 
> On Tue, Jun 26, 2018 at 9:21 AM, Aroop Ganguly  
> wrote:
>> Hi Erick
>> 
>> Sure I will look those jiras up.
>> In the interim, is what Susmit suggested the only way to get the size info? 
>> Or is there something else you can recommend?
>> 
>> Thanks
>> Aroop
>> 
>> 
>> 
>>> On Jun 26, 2018, at 6:53 AM, Erick Erickson  wrote:
>>> 
>>> Some work is being done on the admin UI, there are several JIRAs.
>>> Perhaps you'd like to join that conversation? We need to have input,
>>> especially in terms of what kinds of information would be useful from
>>> a practitioner's standpoint.
>>> 
>>> Best,
>>> Erick
>>> 
>>>> On Mon, Jun 25, 2018 at 11:26 PM, Aroop Ganguly  
>>>> wrote:
>>>> I see, Thanks Susmit.
>>>> I hoped there was something simpler, that could just be part of the 
>>>> collections view we now have in solr 7 admin ui. Or a at least a one stop 
>>>> api call.
>>>> I guess this will be added in a later release.
>>>> 
>>>>> On Jun 25, 2018, at 11:20 PM, Susmit  wrote:
>>>>> 
>>>>> Hi Aroop,
>>>>> i created a utility using solrzkclient api to read state.json, enumerated 
>>>>> (one) replica for each shard and used /replication handler for size and 
>>>>> added them up..
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On Jun 25, 2018, at 7:24 PM, Aroop Ganguly  
>>>>>> wrote:
>>>>>> 
>>>>>> Hi Team
>>>>>> 
>>>>>> I am not sure how to ascertain the total size of a collection via the 
>>>>>> Solr UI on a Solr7+ installation.
>>>>>> The collection is shared and replicated heavily so its tedious to have 
>>>>>> to look at each core and figure out the size of the entire collection 
>>>>>> from this in an additive way.
>>>>>> 
>>>>>> Is there an api or ui section from where this info can be obtained ?
>>>>>> 
>>>>>> On the flip side, it would be great to have a consolidated view of the 
>>>>>> collection size in GBs along with the individual shard sizes. (Should 
>>>>>> this be a Jira :) ?)
>>>>>> 
>>>>>> Thanks
>>>>>> Aroop
>>>> 



Re: Total Collection Size in Solr 7

2018-06-26 Thread Aroop Ganguly
Hi Erick

Sure I will look those jiras up. 
In the interim, is what Susmit suggested the only way to get the size info? Or 
is there something else you can recommend? 

Thanks
Aroop



> On Jun 26, 2018, at 6:53 AM, Erick Erickson  wrote:
> 
> Some work is being done on the admin UI, there are several JIRAs.
> Perhaps you'd like to join that conversation? We need to have input,
> especially in terms of what kinds of information would be useful from
> a practitioner's standpoint.
> 
> Best,
> Erick
> 
>> On Mon, Jun 25, 2018 at 11:26 PM, Aroop Ganguly  
>> wrote:
>> I see, Thanks Susmit.
>> I hoped there was something simpler, that could just be part of the 
>> collections view we now have in solr 7 admin ui. Or a at least a one stop 
>> api call.
>> I guess this will be added in a later release.
>> 
>>> On Jun 25, 2018, at 11:20 PM, Susmit  wrote:
>>> 
>>> Hi Aroop,
>>> i created a utility using solrzkclient api to read state.json, enumerated 
>>> (one) replica for each shard and used /replication handler for size and 
>>> added them up..
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Jun 25, 2018, at 7:24 PM, Aroop Ganguly  wrote:
>>>> 
>>>> Hi Team
>>>> 
>>>> I am not sure how to ascertain the total size of a collection via the Solr 
>>>> UI on a Solr7+ installation.
>>>> The collection is shared and replicated heavily so its tedious to have to 
>>>> look at each core and figure out the size of the entire collection from 
>>>> this in an additive way.
>>>> 
>>>> Is there an api or ui section from where this info can be obtained ?
>>>> 
>>>> On the flip side, it would be great to have a consolidated view of the 
>>>> collection size in GBs along with the individual shard sizes. (Should this 
>>>> be a Jira :) ?)
>>>> 
>>>> Thanks
>>>> Aroop
>> 


Re: Total Collection Size in Solr 7

2018-06-25 Thread Aroop Ganguly
I see, thanks Susmit. 
I had hoped there was something simpler that could just be part of the collections 
view we now have in the Solr 7 admin UI, or at least a one-stop API call.
I guess this will be added in a later release.
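
For anyone who finds this thread later, the per-core call Susmit describes looks
roughly like this, with host, port and core name as placeholders; the size is
reported in the indexSize field of the details response:

curl "http://localhost:8983/solr/mycollection_shard1_replica_n1/replication?command=details&wt=json"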

> On Jun 25, 2018, at 11:20 PM, Susmit  wrote:
> 
> Hi Aroop, 
> i created a utility using solrzkclient api to read state.json, enumerated 
> (one) replica for each shard and used /replication handler for size and added 
> them up..
> 
> Sent from my iPhone
> 
>> On Jun 25, 2018, at 7:24 PM, Aroop Ganguly  wrote:
>> 
>> Hi Team
>> 
>> I am not sure how to ascertain the total size of a collection via the Solr 
>> UI on a Solr7+ installation.
>> The collection is shared and replicated heavily so its tedious to have to 
>> look at each core and figure out the size of the entire collection from this 
>> in an additive way.
>> 
>> Is there an api or ui section from where this info can be obtained ?
>> 
>> On the flip side, it would be great to have a consolidated view of the 
>> collection size in GBs along with the individual shard sizes. (Should this 
>> be a Jira :) ?) 
>> 
>> Thanks
>> Aroop



Re: Indexing Approach

2018-06-25 Thread Aroop Ganguly
Would you mind sharing details on
1. The SolrCloud setup: how many nodes do you have at your disposal, and how 
many shards do you have set up?
2. The indexing technology: what are you using? Core Java/.NET threads, or a 
system like Spark?
3. Where do you see the exceptions: the indexer process logs or the SolrCloud logs?


> On Jun 25, 2018, at 11:06 PM, solrnoobie  wrote:
> 
> We are currently having problems in out current production setup in solr.
> 
> What we currently have is something like this:
> 
> - Solr 6.6.3 (cloud mode)
> - 10 threads for indexing
> - 900k total documents
> - 500 documents per batch
> 
> 
> So in each thread, the process will call a stored procedure with a lot of
> resultsets (1 main table and 8 sub tables) and after the db call, the
> application will assemble the documents based on the resultsets and then it
> will send it to solr for indexing.
> 
> We are having errors such as heap space error in our indexing so we decided
> to lower the batch size to 50. The problem with this is that sometimes it
> really does not help since 1 document can contain 1000 child documents and
> it will still have the heap errors and indexing is generally slow everytime.
> 
> So my question would be what approach should we have to resolve this kind of
> problem (will queue based indexing help? what are your indexing methods in
> your respective production environments?)?
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Total Collection Size in Solr 7

2018-06-25 Thread Aroop Ganguly
Hi Team

I am not sure how to ascertain the total size of a collection via the Solr UI 
on a Solr 7+ installation.
The collection is sharded and replicated heavily, so it's tedious to have to look 
at each core and figure out the size of the entire collection from this in an 
additive way.

Is there an API or UI section from which this info can be obtained?

On the flip side, it would be great to have a consolidated view of the 
collection size in GBs along with the individual shard sizes. (Should this be a 
Jira :) ?) 

Thanks
Aroop

Re: Search streaming expressions returns rows times number of shards docs

2018-06-21 Thread Aroop Ganguly
So I think 2 things are being missed here. You should be specifying 
qt="/export" to see all the results.
If you do not do that, then the select handler is used by default, which gives 
only the default 10-20 rows as the result.
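
Concretely, the expression from the first post becomes something like the
following (same collection and fields; this assumes the fl/sort fields all have
docValues enabled, which the export handler requires):

search(bioentities,
  q="*:*",
  fl="bioentity_identifier,property_value,property_name",
  sort="bioentity_identifier asc",
  qt="/export")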

> On Jun 21, 2018, at 12:53 PM, Joel Bernstein  wrote:
> 
> That is actually the current behavior of the search expression. The initial
> use cases from Streaming Expressions revolved around joins and rollups
> which really require the entire result set. So the search expression just
> merged the results from the shards and let the wrapping expression deal
> with the results. Things have evolved quite a bit since then and having the
> search expression respect the rows parameter is something that I've been
> meaning to add. Feel free to create a ticket for this.
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> On Thu, Jun 21, 2018 at 1:35 PM, Alfonso Muñoz-Pomer Fuentes <
> amu...@ebi.ac.uk> wrote:
> 
>> I’m having a weird issue with the search streaming expressions and I’d
>> like to share it before opening a ticket in Jira, just in case I’m missing
>> something obvious.
>> 
>> I’m currently on Solr 7.1 and I have a collection named bioentities split
>> into two shards and no replicas. Whenever I run a query such as this:
>> search(
>>  bioentities,
>>  q="*:*",
>>  fl="bioentity_identifier,property_value,property_name",
>>  sort="bioentity_identifier asc")
>> 
>> I’m getting 20 documents. If I add e.g. rows=4 I get 8 results, and so on.
>> 
>> I have the same collection in another SolrCloud cluster, split into three
>> shards and running the same queries I get 30 and 12 results, respectively.
>> So it seems that the seach expression distributes the query between shards
>> and then aggregates the results. Is this the expected behaviour?
>> 
>> Thanks in advance.
>> 
>> --
>> Alfonso Muñoz-Pomer Fuentes
>> Senior Lead Software Engineer @ Expression Atlas Team
>> European Bioinformatics Institute (EMBL-EBI)
>> European Molecular Biology Laboratory
>> Tel:+ 44 (0) 1223 49 2633
>> Skype: amunozpomer
>> 
>> 



Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Aroop Ganguly
I see. 
By definition of splitting, the new shards will have the same number of 
replicas as the original shard.
You could set replicationFactor>=2 to ensure that both of your Solr nodes 
are used.
You could also use the maxShardsPerNode parameter, alone or in conjunction with 
the replicationFactor property, to achieve your target state.
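
To make that concrete, the sequence would be roughly the following two Collections
API calls (host, collection and config names are placeholders):

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&replicationFactor=2&maxShardsPerNode=2&collection.configName=myconfig

... load the data from the standalone core into the single shard, verify, and then ...

http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1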



> On Jun 19, 2018, at 12:51 PM, Sushant Vengurlekar 
>  wrote:
> 
> Thank you Aroop
> 
> After I import the data into the collection from the standalone solr core I
> want to split it into 2 shards across 2 nodes that I have. So I will have
> to set replicationfactor of 2 & numShards =2 ?
> 
> On Tue, Jun 19, 2018 at 12:46 PM Aroop Ganguly 
> wrote:
> 
>> Hi Sushant
>> 
>> replicationFactor defaults to 1 and is not mandatory.
>> numShards is mandatory, where you’d equate it to 1.
>> 
>> Aroop
>> 
>>> On Jun 19, 2018, at 12:29 PM, Sushant Vengurlekar <
>> svengurle...@curvolabs.com> wrote:
>>> 
>>> Thank you Eric.
>>> 
>>> In the create collection command I need to set the replication factor
>>> though correct?
>>> 
>>> On Tue, Jun 19, 2018 at 11:14 AM Erick Erickson >> 
>>> wrote:
>>> 
>>>> Probably the easiest way would be to recreate your collection with 1
>>>> shard. Then copy the index from your standalone setup.
>>>> 
>>>> After verifying your setup, use the Collections SPLITSHARD command.
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>> On Tue, Jun 19, 2018 at 10:50 AM, Sushant Vengurlekar
>>>>  wrote:
>>>>> I created a solr cloud collection with 2 shards and a replication
>> factor
>>>> of
>>>>> 2. How can I load data into this collection which I have currently
>> stored
>>>>> in a core on a standalone solr. I used the conf from this core on
>>>>> standalone solr to create the collection on the solrcloud
>>>>> 
>>>>> Thank you
>>>> 
>> 
>> 



Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Aroop Ganguly
Hi Sushant

replicationFactor defaults to 1 and is not mandatory.
numShards is mandatory, where you’d equate it to 1.

Aroop

> On Jun 19, 2018, at 12:29 PM, Sushant Vengurlekar 
>  wrote:
> 
> Thank you Eric.
> 
> In the create collection command I need to set the replication factor
> though correct?
> 
> On Tue, Jun 19, 2018 at 11:14 AM Erick Erickson 
> wrote:
> 
>> Probably the easiest way would be to recreate your collection with 1
>> shard. Then copy the index from your standalone setup.
>> 
>> After verifying your setup, use the Collections SPLITSHARD command.
>> 
>> Best,
>> Erick
>> 
>> On Tue, Jun 19, 2018 at 10:50 AM, Sushant Vengurlekar
>>  wrote:
>>> I created a solr cloud collection with 2 shards and a replication factor
>> of
>>> 2. How can I load data into this collection which I have currently stored
>>> in a core on a standalone solr. I used the conf from this core on
>>> standalone solr to create the collection on the solrcloud
>>> 
>>> Thank you
>> 



Re: Solr Odbc for Parallel Sql integration with Tableau

2018-06-18 Thread Aroop Ganguly
Hi Joel

Yes, I was able to make the ODBC bridge work very easily (using the steps mentioned 
here: https://github.com/risdenk/solrj-jdbc-testing/blob/master/odbc/README.md),

but the actual Tableau integration has not been fruitful yet, for 2 reasons:

1. Tableau inherently writes inner queries: select a as A, b as B from (select 
* from c)
 — this fails immediately, as Parallel SQL in my experience does not like 
inner queries.

2. The default Tableau view, which is great for dragging and dropping the entire 
table, does not work for Parallel SQL, as we need to specify a "limit"; otherwise it 
keeps giving the error about "score". So I defaulted to the custom query option 
in Tableau, but it failed because of Tableau's inherent inner-query behavior, as 
mentioned in 1. :) 
 — I will keep at it tomorrow and maybe I will be able to figure a way out.
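
For reference, the flat shape of statement that Parallel SQL has been happy with
in my testing, versus the nested shape Tableau generates (collection and field
names here are just placeholders):

-- works against the /sql handler
SELECT fieldA, count(*) FROM mycollection GROUP BY fieldA LIMIT 100

-- fails: Parallel SQL rejects the inner query
SELECT a AS A, b AS B FROM (SELECT * FROM c)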


> On Jun 18, 2018, at 7:55 PM, Joel Bernstein  wrote:
> 
> That's interesting that you were able to setup OpenLink. At Alfresco we've
> done quite a bit of work on the Solr's JDBC driver to integrate it with the
> Alfresco repository, which uses Solr. But we haven't yet tackled the ODBC
> setup. That will come very soon. To really take advantage of Tableau's
> capabilities we will need to add joins to Solr's parallel SQL. Solr already
> uses Apache Calcite, which has a join optimizer, so mainly this would
> involve hooking up the various Streaming Expression joins.
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> On Mon, Jun 18, 2018 at 6:37 PM, Aroop Ganguly 
> wrote:
> 
>> Ok I was able to setup the odic bridge (using OpenLink) and I see the
>> collections popping up in Tableau too.
>> But I am unable to actually get data flowing into Tableau reports because,
>> Tableau keeps creating inner queries and Solr seems to hate inner queries.
>> Is there a way to do inner queries in Solr Parallel Sql ?
>> 
>>> On Jun 18, 2018, at 12:30 PM, Aroop Ganguly 
>> wrote:
>>> 
>>> 
>>> Hi Everyone
>>> 
>>> I am not sure if something has been done on this yet, though I did see a
>> JIRA with links to the parallel sql documentation, but I do not think that
>> answers the question.
>>> 
>>> I love the jdbc driver and it works well for many UIs but there are
>> other systems that need an ODBC driver.
>>> 
>>> Can anyone share any guidance as to how this can be done or has been
>> done by others.
>>> 
>>> Thanks
>>> Aroop
>> 
>> 



Re: Solr Odbc for Parallel Sql integration with Tableau

2018-06-18 Thread Aroop Ganguly
OK, I was able to set up the ODBC bridge (using OpenLink) and I see the 
collections popping up in Tableau too.
But I am unable to actually get data flowing into Tableau reports because 
Tableau keeps creating inner queries, and Solr seems to hate inner queries.
Is there a way to do inner queries in Solr Parallel SQL?

> On Jun 18, 2018, at 12:30 PM, Aroop Ganguly  wrote:
> 
> 
> Hi Everyone
> 
> I am not sure if something has been done on this yet, though I did see a JIRA 
> with links to the parallel sql documentation, but I do not think that answers 
> the question.
> 
> I love the jdbc driver and it works well for many UIs but there are other 
> systems that need an ODBC driver.
> 
> Can anyone share any guidance as to how this can be done or has been done by 
> others.
> 
> Thanks
> Aroop



Solr Odbc for Parallel Sql integration with Tableau

2018-06-18 Thread Aroop Ganguly


Hi Everyone

I am not sure if something has been done on this yet; I did see a JIRA 
with links to the Parallel SQL documentation, but I do not think that answers 
the question.

I love the JDBC driver and it works well for many UIs, but there are other 
systems that need an ODBC driver.

Can anyone share any guidance as to how this can be done, or has been done by 
others?

Thanks
Aroop