Solr 6.5 autosuggest suggests misspelt words and unwanted words

2018-06-19 Thread Sri Sirisha Vallabhaneni
Hi,

My data is un-curated - it contains *cuss words* and *misspelt words* like
*nd* instead of *need*. We are using an auto-suggest/auto-complete that
relies heavily on indexed data to recommend suggestions as the user types a
query. We use a list of stop words consisting of cuss words to keep a check
on what is recommended to the user (a rough sketch of that setup follows the
two questions below), and this list might grow huge over time as well. Is
there any clean way to get around the problem

1. of eliminating cuss words entirely from suggestions, and
2. of not suggesting misspelt words at all?
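For reference, the stop-word filtering we apply to the suggest field today
looks roughly like this (the field type and file names are illustrative, not
our exact config):

    <fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- cuss_words.txt is the (growing) block list, one term per line -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="cuss_words.txt"/>
      </analyzer>
    </fieldType>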

Thanks and Regards,
Sri


How to split index more than 2GB in size

2018-06-19 Thread Sushant Vengurlekar
How do I split indexes which are more than 2GB in size?

I get the error below when I try to use SPLITSHARD on a collection larger
than 2GB.
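The call is essentially the following (collection and shard names are
placeholders); running it with the async parameter and polling REQUESTSTATUS
is one way to keep the HTTP request itself from timing out on a large index:

    curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1&async=split-job-1"
    curl "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=split-job-1"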

2018-06-20 02:25:49.810 ERROR (qtp1025799482-19) [   ] o.a.s.s.HttpSolrCall
null:org.apache.solr.common.SolrException: SPLITSHARD failed to invoke
SPLIT core admin command

solr-1_1  | at
org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)

solr-1_1  | at
org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:258)

solr-1_1  | at
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:230)

solr-1_1  | at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)

solr-1_1  | at
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)

solr-1_1  | at
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)

solr-1_1  | at
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)

solr-1_1  | at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)

solr-1_1  | at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)

solr-1_1  | at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)

solr-1_1  | at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)

solr-1_1  | at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)

solr-1_1  | at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)

solr-1_1  | at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)

solr-1_1  | at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)

solr-1_1  | at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)

solr-1_1  | at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)

solr-1_1  | at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)

solr-1_1  | at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)

solr-1_1  | at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)

solr-1_1  | at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)

solr-1_1  | at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)

solr-1_1  | at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)

solr-1_1  | at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)

solr-1_1  | at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)

solr-1_1  | at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)

solr-1_1  | at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)

solr-1_1  | at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)

solr-1_1  | at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)

solr-1_1  | at org.eclipse.jetty.server.Server.handle(Server.java:530)

solr-1_1  | at
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)

solr-1_1  | at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)

solr-1_1  | at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)

solr-1_1  | at
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)

solr-1_1  | at
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)

solr-1_1  | at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)

solr-1_1  | at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)

solr-1_1  | at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)

solr-1_1  | at
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)

solr-1_1  | at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)

solr-1_1  | at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)

solr-1_1  | at java.lang.Thread.run(Thread.java:748)

Thank you


Re: Solrcloud doesn't like relative path

2018-06-19 Thread Sushant Vengurlekar
Hi Erick,

Based on your suggestion I moved the helpers to be under configsets/conf, so
my new folder structure looks like this:
-configsets
- conf
  helpers
  synonyms_vendors.txt
- collection1
-conf
schema.xml
solrconfig.xml

I am still getting this error:

Caused by: Can't find resource 'helpers/synonyms_vendors.txt' in classpath
or '/configs/collection1', cwd=/opt/solr/server
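For reference, my understanding of the suggested layout and the re-upload
step is roughly this (the ZooKeeper address and config name are placeholders
from my local setup):

    configsets/collection1/conf/schema.xml
    configsets/collection1/conf/solrconfig.xml
    configsets/collection1/conf/helpers/synonyms_vendors.txt

    # push the whole conf directory (including helpers/) back to ZooKeeper
    server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig \
        -confname collection1 -confdir configsets/collection1/conf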

Thank you
Sushant


On Tue, Jun 19, 2018 at 11:12 AM, Erick Erickson 
wrote:

> Configsets are presumed to contain any auxiliary files under them, not
> a relative path _on Zookeeper_.
>
> So try putting your synonyms_vendors.txt in
> configsets/conf/helpers/synonyms_vendors.txt, then
> reference it as helpers/synonyms_vendors.txt.
>
> Best,
> Erick
>
> On Tue, Jun 19, 2018 at 10:28 AM, Sushant Vengurlekar
>  wrote:
> > I have this line in my schema.xml
> > synonyms="../../helpers/synonyms_vendors.txt"
> >
> > My current folder structure is
> > solr
> >- helpers
> >   synonyms_vendors.txt
> >-configsets
> > - collection1
> > -conf
> > schema.xml
> > solrconfig.xml
> >
> > I get the below error when I try to use the bin/solr create_collection
> > command
> >
> > Unable to create core [total-joints_4_shard1_replica1] Caused by: Can't
> > find resource 'helpers/synonyms_vendors.txt' in classpath or
> > '/configs/total-joints_4', cwd=/opt/solr/server
> >
> >
> > Can some one suggest what I can do to resolve this.
> >
> > Thank you
>


Re: CursorMarks and 'end of results'

2018-06-19 Thread Anshum Gupta
I might have been wrong there. Having an explicit check for the number of results 
returned vs. the rows requested would allow you to avoid the last request that 
would otherwise come back with 0 results. That check isn't automatically done 
within Solr.
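A minimal sketch of that loop in SolrJ, stopping as soon as a page comes back
short (the collection name, uniqueKey field, page size, and the already-created
solrClient are placeholders/assumptions):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    SolrQuery q = new SolrQuery("*:*");
    q.setRows(500);
    q.setSort(SolrQuery.SortClause.asc("id"));  // cursorMark requires a sort on the uniqueKey
    String cursor = CursorMarkParams.CURSOR_MARK_START;
    while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = solrClient.query("mycollection", q);
        // ... process rsp.getResults() ...
        String next = rsp.getNextCursorMark();
        // a short page means we are done; this skips the final empty request
        if (rsp.getResults().size() < q.getRows() || next.equals(cursor)) {
            break;
        }
        cursor = next;
    }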

 Anshum


> On Jun 19, 2018, at 2:39 PM, Anshum Gupta  wrote:
> 
> Hi David,
> 
> The cursormark would be the same if you get back fewer than the max records 
> requested and so you should exit, as per the documentation.
> 
> I think the documentation says just what you are suggesting, but if you think 
> it could be improved, feel free to put up a patch.
> 
> 
>  Anshum
> 
> 
>> On Jun 18, 2018, at 2:09 AM, David Frese > > wrote:
>> 
>> Hi List,
>> 
>> the documentation of 'cursorMarks' recommends to fetch until a query returns 
>> the cursorMark that was passed in to a request.
>> 
>> But that always requires an additional request at the end, so I wonder if I 
>> can stop already if a request returns fewer results than requested (num 
>> rows). There won't be new documents added during the search in my use case, 
>> so could there ever be a non-empty 'page' after a non-full 'page'?
>> 
>> Thanks very much.
>> 
>> --
>> David Frese
>> +49 7071 70896 75
>> 
>> Active Group GmbH
>> Hechinger Str. 12/1, 72072 Tübingen
>> Registergericht: Amtsgericht Stuttgart, HRB 224404
>> Geschäftsführer: Dr. Michael Sperber
> 





Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Shawn Heisey
On 6/19/2018 11:50 AM, Sushant Vengurlekar wrote:
> I created a solr cloud collection with 2 shards and a replication factor of
> 2. How can I load data into this collection which I have currently stored
> in a core on a standalone solr. I used the conf from this core on
> standalone solr to create the collection on the solrcloud

Erick's suggestion of creating a collection with one shard and one
replica, then splitting the shard and adding replicas is one solution. 
If properly executed, it can work very well.

Another possibility is to create the collection with the number of
shards and replicas that you want right up front and then use the
dataimport handler to import documents from the standalone Solr.  One of
the sources you can use with DIH is another Solr install.

https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#solrentityprocessor

If you're using a new enough version of SolrCloud (6.4 or later), you
should definitely be using cursorMark in the DIH config and a sort
parameter that includes a sort on the uniqueKey field.
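A rough sketch of such a DIH config (the source URL, core name, and uniqueKey
are placeholders):

    <dataConfig>
      <document>
        <entity name="fromStandalone"
                processor="SolrEntityProcessor"
                url="http://standalone-host:8983/solr/mycore"
                query="*:*"
                rows="1000"
                cursorMark="true"
                sort="id asc"/>
      </document>
    </dataConfig>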

Thanks,
Shawn



Re: Extracting top level URL when indexing document

2018-06-19 Thread Gus Heck
I don't understand the inclusion of 'n' in the character classes in this
pattern... it's pretty clear that the broken examples in the OP were where
the letter n occurred in the domain name. I expect a similar problem for
user parts that contain n...

^https?://(?:[^@/n]+@)?(?:www.)?([^:/n]+)
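If the 'n' was meant to be '\n' (a newline) and lost its backslash somewhere
along the way, the corrected definition would look roughly like this (also
escaping the dot in 'www.'); alternatively, just drop the 'n' from both
character classes, since URLs won't contain newlines:

    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="^https?://(?:[^@/\n]+@)?(?:www\.)?([^:/\n]+)"
               group="0"/>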

On Tue, Jun 12, 2018 at 7:15 PM, Kevin Risden  wrote:

> Looks like stop words (in, and, on) is what is breaking. The regex looks
> like it is correct.
>
> Kevin Risden
>
> On Tue, Jun 12, 2018, 18:02 Hanjan, Harinder 
> wrote:
>
> > Hello!
> >
> > I am indexing web documents and have a need to extract their top-level
> URL
> > to be stored in a different field. I have had some success with the
> > PatternTokenizerFactory (relevant schema bits at the bottom) but the
> > behavior appears to be inconsistent.  Most of the times, the top level
> URL
> > is extracted just fine but for some documents, it is being cut off.
> >
> > Examples:
> > URL
> >
> > Extracted URL
> >
> > Comment
> >
> > http://www.calgaryarb.ca/eCourtPublic/15M2018.pdf
> >
> > http://www.calgaryarb.ca
> >
> > Success
> >
> > http://www.calgarymlc.ca/about-cmlc/
> >
> > http://www.calgarymlc.ca
> >
> > Success
> >
> > http://www.calgarypolicecommission.ca/reports.php
> >
> > http://www.calgarypolicecommissio
> >
> > Fail
> >
> > https://attainyourhome.com/
> >
> > https://attai
> >
> > Fail
> >
> > https://liveandplay.calgary.ca/DROPIN/page/dropin
> >
> > https://livea
> >
> > Fail
> >
> >
> >
> >
> > Relevant schema:
> > 
> >
> >  > multiValued="false"/>
> >
> >  > sortMissingLast="true">
> > 
> >  >
> > class="solr.PatternTokenizerFactory"
> >
> > pattern="^https?://(?:[^@/n]+@)?(?:www.)?([^:/n]+)"
> > group="0"/>
> > 
> > 
> >
> >
> > I have tested the Regex and it is matching things fine. Please see
> > https://regex101.com/r/wN6cZ7/358.
> > So it appears that I have a gap in my understanding of how Solr
> > PatternTokenizerFactory works. I would appreciate any insight on the
> issue.
> > hostname field will be used in facet queries.
> >
> > Thank you!
> > Harinder
> >
> > 
> > NOTICE -
> > This communication is intended ONLY for the use of the person or entity
> > named above and may contain information that is confidential or legally
> > privileged. If you are not the intended recipient named above or a person
> > responsible for delivering messages or communications to the intended
> > recipient, YOU ARE HEREBY NOTIFIED that any use, distribution, or copying
> > of this communication or any of the information contained in it is
> strictly
> > prohibited. If you have received this communication in error, please
> notify
> > us immediately by telephone and then destroy or delete this
> communication,
> > or return it to us by mail if requested by us. The City of Calgary thanks
> > you for your attention and co-operation.
> >
>



-- 
http://www.the111shift.com


Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Erick Erickson
Personally I'd start with a 1-shard, 1-replica collection (i.e. leader-only).

From there split the shard.

Once all that has been done satisfactorily, just use the Collections
API ADDREPLICA command to build out your collection to whatever degree
of redundancy you need.
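A sketch of that call (collection, shard, and node names are placeholders):

    /admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1_0&node=host2:8983_solr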

Best,
Erick

On Tue, Jun 19, 2018 at 1:04 PM, Aroop Ganguly  wrote:
> I see.
> By definition of splitting, the new shards will have the same number of 
> replicas as the original shard.
> You could use the replicationFactor>=2 to ensure that both of your solr nodes 
> are used.
> You could also use the maxShardsPerNode parameter alone or in conjunction 
> with the replicationFactor property to achieve your target state.
>
>
>
>> On Jun 19, 2018, at 12:51 PM, Sushant Vengurlekar 
>>  wrote:
>>
>> Thank you Aroop
>>
>> After I import the data into the collection from the standalone solr core I
>> want to split it into 2 shards across 2 nodes that I have. So I will have
>> to set replicationfactor of 2 & numShards =2 ?
>>
>> On Tue, Jun 19, 2018 at 12:46 PM Aroop Ganguly 
>> wrote:
>>
>>> Hi Sushant
>>>
>>> replicationFactor defaults to 1 and is not mandatory.
>>> numShards is mandatory, where you’d equate it to 1.
>>>
>>> Aroop
>>>
 On Jun 19, 2018, at 12:29 PM, Sushant Vengurlekar <
>>> svengurle...@curvolabs.com> wrote:

 Thank you Eric.

 In the create collection command I need to set the replication factor
 though correct?

 On Tue, Jun 19, 2018 at 11:14 AM Erick Erickson >>>
 wrote:

> Probably the easiest way would be to recreate your collection with 1
> shard. Then copy the index from your standalone setup.
>
> After verifying your setup, use the Collections SPLITSHARD command.
>
> Best,
> Erick
>
> On Tue, Jun 19, 2018 at 10:50 AM, Sushant Vengurlekar
>  wrote:
>> I created a solr cloud collection with 2 shards and a replication
>>> factor
> of
>> 2. How can I load data into this collection which I have currently
>>> stored
>> in a core on a standalone solr. I used the conf from this core on
>> standalone solr to create the collection on the solrcloud
>>
>> Thank you
>
>>>
>>>
>


Re: CursorMarks and 'end of results'

2018-06-19 Thread Anshum Gupta
Hi David,

The cursormark would be the same if you get back fewer than the max records 
requested and so you should exit, as per the documentation.

I think the documentation says just what you are suggesting, but if you think 
it could be improved, feel free to put up a patch.


 Anshum


> On Jun 18, 2018, at 2:09 AM, David Frese  wrote:
> 
> Hi List,
> 
> the documentation of 'cursorMarks' recommends to fetch until a query returns 
> the cursorMark that was passed in to a request.
> 
> But that always requires an additional request at the end, so I wonder if I 
> can stop already if a request returns fewer results than requested (num 
> rows). There won't be new documents added during the search in my use case, 
> so could there ever be a non-empty 'page' after a non-full 'page'?
> 
> Thanks very much.
> 
> --
> David Frese
> +49 7071 70896 75
> 
> Active Group GmbH
> Hechinger Str. 12/1, 72072 Tübingen
> Registergericht: Amtsgericht Stuttgart, HRB 224404
> Geschäftsführer: Dr. Michael Sperber





Re: MoreLikeThis in Solr 7.3.1

2018-06-19 Thread Anshum Gupta
That explains it :)

I assume you did make those changes on disk and did not upload the updated 
configset to zookeeper.

SolrCloud instances use the configset from zk, so all changed files would have 
to be uploaded to zk.

You can re-upload the configset using the zkcli.sh script that comes with Solr 
(or some other utility): 
https://lucene.apache.org/solr/guide/7_3/command-line-utilities.html#using-solr-s-zookeeper-cli
 


You can also use this script: 
https://lucene.apache.org/solr/guide/7_3/using-zookeeper-to-manage-configuration-files.html#uploading-configuration-files-using-bin-solr-or-solrj
 


Here’s the config set API that can also be used to accomplish the same: 
https://lucene.apache.org/solr/guide/7_3/configsets-api.html#configsets-api-entry-points
 


Whatever mechanism you choose to upload the updated config, you should be able 
to see the latest config @ the Solr admin UI (assuming you have access to that) 
by cloud > tree > configs > 
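For example, something along these lines (the ZooKeeper address, config name,
and path are placeholders; use whatever config name the collection was created
with):

    # zkcli script shipped with Solr
    server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig \
        -confname _default -confdir server/solr/configsets/_default/conf

    # or the bin/solr equivalent
    bin/solr zk upconfig -n _default -d server/solr/configsets/_default/conf -z localhost:9983

    # then RELOAD the collection so it picks up the new config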


 Anshum


> On Jun 19, 2018, at 2:08 PM, Monique Monteiro  
> wrote:
> 
> I reloaded the collection with the command:
> 
http://localhost:8983/solr/admin/collections?action=RELOAD&name=documentos_ce
> 
But still the same problem...
> 
> On Tue, Jun 19, 2018 at 4:48 PM Monique Monteiro 
> wrote:
> 
>> Hi Anshum,
>> 
>> I'm using SolrCloud, but both instances are on the same Solr installation
>> (it's just for test purposes), so I suppose they share configuration in
>> solr-7.3.1/server/solr/configsets/_default/conf/solrconfig.xml.
>> 
>> So should I recreate the collection ?
>> 
>> Thanks,
>> Monique
>> 
>> On Tue, Jun 19, 2018 at 4:41 PM Anshum Gupta  wrote:
>> 
>>> Hi Monique,
>>> 
>>> Is this standalone Solr or SolrCloud ? If it is cloud, then you’d have to
>>> make sure that you uploaded the right config and collection should also be
>>> reloaded if you enabled it after creating the collection.
>>> 
>>> Also, did you check the MLT Query parser that does the same thing but
>>> doesn’t require registering of the handler etc. You can find it’s
>>> documentation here:
>>> https://lucene.apache.org/solr/guide/7_3/other-parsers.html#more-like-this-query-parser
>>> 
>>> * *Anshum
>>> 
>>> 
>>> On Jun 19, 2018, at 11:00 AM, Monique Monteiro 
>>> wrote:
>>> 
>>> Hi all,
>>> 
>>> I'm trying to access /mlt in Solr, but the index returns HTTP 404 error.
>>> 
>>> I've already configured the following:
>>> 
>>> 
>>>  - /solr-7.3.1/server/solr/configsets/_default/conf/solrconfig.xml:
>>> 
>>> *>> path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse,/mlt">*
>>> **
>>> *  _text_*
>>> **
>>> *  *
>>> 
>>> AND
>>> 
>>> **
>>> **
>>> *list *
>>> * *
>>> *  *
>>> 
>>> But none of this made "http://localhost:8983/solr/*>> name>*/mlt?q=*:*
>>> return anything other than 404.
>>> 
>>> Has anyone any idea about what may be happening?
>>> 
>>> Thanks in advance,
>>> 
>>> --
>>> Monique Monteiro
>>> 
>>> 
>>> 
>> 
>> --
>> Monique Monteiro
>> Blog: http://moniquelouise.spaces.live.com/
>> Twitter: http://twitter.com/monilouise
>> 
> 
> 
> --
> Monique Monteiro
> Blog: http://moniquelouise.spaces.live.com/
> Twitter: http://twitter.com/monilouise





Re: MoreLikeThis in Solr 7.3.1

2018-06-19 Thread Monique Monteiro
I reloaded the collection with the command:

http://localhost:8983/solr/admin/collections?action=RELOAD&name=documentos_ce

But still the same problem...

On Tue, Jun 19, 2018 at 4:48 PM Monique Monteiro 
wrote:

> Hi Anshum,
>
> I'm using SolrCloud, but both instances are on the same Solr installation
> (it's just for test purposes), so I suppose they share configuration in
> solr-7.3.1/server/solr/configsets/_default/conf/solrconfig.xml.
>
> So should I recreate the collection ?
>
> Thanks,
> Monique
>
> On Tue, Jun 19, 2018 at 4:41 PM Anshum Gupta  wrote:
>
>> Hi Monique,
>>
>> Is this standalone Solr or SolrCloud ? If it is cloud, then you’d have to
>> make sure that you uploaded the right config and collection should also be
>> reloaded if you enabled it after creating the collection.
>>
>> Also, did you check the MLT Query parser that does the same thing but
>> doesn’t require registering of the handler etc. You can find it’s
>> documentation here:
>> https://lucene.apache.org/solr/guide/7_3/other-parsers.html#more-like-this-query-parser
>>
>> * *Anshum
>>
>>
>> On Jun 19, 2018, at 11:00 AM, Monique Monteiro 
>> wrote:
>>
>> Hi all,
>>
>> I'm trying to access /mlt in Solr, but the index returns HTTP 404 error.
>>
>> I've already configured the following:
>>
>>
>>   - /solr-7.3.1/server/solr/configsets/_default/conf/solrconfig.xml:
>>
>>  *> path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse,/mlt">*
>> **
>> *  _text_*
>> **
>> *  *
>>
>> AND
>>
>>  **
>> **
>> *list *
>> * *
>> *  *
>>
>> But none of this made "http://localhost:8983/solr/*> name>*/mlt?q=*:*
>> return anything other than 404.
>>
>> Has anyone any idea about what may be happening?
>>
>> Thanks in advance,
>>
>> --
>> Monique Monteiro
>>
>>
>>
>
> --
> Monique Monteiro
> Blog: http://moniquelouise.spaces.live.com/
> Twitter: http://twitter.com/monilouise
>


-- 
Monique Monteiro
Blog: http://moniquelouise.spaces.live.com/
Twitter: http://twitter.com/monilouise


Re: Is anybody using UIMA with Solr?

2018-06-19 Thread Nicolas Paris
Sorry, I thought I was on the UIMA mailing list.
That being said, my position is the same:

Let the UIMA folks load data into Solr using the most optimized way.
(What would be the best way? Loading JSON?)

2018-06-19 22:48 GMT+02:00 Nicolas Paris :

> Hi
>
> Not realy a direct answer - Never used it, however this feature have
> been attractive to me while first looking at uima.
>
> Right now, I would say UIMA connectors in general are by design
> a pain to maintain. Source and target often do have optimised
> way to bulk export/import data. For example, using a jdbc postgresql
> connector is a bad idea compared to using the optimzed COPY function.
> And each database has it's own optimized way of doing.
>
> That's why developpers of UIMA should focus on  improving what UIMA
> is good at: processing texts.
> Exporting and importing texts responsibility should remain to the other
> tools.
>
> Tell me if i am wrong
>
> 2018-06-18 13:13 GMT+02:00 Alexandre Rafalovitch :
>
>> Hi,
>>
>> Solr ships an UIMA component and examples that haven't worked for a
>> while. Details are in:
>> https://issues.apache.org/jira/browse/SOLR-11694
>>
>> The choices for developers are:
>> 1) Rip UIMA out (and save space)
>> 2) Update UIMA to latest 2.x version
>> 3) Update UIMA to super-latest possibly-breaking 3.x
>>
>> The most likely choice at this point is 1. But I am curious (given
>> that UIMA is in IBM Watson...) if anybody actually has a use-case that
>> strongly votes for options 2 or 3, given that the update effort is
>> probably not trivial.
>>
>> Note that if you use UIMA with Solr, but in a configuration completely
>> different from that shipped (so the options 2/3 would still be
>> irrelevant), it could be still fun to share the knowledge in this
>> thread, with the appropriate disclaimer.
>>
>> Regards,
>>Alex.
>>
>
>


Re: Is anybody using UIMA with Solr?

2018-06-19 Thread Nicolas Paris
Hi

Not really a direct answer - I never used it; however, this feature was
attractive to me when first looking at UIMA.

Right now, I would say UIMA connectors in general are by design
a pain to maintain. Source and target often have optimised
ways to bulk export/import data. For example, using a JDBC PostgreSQL
connector is a bad idea compared to using the optimized COPY command.
And each database has its own optimized way of doing this.

That's why developers of UIMA should focus on improving what UIMA
is good at: processing text.
Responsibility for exporting and importing text should remain with the other
tools.

Tell me if I am wrong.

2018-06-18 13:13 GMT+02:00 Alexandre Rafalovitch :

> Hi,
>
> Solr ships an UIMA component and examples that haven't worked for a
> while. Details are in:
> https://issues.apache.org/jira/browse/SOLR-11694
>
> The choices for developers are:
> 1) Rip UIMA out (and save space)
> 2) Update UIMA to latest 2.x version
> 3) Update UIMA to super-latest possibly-breaking 3.x
>
> The most likely choice at this point is 1. But I am curious (given
> that UIMA is in IBM Watson...) if anybody actually has a use-case that
> strongly votes for options 2 or 3, given that the update effort is
> probably not trivial.
>
> Note that if you use UIMA with Solr, but in a configuration completely
> different from that shipped (so the options 2/3 would still be
> irrelevant), it could be still fun to share the knowledge in this
> thread, with the appropriate disclaimer.
>
> Regards,
>Alex.
>


Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Aroop Ganguly
I see. 
By definition of splitting, the new shards will have the same number of 
replicas as the original shard.
You could use replicationFactor>=2 to ensure that both of your Solr nodes 
are used.
You could also use the maxShardsPerNode parameter alone or in conjunction with 
the replicationFactor property to achieve your target state.



> On Jun 19, 2018, at 12:51 PM, Sushant Vengurlekar 
>  wrote:
> 
> Thank you Aroop
> 
> After I import the data into the collection from the standalone solr core I
> want to split it into 2 shards across 2 nodes that I have. So I will have
> to set replicationfactor of 2 & numShards =2 ?
> 
> On Tue, Jun 19, 2018 at 12:46 PM Aroop Ganguly 
> wrote:
> 
>> Hi Sushant
>> 
>> replicationFactor defaults to 1 and is not mandatory.
>> numShards is mandatory, where you’d equate it to 1.
>> 
>> Aroop
>> 
>>> On Jun 19, 2018, at 12:29 PM, Sushant Vengurlekar <
>> svengurle...@curvolabs.com> wrote:
>>> 
>>> Thank you Eric.
>>> 
>>> In the create collection command I need to set the replication factor
>>> though correct?
>>> 
>>> On Tue, Jun 19, 2018 at 11:14 AM Erick Erickson >> 
>>> wrote:
>>> 
 Probably the easiest way would be to recreate your collection with 1
 shard. Then copy the index from your standalone setup.
 
 After verifying your setup, use the Collections SPLITSHARD command.
 
 Best,
 Erick
 
 On Tue, Jun 19, 2018 at 10:50 AM, Sushant Vengurlekar
  wrote:
> I created a solr cloud collection with 2 shards and a replication
>> factor
 of
> 2. How can I load data into this collection which I have currently
>> stored
> in a core on a standalone solr. I used the conf from this core on
> standalone solr to create the collection on the solrcloud
> 
> Thank you
 
>> 
>> 



Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Sushant Vengurlekar
Thank you Aroop

After I import the data into the collection from the standalone Solr core, I
want to split it into 2 shards across the 2 nodes that I have. So I will have
to set a replicationFactor of 2 & numShards=2?

On Tue, Jun 19, 2018 at 12:46 PM Aroop Ganguly 
wrote:

> Hi Sushant
>
> replicationFactor defaults to 1 and is not mandatory.
> numShards is mandatory, where you’d equate it to 1.
>
> Aroop
>
> > On Jun 19, 2018, at 12:29 PM, Sushant Vengurlekar <
> svengurle...@curvolabs.com> wrote:
> >
> > Thank you Eric.
> >
> > In the create collection command I need to set the replication factor
> > though correct?
> >
> > On Tue, Jun 19, 2018 at 11:14 AM Erick Erickson  >
> > wrote:
> >
> >> Probably the easiest way would be to recreate your collection with 1
> >> shard. Then copy the index from your standalone setup.
> >>
> >> After verifying your setup, use the Collections SPLITSHARD command.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Jun 19, 2018 at 10:50 AM, Sushant Vengurlekar
> >>  wrote:
> >>> I created a solr cloud collection with 2 shards and a replication
> factor
> >> of
> >>> 2. How can I load data into this collection which I have currently
> stored
> >>> in a core on a standalone solr. I used the conf from this core on
> >>> standalone solr to create the collection on the solrcloud
> >>>
> >>> Thank you
> >>
>
>


Re: MoreLikeThis in Solr 7.3.1

2018-06-19 Thread Monique Monteiro
Hi Anshum,

I'm using SolrCloud, but both instances are on the same Solr installation
(it's just for test purposes), so I suppose they share configuration in
solr-7.3.1/server/solr/configsets/_default/conf/solrconfig.xml.

So should I recreate the collection?

Thanks,
Monique

On Tue, Jun 19, 2018 at 4:41 PM Anshum Gupta  wrote:

> Hi Monique,
>
> Is this standalone Solr or SolrCloud ? If it is cloud, then you’d have to
> make sure that you uploaded the right config and collection should also be
> reloaded if you enabled it after creating the collection.
>
> Also, did you check the MLT Query parser that does the same thing but
> doesn’t require registering of the handler etc. You can find it’s
> documentation here:
> https://lucene.apache.org/solr/guide/7_3/other-parsers.html#more-like-this-query-parser
>
> * *Anshum
>
>
> On Jun 19, 2018, at 11:00 AM, Monique Monteiro 
> wrote:
>
> Hi all,
>
> I'm trying to access /mlt in Solr, but the index returns HTTP 404 error.
>
> I've already configured the following:
>
>
>   - /solr-7.3.1/server/solr/configsets/_default/conf/solrconfig.xml:
>
>  * path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse,/mlt">*
> **
> *  _text_*
> **
> *  *
>
> AND
>
>  **
> **
> *list *
> * *
> *  *
>
> But none of this made "http://localhost:8983/solr/**/mlt?q=*:*
> return anything other than 404.
>
> Has anyone any idea about what may be happening?
>
> Thanks in advance,
>
> --
> Monique Monteiro
>
>
>

-- 
Monique Monteiro
Blog: http://moniquelouise.spaces.live.com/
Twitter: http://twitter.com/monilouise


Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Aroop Ganguly
Hi Sushant

replicationFactor defaults to 1 and is not mandatory.
numShards is mandatory, where you’d equate it to 1.

Aroop

> On Jun 19, 2018, at 12:29 PM, Sushant Vengurlekar 
>  wrote:
> 
> Thank you Eric.
> 
> In the create collection command I need to set the replication factor
> though correct?
> 
> On Tue, Jun 19, 2018 at 11:14 AM Erick Erickson 
> wrote:
> 
>> Probably the easiest way would be to recreate your collection with 1
>> shard. Then copy the index from your standalone setup.
>> 
>> After verifying your setup, use the Collections SPLITSHARD command.
>> 
>> Best,
>> Erick
>> 
>> On Tue, Jun 19, 2018 at 10:50 AM, Sushant Vengurlekar
>>  wrote:
>>> I created a solr cloud collection with 2 shards and a replication factor
>> of
>>> 2. How can I load data into this collection which I have currently stored
>>> in a core on a standalone solr. I used the conf from this core on
>>> standalone solr to create the collection on the solrcloud
>>> 
>>> Thank you
>> 



Re: MoreLikeThis in Solr 7.3.1

2018-06-19 Thread Anshum Gupta
Hi Monique,

Is this standalone Solr or SolrCloud ? If it is cloud, then you’d have to make 
sure that you uploaded the right config and collection should also be reloaded 
if you enabled it after creating the collection.

Also, did you check the MLT Query parser that does the same thing but doesn’t 
require registering of the handler etc. You can find it’s documentation here: 
https://lucene.apache.org/solr/guide/7_3/other-parsers.html#more-like-this-query-parser
 


 Anshum


> On Jun 19, 2018, at 11:00 AM, Monique Monteiro  
> wrote:
> 
> Hi all,
> 
> I'm trying to access /mlt in Solr, but the index returns HTTP 404 error.
> 
> I've already configured the following:
> 
> 
>   - /solr-7.3.1/server/solr/configsets/_default/conf/solrconfig.xml:
> 
>  * path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse,/mlt">*
> **
> *  _text_*
> **
> *  *
> 
> AND
> 
>  **
> **
> *list *
> * *
> *  *
> 
> But none of this made "http://localhost:8983/solr/**/mlt?q=*:*
> return anything other than 404.
> 
> Has anyone any idea about what may be happening?
> 
> Thanks in advance,
> 
> --
> Monique Monteiro





Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Sushant Vengurlekar
Thank you Erick.

In the create collection command I need to set the replication factor
though, correct?

On Tue, Jun 19, 2018 at 11:14 AM Erick Erickson 
wrote:

> Probably the easiest way would be to recreate your collection with 1
> shard. Then copy the index from your standalone setup.
>
> After verifying your setup, use the Collections SPLITSHARD command.
>
> Best,
> Erick
>
> On Tue, Jun 19, 2018 at 10:50 AM, Sushant Vengurlekar
>  wrote:
> > I created a solr cloud collection with 2 shards and a replication factor
> of
> > 2. How can I load data into this collection which I have currently stored
> > in a core on a standalone solr. I used the conf from this core on
> > standalone solr to create the collection on the solrcloud
> >
> > Thank you
>


Re: sharding and placement of replicas

2018-06-19 Thread Shawn Heisey
On 6/15/2018 11:08 AM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> If I start with a collection X on two nodes with one shard and two replicas 
> (for redundancy, in case a node goes down): a node on host1 has 
> X_shard1_replica1 and a node on host2 has X_shard1_replica2: when I try 
> SPLITSHARD, I generally get X_shard1_0_replica1, X_shard1_1_replica1 and 
> X_shard1_0_replica0 all on the node on host1 with X_shard1_1_replica0 sitting 
> alone on the node on host2. If host1 were to go down at this point, shard1_0 
> would be unavailable.

https://lucene.apache.org/solr/guide/6_6/collections-api.html#CollectionsAPI-splitshard

That documentation says "The new shards will have as many replicas as
the original shard."  That tells me that what you're seeing is not
matching the *intent* of the SPLITSHARD feature.  The fact that you get
*one* of the new shards but not the other is suspicious.  I'm wondering
if maybe Solr tried to create it but had a problem doing so.  Can you
check for errors in the solr logfile on host2?

If there's nothing about your environment that would cause a failure to
create the replica, then it might be a bug.

> Is there a way either of specifying placement or of giving hints that 
> replicas ought to be separated?

It shouldn't be necessary to give Solr any parameters for that.  All
nodes where the shard exists should get copies of the new shards when
you split it.
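As a manual workaround in the meantime, the misplaced replica can be moved
with the Collections API, roughly like this (collection, shard, node, and
replica names are placeholders):

    /admin/collections?action=ADDREPLICA&collection=X&shard=shard1_0&node=host2:8983_solr
    /admin/collections?action=DELETEREPLICA&collection=X&shard=shard1_0&replica=core_node5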

> I am currently running Solr6.6.0, if that is relevant.

If this is a provable and reproducible bug, and it's still a problem in
the current stable branch (next release from that will be 7.4.0), then
it will definitely be fixed.  If it's only a problem in 6.x, then I
can't guarantee that it will be fixed.  That's because the 6.x line is
in maintenance mode, which means that there's a very high bar for
changes.  In most cases, only changes that meet one of these criteria
are made in maintenance mode:

 * Fixes a security bug.
 * Fixes a MAJOR bug with no workaround.
 * Fix is a very trivial code change and not likely to introduce new bugs.

Of those criteria, generally only the first two are likely to prompt an
actual new software release.  If enough changes of the third type
accumulate, that might prompt a new release.

My personal opinion:  If this is a general problem in 6.x, it should be
fixed there.  Because there is a workaround, it would not be cause for
an immediate new release.

Thanks,
Shawn



Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Erick Erickson
Probably the easiest way would be to recreate your collection with 1
shard. Then copy the index from your standalone setup.

After verifying your setup, use the Collections SPLITSHARD command.

Best,
Erick

On Tue, Jun 19, 2018 at 10:50 AM, Sushant Vengurlekar
 wrote:
> I created a solr cloud collection with 2 shards and a replication factor of
> 2. How can I load data into this collection which I have currently stored
> in a core on a standalone solr. I used the conf from this core on
> standalone solr to create the collection on the solrcloud
>
> Thank you


Re: Solrcloud doesn't like relative path

2018-06-19 Thread Erick Erickson
Configsets are presumed to contain any auxiliary files under them, not
a relative path _on Zookeeper_.

So try putting your synonyms_vendors.txt in
configsets/conf/helpers/synonyms_vendors.txt, then
reference it as helpers/synonyms_vendors.txt.

Best,
Erick

On Tue, Jun 19, 2018 at 10:28 AM, Sushant Vengurlekar
 wrote:
> I have this line in my schema.xml
> synonyms="../../helpers/synonyms_vendors.txt"
>
> My current folder structure is
> solr
>- helpers
>   synonyms_vendors.txt
>-configsets
> - collection1
> -conf
> schema.xml
> solrconfig.xml
>
> I get the below error when I try to use the bin/solr create_collection
> command
>
> Unable to create core [total-joints_4_shard1_replica1] Caused by: Can't
> find resource 'helpers/synonyms_vendors.txt' in classpath or
> '/configs/total-joints_4', cwd=/opt/solr/server
>
>
> Can some one suggest what I can do to resolve this.
>
> Thank you


MoreLikeThis in Solr 7.3.1

2018-06-19 Thread Monique Monteiro
Hi all,

I'm trying to access /mlt in Solr, but the request returns an HTTP 404 error.

I've already configured the following:


   - /solr-7.3.1/server/solr/configsets/_default/conf/solrconfig.xml:

  **
**
*  _text_*
**
*  *

 AND

  **
**
*list *
* *
*  *

But none of this made "http://localhost:8983/solr/**/mlt?q=*:*
return anything other than 404.

Has anyone any idea about what may be happening?

Thanks in advance,

-- 
Monique Monteiro


RE: tlogs not deleting

2018-06-19 Thread Brian Yee
Does anyone have any additional possible causes for this issue? I checked the 
buffer status using "/cdcr?action=STATUS" and it says the buffer is disabled at 
both target and source.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, June 19, 2018 11:55 AM
To: solr-user 
Subject: Re: tlogs not deleting

bq. Do you recommend disabling the buffer on the source SolrCloud as well?

Disable them all on both source and target IMO.

On Tue, Jun 19, 2018 at 8:50 AM, Brian Yee  wrote:
> Thank you Erick. I am running Solr 6.6. From the documentation:
> "Replicas do not need to buffer updates, and it is recommended to disable 
> buffer on the target SolrCloud."
>
> Do you recommend disabling the buffer on the source SolrCloud as well? It 
> looks like I already have the buffer disabled at target locations but not the 
> source location. Would it even make sense at the source location?
>
> This is what I have at the target locations:
> 
>   
>   100
>   
>   
> disabled
>   
> 
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, June 19, 2018 11:00 AM
> To: solr-user 
> Subject: Re: tlogs not deleting
>
> Take a look at the CDCR section of your reference guide, be sure you get the 
> version which you can download from here:
> https://archive.apache.org/dist/lucene/solr/ref-guide/
>
> There's the CDCR API call you can use for in-flight disabling, and depending 
> on the version of Solr you can set it in solrconfig.
>
> Basically, buffering was there in the original CDCR to allow a larger 
> maintenance window, you could enable buffering and all updates were saved 
> until you disabled it, during which period you could do whatever you needed 
> with your target cluster and not lose any updates.
>
> Later versions can do the full sync of the index and buffering is being 
> removed.
>
> Best,
> Erick
>
> On Tue, Jun 19, 2018 at 7:31 AM, Brian Yee  wrote:
>> Thanks for the suggestion. Can you please elaborate a little bit about what 
>> DISABLEBUFFER does? The documentation is not very detailed. Is this 
>> something that needs to be done manually whenever this problem happens or is 
>> it something that we can do to fix it so it won't happen again?
>>
>> -Original Message-
>> From: Susheel Kumar [mailto:susheel2...@gmail.com]
>> Sent: Monday, June 18, 2018 9:12 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: tlogs not deleting
>>
>> You may have to DISABLEBUFFER in source to get rid of tlogs.
>>
>> On Mon, Jun 18, 2018 at 6:13 PM, Brian Yee  wrote:
>>
>>> So I've read a bunch of stuff on hard/soft commits and tlogs. As I 
>>> understand, after a hard commit, solr is supposed to delete old 
>>> tlogs depending on the numRecordsToKeep and maxNumLogsToKeep values 
>>> in the autocommit settings in solrconfig.xml. I am occasionally 
>>> seeing solr fail to do this and the tlogs just build up over time 
>>> and eventually we run out of disk space on the VM and this causes problems 
>>> for us.
>>> This does not happen all the time, only sometimes. I currently have 
>>> a tlog directory that has 123G worth of tlogs. The last hard commit 
>>> on this node was 10 minutes ago but these tlogs date back to 3 days ago.
>>>
>>> We have sometimes found that restarting solr on the node will get it 
>>> to clean up the old tlogs, but we really want to find the root cause 
>>> and fix it if possible so we don't keep getting disk space alerts 
>>> and have to adhoc restart nodes. Has anyone seen an issue like this before?
>>>
>>> My update handler settings look like this:
>>>   
>>>
>>>   
>>>
>>>   ${solr.ulog.dir:}
>>>   ${solr.ulog.numVersionBuckets:
>>> 65536}
>>> 
>>> 
>>> 60
>>> 25
>>> false
>>> 
>>> 
>>> 12
>>> 
>>>
>>>   
>>> 100
>>>   
>>>
>>>   
>>>


Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Sushant Vengurlekar
I created a SolrCloud collection with 2 shards and a replication factor of
2. How can I load data into this collection, which I currently have stored
in a core on a standalone Solr? I used the conf from this core on
standalone Solr to create the collection on SolrCloud.

Thank you


Re: How to exclude certain values in multi-value field filter query

2018-06-19 Thread Wei
Thanks Mikhail and Alessandro.
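For anyone who lands on this thread later, a rough sketch of the indexing
chain Mikhail describes (the chain name is illustrative), followed by the
filter query:

    <updateRequestProcessorChain name="count-values">
      <processor class="solr.CloneFieldUpdateProcessorFactory">
        <str name="source">V</str>
        <str name="dest">numVals</str>
      </processor>
      <processor class="solr.CountFieldValuesUpdateProcessorFactory">
        <str name="fieldName">numVals</str>
      </processor>
      <processor class="solr.DefaultValueUpdateProcessorFactory">
        <str name="fieldName">numVals</str>
        <int name="value">0</int>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

    fq=*:* -(V:(A AND B) AND numVals:2) -(V:(A OR B) AND numVals:1)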

On Tue, Jun 19, 2018 at 2:37 AM, Mikhail Khludnev  wrote:

> you need to index num vals
>  apache/solr/update/processor/CountFieldValuesUpdateProcessorFactory.html>
> in the separate field, and then *:* -(V:(A AND B) AND numVals:2) -(V:(A OR
> B) AND numVals:1)
>
>
> On Tue, Jun 19, 2018 at 9:20 AM Wei  wrote:
>
> > Hi,
> >
> > I have a multi-value field,  and there is a limited set of values for the
> > field: A, B, C, D.
> > Is there a way to filter out documents that has only A or B values in the
> > multi-value field?
> >
> > Basically I want to  exclude document that has:
> >
> > A
> >
> > B
> >
> > A B
> >
> > and get documents that has:
> >
> >
> > C
> >
> > D
> >
> > C D
> >
> > A C
> >
> > B C
> >
> > A D
> >
> > B D
> >
> > A B C
> >
> > A B D
> >
> > A C D
> >
> > B C D
> >
> > A B C D
> >
> >
> > Thanks,
> >
> > Wei
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Limited Search Results During Full Reindexing - Fine Once Completed

2018-06-19 Thread Erick Erickson
I'd set your soft commit interval to as long as you can stand. Every
soft commit opens a new searcher and does significant work, including
throwing away your queryResultCache and filterCache.

The time here should be as long as you can afford to not be able to
search updates. Don't go totally overboard here, 5-10 minutes is
pretty reasonable as a maximum. 3 seconds means your caches will only
be valid for a short period and then will be rebuild as per your
autowarm settings.

Don't go overboard with your autowarm settings, I usually start with < 20.
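A sketch of what that might look like in solrconfig.xml (the intervals are
examples, not a recommendation for every workload):

    <autoCommit>
      <maxTime>60000</maxTime>          <!-- hard commit every 60s, no new searcher -->
      <openSearcher>false</openSearcher>
    </autoCommit>

    <autoSoftCommit>
      <maxTime>300000</maxTime>         <!-- new searcher (visibility) every 5 minutes -->
    </autoSoftCommit>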

Best,
Erick

On Tue, Jun 19, 2018 at 9:22 AM, THADC
 wrote:
> thanks I changed the autosoftCommit from -1 and 3000 and that seemed to do
> the trick.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solrcloud doesn't like relative path

2018-06-19 Thread Sushant Vengurlekar
I have this line in my schema.xml
synonyms="../../helpers/synonyms_vendors.txt"

My current folder structure is
solr
   - helpers
  synonyms_vendors.txt
   -configsets
- collection1
-conf
schema.xml
solrconfig.xml

I get the below error when I try to use the bin/solr create_collection
command

Unable to create core [total-joints_4_shard1_replica1] Caused by: Can't
find resource 'helpers/synonyms_vendors.txt' in classpath or
'/configs/total-joints_4', cwd=/opt/solr/server


Can someone suggest what I can do to resolve this?

Thank you


RE: Remove schema.xml in favor of managed-schema

2018-06-19 Thread Davis, Daniel (NIH/NLM) [C]
Elastic allows the mappings to be set all at once, either in the template or as 
index settings.  That is an important feature because it allows the field 
definitions to be source code artifacts, which can be deployed very easily by 
an automatic script.

Solr's Managed Schema API allows multiple changes to be combined into a single 
POST, but the API changes are not declarative - they modify the current schema 
rather than setting it.  It would be better if there were an API in the managed 
schema API to declaratively set the schema field defs, fields, dynamic fields, 
and copy fields through a single API call.  This would replace the current 
function of schema.xml

Since that mechanism does not yet exist, I think it is too soon to eliminate 
schema.xml.

This function of setting the schema declaratively to exactly what you want is also 
met by using an uploaded configset, and since solrconfig.xml isn't going away, that 
step is not eliminated, so removing schema.xml seems to introduce an additional 
step into reliable deployment.

That said, as long as there is a strong idea of the baseline schema, achieving 
the desired schema via add, remove, and replace operations is reasonable.

> -Original Message-
> From: Doug Turnbull [mailto:dturnb...@opensourceconnections.com]
> Sent: Tuesday, June 19, 2018 12:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Remove schema.xml in favor of managed-schema
> 
> I actually prefer the classic config-files approach over managed schemas.
> Having done both Elasticsearch (where everything is configed through an
> API), managed and non-managed Solr, I prefer the legacy non-managed Solr
> way of doing things when its possible
> 
> - With 'managed' approaches, the config code often turns into spaghetti
> throughout the client application, and harder to maintain
> - The client application is often done in any number of programming
> languages, client APIs, etc which makes it harder to ramp up new Solr devs
> on how the search engine works
> - The file-based config can be versioned and deployed as an artifact that
> only contains config bits relevant to the search engine
> 
> I know there's a lot of 'it depends'. For example, if I am programatically
> changing config in real-time without wanting to restart the search engine,
> then I can see the benefit to the managed config. Especially a large,
> complex deployment. But most Solr instances I see are not in the giant,
> complex to config variety and the config file approach is simplest for most
> teams.
> 
> At least that's my 2 cents :)
> -Doug
> 
> 
> On Tue, Jun 19, 2018 at 11:58 AM Alexandre Rafalovitch
> 
> wrote:
> 
> > And that managed-schema will reorder the entries and delete the
> comments on
> > first API modification.
> >
> > Regards,
> > Alex
> >
> > On Tue, Jun 19, 2018, 11:47 AM Shawn Heisey, 
> wrote:
> >
> > > On 6/17/2018 6:48 PM, S G wrote:
> > > > I only wanted to know if schema.xml offer anything that managed-
> schema
> > > does
> > > > not.
> > >
> > > The only difference between the two is that there is a different
> > > filename and the managed version can be modified by API calls.  The
> > > schema format and what you can do within that format is identical either
> > > way.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
> --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug


Re: Limited Search Results During Full Reindexing - Fine Once Completed

2018-06-19 Thread THADC
Thanks, I changed the autoSoftCommit maxTime from -1 to 3000 and that seemed to
do the trick.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Remove schema.xml in favor of managed-schema

2018-06-19 Thread Doug Turnbull
I actually prefer the classic config-files approach over managed schemas.
Having done both Elasticsearch (where everything is configed through an
API), managed and non-managed Solr, I prefer the legacy non-managed Solr
way of doing things when its possible

- With 'managed' approaches, the config code often turns into spaghetti
throughout the client application, and is harder to maintain
- The client application is often done in any number of programming
languages, client APIs, etc which makes it harder to ramp up new Solr devs
on how the search engine works
- The file-based config can be versioned and deployed as an artifact that
only contains config bits relevant to the search engine

I know there's a lot of 'it depends'. For example, if I am programatically
changing config in real-time without wanting to restart the search engine,
then I can see the benefit to the managed config. Especially a large,
complex deployment. But most Solr instances I see are not in the giant,
complex to config variety and the config file approach is simplest for most
teams.

At least that's my 2 cents :)
-Doug


On Tue, Jun 19, 2018 at 11:58 AM Alexandre Rafalovitch 
wrote:

> And that managed-schema will reorder the entries and delete the comments on
> first API modification.
>
> Regards,
> Alex
>
> On Tue, Jun 19, 2018, 11:47 AM Shawn Heisey,  wrote:
>
> > On 6/17/2018 6:48 PM, S G wrote:
> > > I only wanted to know if schema.xml offer anything that managed-schema
> > does
> > > not.
> >
> > The only difference between the two is that there is a different
> > filename and the managed version can be modified by API calls.  The
> > schema format and what you can do within that format is identical either
> > way.
> >
> > Thanks,
> > Shawn
> >
> >
>
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug


Re: Spring Boot missing required field:

2018-06-19 Thread Andrea Gazzarini

Hi Rushikesh,
If the issue is: "when I set required=true Solr says the field is 
missing, and if I set required="false" I have no problem at all, but 
Solr documents have no value for that field", then trust me, the field 
is missing.


I see two possible points where the issue could be:

 * client side: you say "I have a value associated with the field"; although
   I think you're saying this because you're really sure about it, I
   suggest you print out the document just before it is sent to
   Solr (see the sketch after this list). I bet 99% you won't find a value
   for that field; otherwise, at least we are sure that what is obvious to
   you is obvious to the machine as well
 * Solr side: are you using some UpdateRequestProcessor in the index
   chain? If so, is it possible that one of these components is
   removing that field from the incoming document for some reason?
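Something as crude as this is enough for that check (the getter and variable
names below are just an example of however the bean or document is built in
your code):

    // Spring Data Solr bean, right before repository.save(...) / solrTemplate.saveBean(...)
    System.out.println("mailReceiveDate = " + mail.getMailReceiveDate());

    // or, if a SolrInputDocument is built directly
    System.out.println("mailReceiveDate = " + doc.getFieldValue("mailReceiveDate"));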

Cheers,
Gazza

On 19/06/18 15:19, Rushikesh Garadade wrote:

Yes Andrea,
I have already tried that, I have value associated with field.
This issue is coming when I set 'required="true"' . If i remove this then
everything works fine. I am not getting why this issue occurs when I set
required="true".

Can you please provide me some pointers to look see what may be the reason.

Thanks,
Rushikesh Garadade

On Sat, Jun 9, 2018 at 2:56 PM Andrea Gazzarini 
wrote:


Hi Rushikesh,
I bet your client is not doing what you think. The error is clear, the
incoming document doesn't have that field.

I would investigate more on the client side. Without entering in
interesting fields like unit testing, I guess the old and good
System.out.println, just before sending the document to Solr, could help
you a lot here.

Best,
Andrea

On Sat, 9 Jun 2018, 10:08 Rushikesh Garadade, 
Hi,
I am using solr 7.3 with java spring boot. I schema of my collection I

have

set schema as

indexed="true"

required="true" stored="true"/>
I have done all other necessary settings required for projects to run.

When I run a application and trying to insert document via CODE. I am
getting error "*missing required field : mailReceiveDate *". Although I
have provided the field value.
Following details code error of the same


Caused by:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:

Error

from server at http://:8983/solr:
[doc=8ac2bcf6-7a56-4fed-b83e-7ccc00454088] missing required field:
mailReceiveDate
at



org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:612)

~[solr-solrj-6.6.3.jar:6.6.3 d1e9bbd333ea55cfa0c75d324424606e857a775b -
sarowe - 2018-03-02 15:09:35]
at



org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)

~[solr-solrj-6.6.3.jar:6.6.3 d1e9bbd333ea55cfa0c75d324424606e857a775b -
sarowe - 2018-03-02 15:09:35]
at



org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)

~[solr-solrj-6.6.3.jar:6.6.3 d1e9bbd333ea55cfa0c75d324424606e857a775b -
sarowe - 2018-03-02 15:09:35]
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160)
~[solr-solrj-6.6.3.jar:6.6.3 d1e9bbd333ea55cfa0c75d324424606e857a775b -
sarowe - 2018-03-02 15:09:35]
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
~[solr-solrj-6.6.3.jar:6.6.3 d1e9bbd333ea55cfa0c75d324424606e857a775b -
sarowe - 2018-03-02 15:09:35]
at



org.springframework.data.solr.core.SolrTemplate.lambda$saveBean$2(SolrTemplate.java:219)

~[spring-data-solr-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at



org.springframework.data.solr.core.SolrTemplate.execute(SolrTemplate.java:166)

~[spring-data-solr-3.0.7.RELEASE.jar:3.0.7.RELEASE]
... 58 common frames omitted


Please let me know what can be the issue?

Thanks,
Rushikesh Garadade





Re: Remove schema.xml in favor of managed-schema

2018-06-19 Thread Alexandre Rafalovitch
And that managed-schema will reorder the entries and delete the comments on
first API modification.

Regards,
Alex

On Tue, Jun 19, 2018, 11:47 AM Shawn Heisey,  wrote:

> On 6/17/2018 6:48 PM, S G wrote:
> > I only wanted to know if schema.xml offer anything that managed-schema
> does
> > not.
>
> The only difference between the two is that there is a different
> filename and the managed version can be modified by API calls.  The
> schema format and what you can do within that format is identical either
> way.
>
> Thanks,
> Shawn
>
>


Re: tlogs not deleting

2018-06-19 Thread Erick Erickson
bq. Do you recommend disabling the buffer on the source SolrCloud as well?

Disable them all on both source and target IMO.
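The call is just this, run against any node hosting the collection in each
cluster (host and collection name are placeholders), and STATUS confirms it
took effect:

    curl "http://localhost:8983/solr/mycollection/cdcr?action=DISABLEBUFFER"
    curl "http://localhost:8983/solr/mycollection/cdcr?action=STATUS"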

On Tue, Jun 19, 2018 at 8:50 AM, Brian Yee  wrote:
> Thank you Erick. I am running Solr 6.6. From the documentation:
> "Replicas do not need to buffer updates, and it is recommended to disable 
> buffer on the target SolrCloud."
>
> Do you recommend disabling the buffer on the source SolrCloud as well? It 
> looks like I already have the buffer disabled at target locations but not the 
> source location. Would it even make sense at the source location?
>
> This is what I have at the target locations:
> 
>   
>   100
>   
>   
> disabled
>   
> 
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, June 19, 2018 11:00 AM
> To: solr-user 
> Subject: Re: tlogs not deleting
>
> Take a look at the CDCR section of your reference guide, be sure you get the 
> version which you can download from here:
> https://archive.apache.org/dist/lucene/solr/ref-guide/
>
> There's the CDCR API call you can use for in-flight disabling, and depending 
> on the version of Solr you can set it in solrconfig.
>
> Basically, buffering was there in the original CDCR to allow a larger 
> maintenance window, you could enable buffering and all updates were saved 
> until you disabled it, during which period you could do whatever you needed 
> with your target cluster and not lose any updates.
>
> Later versions can do the full sync of the index and buffering is being 
> removed.
>
> Best,
> Erick
>
> On Tue, Jun 19, 2018 at 7:31 AM, Brian Yee  wrote:
>> Thanks for the suggestion. Can you please elaborate a little bit about what 
>> DISABLEBUFFER does? The documentation is not very detailed. Is this 
>> something that needs to be done manually whenever this problem happens or is 
>> it something that we can do to fix it so it won't happen again?
>>
>> -Original Message-
>> From: Susheel Kumar [mailto:susheel2...@gmail.com]
>> Sent: Monday, June 18, 2018 9:12 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: tlogs not deleting
>>
>> You may have to DISABLEBUFFER in source to get rid of tlogs.
>>
>> On Mon, Jun 18, 2018 at 6:13 PM, Brian Yee  wrote:
>>
>>> So I've read a bunch of stuff on hard/soft commits and tlogs. As I
>>> understand, after a hard commit, solr is supposed to delete old tlogs
>>> depending on the numRecordsToKeep and maxNumLogsToKeep values in the
>>> autocommit settings in solrconfig.xml. I am occasionally seeing solr
>>> fail to do this and the tlogs just build up over time and eventually
>>> we run out of disk space on the VM and this causes problems for us.
>>> This does not happen all the time, only sometimes. I currently have a
>>> tlog directory that has 123G worth of tlogs. The last hard commit on
>>> this node was 10 minutes ago but these tlogs date back to 3 days ago.
>>>
>>> We have sometimes found that restarting solr on the node will get it
>>> to clean up the old tlogs, but we really want to find the root cause
>>> and fix it if possible so we don't keep getting disk space alerts and
>>> have to adhoc restart nodes. Has anyone seen an issue like this before?
>>>
>>> My update handler settings look like this:
>>>   
>>>
>>>   
>>>
>>>   ${solr.ulog.dir:}
>>>   ${solr.ulog.numVersionBuckets:
>>> 65536}
>>> 
>>> 
>>> 60
>>> 25
>>> false
>>> 
>>> 
>>> 12
>>> 
>>>
>>>   
>>> 100
>>>   
>>>
>>>   
>>>


RE: tlogs not deleting

2018-06-19 Thread Brian Yee
Thank you Erick. I am running Solr 6.6. From the documentation:
"Replicas do not need to buffer updates, and it is recommended to disable 
buffer on the target SolrCloud."

Do you recommend disabling the buffer on the source SolrCloud as well? It looks 
like I already have the buffer disabled at target locations but not the source 
location. Would it even make sense at the source location?

This is what I have at the target locations:

  
  100
  
  
disabled
  



-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, June 19, 2018 11:00 AM
To: solr-user 
Subject: Re: tlogs not deleting

Take a look at the CDCR section of your reference guide, be sure you get the 
version which you can download from here:
https://archive.apache.org/dist/lucene/solr/ref-guide/

There's the CDCR API call you can use for in-flight disabling, and depending on 
the version of Solr you can set it in solrconfig.

Basically, buffering was there in the original CDCR to allow a larger 
maintenance window, you could enable buffering and all updates were saved until 
you disabled it, during which period you could do whatever you needed with your 
target cluster and not lose any updates.

Later versions can do the full sync of the index and buffering is being removed.

Best,
Erick

On Tue, Jun 19, 2018 at 7:31 AM, Brian Yee  wrote:
> Thanks for the suggestion. Can you please elaborate a little bit about what 
> DISABLEBUFFER does? The documentation is not very detailed. Is this something 
> that needs to be done manually whenever this problem happens or is it 
> something that we can do to fix it so it won't happen again?
>
> -Original Message-
> From: Susheel Kumar [mailto:susheel2...@gmail.com]
> Sent: Monday, June 18, 2018 9:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: tlogs not deleting
>
> You may have to DISABLEBUFFER in source to get rid of tlogs.
>
> On Mon, Jun 18, 2018 at 6:13 PM, Brian Yee  wrote:
>
>> So I've read a bunch of stuff on hard/soft commits and tlogs. As I 
>> understand, after a hard commit, solr is supposed to delete old tlogs 
>> depending on the numRecordsToKeep and maxNumLogsToKeep values in the 
>> autocommit settings in solrconfig.xml. I am occasionally seeing solr 
>> fail to do this and the tlogs just build up over time and eventually 
>> we run out of disk space on the VM and this causes problems for us.
>> This does not happen all the time, only sometimes. I currently have a 
>> tlog directory that has 123G worth of tlogs. The last hard commit on 
>> this node was 10 minutes ago but these tlogs date back to 3 days ago.
>>
>> We have sometimes found that restarting solr on the node will get it 
>> to clean up the old tlogs, but we really want to find the root cause 
>> and fix it if possible so we don't keep getting disk space alerts and 
>> have to adhoc restart nodes. Has anyone seen an issue like this before?
>>
>> My update handler settings look like this:
>>   
>>
>>   
>>
>>   ${solr.ulog.dir:}
>>   ${solr.ulog.numVersionBuckets:
>> 65536}
>> 
>> 
>> 60
>> 25
>> false
>> 
>> 
>> 12
>> 
>>
>>   
>> 100
>>   
>>
>>   
>>


Re: Remove schema.xml in favor of managed-schema

2018-06-19 Thread Shawn Heisey
On 6/17/2018 6:48 PM, S G wrote:
> I only wanted to know if schema.xml offer anything that managed-schema does
> not.

The only difference between the two is that there is a different
filename and the managed version can be modified by API calls.  The
schema format and what you can do within that format is identical either
way.

Thanks,
Shawn



Re: A field-wide remove duplicate tokens filter

2018-06-19 Thread sarita
What would be the solution if the search query is (lenovo-A600 in lenovo
mobile)? I need to use 'WordDelimiterFilterFactory' because users sometimes
search (lenovoA600) and sometimes (lenovo a600).
After the query passes through 'WordDelimiterFilterFactory', tokens of the main
query (lenovo-A600 in lenovo mobile)
are repeated, i.e. 'lenovo'.
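(For inspecting where the duplicate 'lenovo' tokens come from, the field
analysis API can help; a hedged sketch assuming a collection "mycollection" and
a field type "text_general":)

# Shows the token stream produced at query time, stage by stage
curl -G 'http://localhost:8983/solr/mycollection/analysis/field' \
  --data-urlencode 'analysis.fieldtype=text_general' \
  --data-urlencode 'analysis.query=lenovo-A600 in lenovo mobile' \
  --data-urlencode 'wt=json'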



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: tlogs not deleting

2018-06-19 Thread Erick Erickson
Take a look at the CDCR section of your reference guide, be sure you
get the version which you can download from here:
https://archive.apache.org/dist/lucene/solr/ref-guide/

There's the CDCR API call you can use for in-flight disabling, and
depending on the version of Solr you can set it in solrconfig.

Basically, buffering was there in the original CDCR to allow a larger
maintenance window, you could enable buffering and all updates were
saved until you disabled it, during which period you could do whatever
you needed with your target cluster and not lose any updates.

Later versions can do the full sync of the index and buffering is being removed.

Best,
Erick

On Tue, Jun 19, 2018 at 7:31 AM, Brian Yee  wrote:
> Thanks for the suggestion. Can you please elaborate a little bit about what 
> DISABLEBUFFER does? The documentation is not very detailed. Is this something 
> that needs to be done manually whenever this problem happens or is it 
> something that we can do to fix it so it won't happen again?
>
> -Original Message-
> From: Susheel Kumar [mailto:susheel2...@gmail.com]
> Sent: Monday, June 18, 2018 9:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: tlogs not deleting
>
> You may have to DISABLEBUFFER in source to get rid of tlogs.
>
> On Mon, Jun 18, 2018 at 6:13 PM, Brian Yee  wrote:
>
>> So I've read a bunch of stuff on hard/soft commits and tlogs. As I
>> understand, after a hard commit, solr is supposed to delete old tlogs
>> depending on the numRecordsToKeep and maxNumLogsToKeep values in the
>> autocommit settings in solrconfig.xml. I am occasionally seeing solr
>> fail to do this and the tlogs just build up over time and eventually
>> we run out of disk space on the VM and this causes problems for us.
>> This does not happen all the time, only sometimes. I currently have a
>> tlog directory that has 123G worth of tlogs. The last hard commit on
>> this node was 10 minutes ago but these tlogs date back to 3 days ago.
>>
>> We have sometimes found that restarting solr on the node will get it
>> to clean up the old tlogs, but we really want to find the root cause
>> and fix it if possible so we don't keep getting disk space alerts and
>> have to adhoc restart nodes. Has anyone seen an issue like this before?
>>
>> My update handler settings look like this:
>>   
>>
>>   
>>
>>   ${solr.ulog.dir:}
>>   ${solr.ulog.numVersionBuckets:
>> 65536}
>> 
>> 
>> 60
>> 25
>> false
>> 
>> 
>> 12
>> 
>>
>>   
>> 100
>>   
>>
>>   
>>


Re: Retrieving Results from both child and parent

2018-06-19 Thread Rushikesh Garadade
I found one solution, however i am not sure whether this is
optimized/corrct solution to it.

Return the mails which contain the word 'pdf' anywhere: ({!parent
which=internetMessageId:* v=pdf}) OR (internetMessageId:* AND pdf)
a) ({!parent which=internetMessageId:* v=pdf}) ==> returns mails whose
attachment has the keyword 'pdf'
b) (internetMessageId:* AND pdf) ==> returns only mails which contain the
word 'pdf'

Solr identifies a mail document via internetMessageId:* as this field is not
present in the attachment document.


Please let me know if this is the right approach to query such results, or is
there any better/optimized approach.
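A hedged sketch of how that combined query could be issued (the collection name
"mails" is a placeholder, the query mirrors the one above):

# Mails matching 'pdf' directly, plus mails whose attachment children match 'pdf'
curl -G 'http://localhost:8983/solr/mails/select' \
  --data-urlencode 'q=({!parent which="internetMessageId:*" v=pdf}) OR (internetMessageId:* AND pdf)' \
  --data-urlencode 'fl=internetMessageId' \
  --data-urlencode 'wt=json'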

Thanks,
Rushikesh Garadade

On Tue, Jun 19, 2018 at 6:33 PM Rushikesh Garadade <
rushikeshgarad...@gmail.com> wrote:

> Hello,
> I have stored emails in solr, with its attachments as child documents.
> As per solr structure these attachments got stored in same lines as of
> mails.
> Ex:
> { "id":"1528801242887_f662e5fe-b5d7-4494-acab-c1a99e6cd025", "
> attachmentName":"example_multipage.doc", "attachmentType":
> "application/msword", "_version_":160306431859996}, { "id":
> "1528801242887", "internetMessageId":"1528801242887", "mailboxMessageId":
> "MailBox_4", "from":"mitur...@verizon.net", "to":["podmas...@live.com"], "
> cc":["gslon...@aol.com"], "bcc":["notic...@msn.com", "bes...@gmail.com"],
> "subject":"Getting Started", "mailBody":"\n\nGoogle's headquarters, the
> Googleplex, in August 2014\n ", "mailReceiveDate":"2018-05-07T12:33:51Z",
> "hasAttachment":true, "_version_":160306431859996}
>
> }
>
> Now I want to perform search. What will be search query for:
>
> I want to search a keyword everywhere and return the respective 
> "internetMessageId".
> i.e.
> #1 if word found in mail document -- return internetMessageId
> #2 if word found in attachment document document -- return internetMessageId
> (By fq==>{!parent which=internetMessageId:* v=attachmentName:*})
>
> For achieving both i have store every fields content in "_text_" field
> (using copy field) . I can achieve both #1 & #2 individually. I want to do
> both in single result i.e give me internetMessageId of all mails as well
> as the mails whose attachment contains the the queried word.
>
>
> How can I achieve this??
>
>
> Thanks,
> Rushikesh Garadade
>
>


RE: tlogs not deleting

2018-06-19 Thread Brian Yee
Thanks for the suggestion. Can you please elaborate a little bit about what 
DISABLEBUFFER does? The documentation is not very detailed. Is this something 
that needs to be done manually whenever this problem happens or is it something 
that we can do to fix it so it won't happen again?

-Original Message-
From: Susheel Kumar [mailto:susheel2...@gmail.com] 
Sent: Monday, June 18, 2018 9:12 PM
To: solr-user@lucene.apache.org
Subject: Re: tlogs not deleting

You may have to DISABLEBUFFER in source to get rid of tlogs.

On Mon, Jun 18, 2018 at 6:13 PM, Brian Yee  wrote:

> So I've read a bunch of stuff on hard/soft commits and tlogs. As I 
> understand, after a hard commit, solr is supposed to delete old tlogs 
> depending on the numRecordsToKeep and maxNumLogsToKeep values in the 
> autocommit settings in solrconfig.xml. I am occasionally seeing solr 
> fail to do this and the tlogs just build up over time and eventually 
> we run out of disk space on the VM and this causes problems for us. 
> This does not happen all the time, only sometimes. I currently have a 
> tlog directory that has 123G worth of tlogs. The last hard commit on 
> this node was 10 minutes ago but these tlogs date back to 3 days ago.
>
> We have sometimes found that restarting solr on the node will get it 
> to clean up the old tlogs, but we really want to find the root cause 
> and fix it if possible so we don't keep getting disk space alerts and 
> have to adhoc restart nodes. Has anyone seen an issue like this before?
>
> My update handler settings look like this:
>   
>
>   
>
>   ${solr.ulog.dir:}
>   ${solr.ulog.numVersionBuckets:
> 65536}
> 
> 
> 60
> 25
> false
> 
> 
> 12
> 
>
>   
> 100
>   
>
>   
>


Re: Streaming Expressions: Merge array values? Inverse of cartesianProduct()

2018-06-19 Thread Christian Spitzlay
Here it is: https://issues.apache.org/jira/browse/SOLR-12499


--  

Christian Spitzlay
Diplom-Physiker,
Senior Software-Entwickler

Tel: +49 69 / 348739116
E-Mail: christian.spitz...@biologis.com

bio.logis Genetic Information Management GmbH
Altenhöferallee 3
60438 Frankfurt am Main

Geschäftsführung: Prof. Dr. med. Daniela Steinberger, Dipl.Betriebswirt Enrico 
Just
Firmensitz Frankfurt am Main, Registergericht Frankfurt am Main, HRB 97945
Umsatzsteuer-Identifikationsnummer DE293587677




> Am 19.06.2018 um 15:30 schrieb Christian Spitzlay 
> :
> 
> Ok. I'm about to create the issue and I have a draft version of what I had in 
> mind 
> in a branch on github.
> 
> Christian Spitzlay
> 
> 
>> Am 19.06.2018 um 15:27 schrieb Joel Bernstein :
>> 
>> Let's move the discussion to the jira ticket.
>> 
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>> 
>> On Tue, Jun 19, 2018 at 3:42 AM, Christian Spitzlay <
>> christian.spitz...@biologis.com> wrote:
>> 
>>> 
>>> 
 Am 18.06.2018 um 15:30 schrieb Joel Bernstein :
 
 You are doing things correctly. I was incorrect about the behavior of the
 group() operation.
 
 I think the behavior you are looking for should be done using reduce()
>>> but
 we'll need to create a reduce operation that does this. If you want to
 create a ticket we can work through exactly how the operation would work.
 
>>> 
>>> 
>>> I'll create an issue tonight at the latest.
>>> Should we take further discussions off the user list
>>> or is it acceptable here?
>>> 
>>> 
>>> Christian Spitzlay
>>> 
>>> 
>>> --
>>> 
>>> Christian Spitzlay
>>> Diplom-Physiker,
>>> Senior Software-Entwickler
>>> 
>>> Tel: +49 69 / 348739116
>>> E-Mail: christian.spitz...@biologis.com
>>> 
>>> bio.logis Genetic Information Management GmbH
>>> Altenhöferallee 3
>>> 60438 Frankfurt am Main
>>> 
>>> Geschäftsführung: Prof. Dr. med. Daniela Steinberger, Dipl.Betriebswirt
>>> Enrico Just
>>> Firmensitz Frankfurt am Main, Registergericht Frankfurt am Main, HRB 97945
>>> Umsatzsteuer-Identifikationsnummer DE293587677
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
> 



Running Solr 5.3.1 with JDK10

2018-06-19 Thread Li, Yi
Hi,

Currently we are running Solr 5.3.1 with JDK8 and we are trying to run Solr
5.3.1 with JDK10. Initially we got a few errors complaining that some JVM options
were removed in JDK9. We removed those options in solr.in.sh:
UseConcMarkSweepGC
UseParNewGC
PrintHeapAtGC
PrintGCDateStamps
PrintGCTimeStamps
PrintTenuringDistribution
PrintGCApplicationStoppedTime

And the options left in solr.in.sh:

  1.  Enable verbose GC logging
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails"

  2.  These GC settings have shown to work well for a number of common Solr 
workloads
GC_TUNE="-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled"
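(As an aside, on JDK 9+ the unified-logging equivalent of the removed verbose-GC
flags would look roughly like the following in solr.in.sh; a hedged sketch, the
log path and rotation values are placeholders:)

# JDK 9+ replacement for -verbose:gc, -XX:+PrintGCDetails and the PrintGC*Stamps options
GC_LOG_OPTS="-Xlog:gc*:file=/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M"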

After removing those options Solr runs, but it gets an error in SystemInfoHandler: Error getting JMX 
properties.
[root@centos6 logs]# service solr status
Found 1 Solr nodes:
Solr process 4630 running on port 8983
ERROR: Failed to get system information from http://localhost:8983/solr due to: 
java.lang.NullPointerException

Can someone share experience using Solr 5.3.x with JDK9 or above?

Thanks,
Yi

P.S.
Console output:
[0.001s][warning][gc] -Xloggc is deprecated. Will use 
-Xlog:gc:/solr/logs/solr_gc.log instead.
[0.001s][warning][gc] -XX:+PrintGCDetails is deprecated. Will use -Xlog:gc* 
instead.
[0.003s][info ][gc] Using Serial
WARNING: System properties and/or JVM args set. Consider using --dry-run or 
--exec
0 INFO (main) [ ] o.e.j.u.log Logging initialized @532ms
205 INFO (main) [ ] o.e.j.s.Server jetty-9.2.11.v20150529
218 WARN (main) [ ] o.e.j.s.h.RequestLogHandler !RequestLog
220 INFO (main) [ ] o.e.j.d.p.ScanningAppProvider Deployment monitor 
file:/home/solr/solr-5.3.1/server/contexts/ at interval 0
559 INFO (main) [ ] o.e.j.w.StandardDescriptorProcessor NO JSP Support for 
/solr, did not find org.apache.jasper.servlet.JspServlet
569 WARN (main) [ ] o.e.j.s.SecurityHandler 
ServletContext@o.e.j.w.WebAppContext@1a75e76a
{/solr,file:/home/solr/solr-5.3.1/server/solr-webapp/webapp/,STARTING} 
{/home/solr/solr-5.3.1/server/solr-webapp/webapp} has uncovered http methods 
for path: /
577 INFO (main) [ ] o.a.s.s.SolrDispatchFilter SolrDispatchFilter.init(): 
WebAppClassLoader=1904783235@7188af83
625 INFO (main) [ ] o.a.s.c.SolrResourceLoader JNDI not configured for solr 
(NoInitialContextEx)
626 INFO (main) [ ] o.a.s.c.SolrResourceLoader using system property 
solr.solr.home: /solr/data
627 INFO (main) [ ] o.a.s.c.SolrResourceLoader new SolrResourceLoader for 
directory: '/solr/data/'
750 INFO (main) [ ] o.a.s.c.SolrXmlConfig Loading container configuration from 
/solr/data/solr.xml
817 INFO (main) [ ] o.a.s.c.CoresLocator Config-defined core root directory: 
/solr/data
[1.402s][info ][gc] GC(0) Pause Full (Metadata GC Threshold) 85M->7M(490M) 
37.281ms
875 INFO (main) [ ] o.a.s.c.CoreContainer New CoreContainer 1193398802
875 INFO (main) [ ] o.a.s.c.CoreContainer Loading cores into CoreContainer 
[instanceDir=/solr/data/]
875 INFO (main) [ ] o.a.s.c.CoreContainer loading shared library: /solr/data/lib
875 WARN (main) [ ] o.a.s.c.SolrResourceLoader Can't find (or read) directory 
to add to classloader: lib (resolved as: /solr/data/lib).
889 INFO (main) [ ] o.a.s.h.c.HttpShardHandlerFactory created with 
socketTimeout : 60,connTimeout : 6,maxConnectionsPerHost : 
20,maxConnections : 1,corePoolSize : 0,maximumPoolSize : 
2147483647,maxThreadIdleTime : 5,sizeOfQueue : -1,fairnessPolicy : 
false,useRetries : false,
1036 INFO (main) [ ] o.a.s.u.UpdateShardHandler Creating UpdateShardHandler 
HTTP client with params: socketTimeout=60=6=true
1038 INFO (main) [ ] o.a.s.l.LogWatcher SLF4J impl is 
org.slf4j.impl.Log4jLoggerFactory
1039 INFO (main) [ ] o.a.s.l.LogWatcher Registering Log Listener [Log4j 
(org.slf4j.impl.Log4jLoggerFactory)]
1040 INFO (main) [ ] o.a.s.c.CoreContainer Security conf doesn't exist. 
Skipping setup for authorization module.
1041 INFO (main) [ ] o.a.s.c.CoreContainer No authentication plugin used.
1179 INFO (main) [ ] o.a.s.c.CoresLocator Looking for core definitions 
underneath /solr/data
1180 INFO (main) [ ] o.a.s.c.CoresLocator Found 0 core definitions
1185 INFO (main) [ ] o.a.s.s.SolrDispatchFilter 
user.dir=/home/solr/solr-5.3.1/server
1186 INFO (main) [ ] o.a.s.s.SolrDispatchFilter SolrDispatchFilter.init() done
1216 INFO (main) [ ] o.e.j.s.h.ContextHandler Started 
o.e.j.w.WebAppContext@1a75e76a{/solr,file:/home/solr/solr-5.3.1/server/solr-webapp/webapp/,AVAILABLE}{/home/solr/solr-5.3.1/server/solr-webapp/webapp}
1224 INFO (main) [ ] o.e.j.s.ServerConnector Started ServerConnector@2102a4d5
{HTTP/1.1} {0.0.0.0:8983}
1228 INFO (main) [ ] o.e.j.s.Server Started @1762ms
14426 WARN (qtp1045997582-15) [ ] o.a.s.h.a.SystemInfoHandler Error 

Re: Streaming Expressions: Merge array values? Inverse of cartesianProduct()

2018-06-19 Thread Christian Spitzlay
Ok. I'm about to create the issue and I have a draft version of what I had in 
mind 
in a branch on github.

Christian Spitzlay


> Am 19.06.2018 um 15:27 schrieb Joel Bernstein :
> 
> Let's move the discussion to the jira ticket.
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> On Tue, Jun 19, 2018 at 3:42 AM, Christian Spitzlay <
> christian.spitz...@biologis.com> wrote:
> 
>> 
>> 
>>> Am 18.06.2018 um 15:30 schrieb Joel Bernstein :
>>> 
>>> You are doing things correctly. I was incorrect about the behavior of the
>>> group() operation.
>>> 
>>> I think the behavior you are looking for should be done using reduce()
>> but
>>> we'll need to create a reduce operation that does this. If you want to
>>> create a ticket we can work through exactly how the operation would work.
>>> 
>> 
>> 
>> I'll create an issue tonight at the latest.
>> Should we take further discussions off the user list
>> or is it acceptable here?
>> 
>> 
>> Christian Spitzlay
>> 
>> 
>> --
>> 
>> Christian Spitzlay
>> Diplom-Physiker,
>> Senior Software-Entwickler
>> 
>> Tel: +49 69 / 348739116
>> E-Mail: christian.spitz...@biologis.com
>> 
>> bio.logis Genetic Information Management GmbH
>> Altenhöferallee 3
>> 60438 Frankfurt am Main
>> 
>> Geschäftsführung: Prof. Dr. med. Daniela Steinberger, Dipl.Betriebswirt
>> Enrico Just
>> Firmensitz Frankfurt am Main, Registergericht Frankfurt am Main, HRB 97945
>> Umsatzsteuer-Identifikationsnummer DE293587677
>> 
>> 
>> 
>> 
>> 
>> 
>> 



Re: Streaming Expressions: Merge array values? Inverse of cartesianProduct()

2018-06-19 Thread Joel Bernstein
Let's move the discussion to the jira ticket.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jun 19, 2018 at 3:42 AM, Christian Spitzlay <
christian.spitz...@biologis.com> wrote:

>
>
> > Am 18.06.2018 um 15:30 schrieb Joel Bernstein :
> >
> > You are doing things correctly. I was incorrect about the behavior of the
> > group() operation.
> >
> > I think the behavior you are looking for should be done using reduce()
> but
> > we'll need to create a reduce operation that does this. If you want to
> > create a ticket we can work through exactly how the operation would work.
> >
>
>
> I'll create an issue tonight at the latest.
> Should we take further discussions off the user list
> or is it acceptable here?
>
>
> Christian Spitzlay
>
>
> --
>
> Christian Spitzlay
> Diplom-Physiker,
> Senior Software-Entwickler
>
> Tel: +49 69 / 348739116
> E-Mail: christian.spitz...@biologis.com
>
> bio.logis Genetic Information Management GmbH
> Altenhöferallee 3
> 60438 Frankfurt am Main
>
> Geschäftsführung: Prof. Dr. med. Daniela Steinberger, Dipl.Betriebswirt
> Enrico Just
> Firmensitz Frankfurt am Main, Registergericht Frankfurt am Main, HRB 97945
> Umsatzsteuer-Identifikationsnummer DE293587677
>
>
>
>
>
>
>


Re: Connection Problem with CloudSolrClient.Builder().build When passing a Zookeeper Addresses and RootParam

2018-06-19 Thread THADC
Thank you Andy,

The problem was as you suspected, the "http://" prefixes. The odd thing is
that I used to use the one param constructor with the solr node URL list
(like: CloudSolrClient.Builder(solrServerURLList).build();). I could not
get that one to work without the "http://" prefix.

Anyway between removing the prefix and using  chrootOption = 
Optional.empty(), my problems are solved. 

You have literally made my day!

Tim



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Spring Boot missing required field:

2018-06-19 Thread Rushikesh Garadade
Yes Andrea,
I have already tried that; I have a value associated with the field.
This issue comes up when I set 'required="true"'. If I remove this then
everything works fine. I am not getting why this issue occurs when I set
required="true".

Can you please provide me some pointers on what may be the reason.

Thanks,
Rushikesh Garadade

On Sat, Jun 9, 2018 at 2:56 PM Andrea Gazzarini 
wrote:

> Hi Rushikesh,
> I bet your client is not doing what you think. The error is clear, the
> incoming document doesn't have that field.
>
> I would investigate more on the client side. Without entering in
> interesting fields like unit testing, I guess the old and good
> System.out.println, just before sending the document to Solr, could help
> you a lot here.
>
> Best,
> Andrea
>
> On Sat, 9 Jun 2018, 10:08 Rushikesh Garadade,  >
> wrote:
>
> > Hi,
> > I am using solr 7.3 with java spring boot. In the schema of my collection I
> > have set the field as
> > <field name="mailReceiveDate" ... indexed="true"
> > required="true" stored="true"/>
> > I have done all other necessary settings required for projects to run.
> >
> > When I run a application and trying to insert document via CODE. I am
> > getting error "*missing required field : mailReceiveDate *". Although I
> > have provided the field value.
> > Following details code error of the same
> >
> >
> > Caused by:
> > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error
> > from server at http://:8983/solr:
> > [doc=8ac2bcf6-7a56-4fed-b83e-7ccc00454088] missing required field:
> > mailReceiveDate
> > at
> >
> >
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:612)
> > ~[solr-solrj-6.6.3.jar:6.6.3 d1e9bbd333ea55cfa0c75d324424606e857a775b -
> > sarowe - 2018-03-02 15:09:35]
> > at
> >
> >
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
> > ~[solr-solrj-6.6.3.jar:6.6.3 d1e9bbd333ea55cfa0c75d324424606e857a775b -
> > sarowe - 2018-03-02 15:09:35]
> > at
> >
> >
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
> > ~[solr-solrj-6.6.3.jar:6.6.3 d1e9bbd333ea55cfa0c75d324424606e857a775b -
> > sarowe - 2018-03-02 15:09:35]
> > at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160)
> > ~[solr-solrj-6.6.3.jar:6.6.3 d1e9bbd333ea55cfa0c75d324424606e857a775b -
> > sarowe - 2018-03-02 15:09:35]
> > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
> > ~[solr-solrj-6.6.3.jar:6.6.3 d1e9bbd333ea55cfa0c75d324424606e857a775b -
> > sarowe - 2018-03-02 15:09:35]
> > at
> >
> >
> org.springframework.data.solr.core.SolrTemplate.lambda$saveBean$2(SolrTemplate.java:219)
> > ~[spring-data-solr-3.0.7.RELEASE.jar:3.0.7.RELEASE]
> > at
> >
> >
> org.springframework.data.solr.core.SolrTemplate.execute(SolrTemplate.java:166)
> > ~[spring-data-solr-3.0.7.RELEASE.jar:3.0.7.RELEASE]
> > ... 58 common frames omitted
> >
> >
> > Please let me know what can be the issue?
> >
> > Thanks,
> > Rushikesh Garadade
> >
>


Retrieving Results from both child and parent

2018-06-19 Thread Rushikesh Garadade
Hello,
I have stored emails in solr, with its attachments as child documents.
As per the Solr structure, these attachments are stored at the same level as the
mails.
Ex:
{ "id":"1528801242887_f662e5fe-b5d7-4494-acab-c1a99e6cd025", "attachmentName
":"example_multipage.doc", "attachmentType":"application/msword", "_version_
":160306431859996}, { "id":"1528801242887", "internetMessageId":
"1528801242887", "mailboxMessageId":"MailBox_4", "from":"
mitur...@verizon.net", "to":["podmas...@live.com"], "cc":["gslon...@aol.com"
], "bcc":["notic...@msn.com", "bes...@gmail.com"], "subject":"Getting
Started", "mailBody":"\n\nGoogle's headquarters, the Googleplex, in August
2014\n ", "mailReceiveDate":"2018-05-07T12:33:51Z", "hasAttachment":true, "
_version_":160306431859996}

}

Now I want to perform a search. What will the search query be for the following:

I want to search for a keyword everywhere and return the respective
"internetMessageId".
i.e.
#1 if the word is found in a mail document -- return internetMessageId
#2 if the word is found in an attachment document -- return internetMessageId
(by fq ==> {!parent which=internetMessageId:* v=attachmentName:*})

For achieving both I have stored every field's content in the "_text_" field
(using a copy field). I can achieve both #1 & #2 individually. I want to do
both in a single result, i.e. give me the internetMessageId of all mails as well as
the mails whose attachment contains the queried word.


How can I achieve this??


Thanks,
Rushikesh Garadade


Re: Solr cloud with different JVM size nodes

2018-06-19 Thread Emir Arnautović
Hi Rishi,
It is not uncommon to have tiers in your cluster, assuming you have weighed whether it is 
the best choice.

I would remind you that 32GB is not a good heap size since you cannot use 
compressed OOPS. Check what is the limit of your JVM but 30GB is a safe bet.
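(Whether compressed oops are actually in effect for a given heap size can be
checked with something along these lines:)

# Prints whether UseCompressedOops is enabled for a 32g heap
java -Xmx32g -XX:+PrintFlagsFinal -version | grep UseCompressedOops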
Also, what did you mean by “got high field cache”?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 19 Jun 2018, at 09:11, Rishikant Snigh  wrote:
> 
> Hello everyone,
> 
> I am planning to create a a solr cloud with 16GB and 32GB nodes.
> Some what to create an underneath pseudo cluster -
> 32G to hold historical data(got high field cache).
> 16G to hold regular collections.
> 
> NOTE - Shards of collection placed on 16G will never be placed on 32G and
> vice versa.
> 
> Do you guys see an impact ?
> 
> Thanks, Rishi



Re: SOLR migration

2018-06-19 Thread Emir Arnautović
Hi Ana,
There is no documentation because this is not something that is common. 
Assuming you are using SolrCloud and that you don’t want any downtime, what you 
could do is set up a new Solr node on the same box but configure it to use this 
new disk. After it is set up, you use ADDREPLICA and DELETEREPLICA to “move” 
replicas from one node to the other and then shut down the old node.
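A hedged sketch of the two Collections API calls involved (collection, shard,
node and replica names are placeholders):

# Add a replica of shard1 on the node that uses the new disk
curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=newhost:8984_solr'

# Once the new replica is active, drop the one on the old node
curl 'http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycollection&shard=shard1&replica=core_node2'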

If you can afford downtime, you can simply set up new Solr to use new disk, 
create empty collections and copy indices from one disk to another.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 19 Jun 2018, at 10:26, Ana Mercan (RO)  wrote:
> 
> Hello guys,
> 
> I would appreciate if you could kindly treat this topic with priority as
> the lack of documentation is kind of a blocker for us.
> 
> Thanks in advance,
> Ana
> 
> 
> On Mon, Jun 18, 2018 at 4:56 PM, Ana Mercan (RO)  wrote:
> 
>> Hi,
>> 
>> I have the following scenario, I'm having a shared cluster solr
>> installation environment (app server 1-app server 2 load balanced) which
>> has 4 solr instances.
>> 
>> After reviewing the space audit we have noticed that the partition where
>> the installation resides is too big versus what is used in term of space.
>> 
>> Therefore we have installed a new drive which is smaller and now we want
>> to migrate from the old drive (E:) to the new drive (F).
>> 
>> Can you please provide an official answer whether this is a supported
>> scenario?
>> 
>> If yes, will you please share the steps with us?
>> 
>> Thanks,
>> 
>> Ana
>> 
> 



Re: some solr replicas down

2018-06-19 Thread Chris Ulicny
Satya,

There should be some other log messages that are probably relevant to the
issue you are having. Something along the lines of "leader cannot
communicate with follower...publishing replica as down." It's likely there
also is a message of "expecting json/xml but got html" in another
instance's logs.

We've seen this problem in various scenarios in our own clusters, usually
during high volumes of requests, and what seems to be happening to us is
the following.

Since authentication is enabled, all requests between nodes must be
authenticated, and Solr is using a timestamp to do this (in some way, not
sure on the details). When the recipient of the request processes it, the
timestamp is checked to see if it is within the Time-To-Live (TTL)
millisecond value (default of 5000). If the timestamp is too old, the
request is rejected with the above error and a response of 401 is delivered
to the sender.

When a request is sent from the leader to the follower and receives a 401
response, the leader becomes too proactive sometimes and declares the
replica down. In older versions (6.3.0), it seems that the replica will
never recover automatically (manually delete the down replicas and add new
ones to fix). Fortunately, as of 7.2.1 (maybe earlier) the down replicas
will usually start to recover at some point (and the leaders seem less
proactive to declare replicas down). Although, we have had cases where they
did not recover after being down for hours on 7.2.1.

Likely the solution to the problem is to increase the TTL value by adding
the line

SOLR_OPTS="$SOLR_OPTS -Dpkiauth.ttl=##"

to the solr environment file (solr.in.sh) on each node and restarting them.
Replace # with some millisecond value of your choice. I'd suggest just
increasing it by intervals of 5s to start. If this does not fix your
problem, then there is likely too much pressure on your hardware for some
reason or another.
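For example (the value is purely illustrative):

# solr.in.sh on every node, followed by a restart; 15000 ms is just an example
SOLR_OPTS="$SOLR_OPTS -Dpkiauth.ttl=15000"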

Hopefully that helps.

If anyone with more knowledge about the authentication plugin has
corrections, wants fill in gaps, or has an idea to figure out what requests
cause this issue. It'd be greatly appreciated.

Best,
Chris

On Mon, Jun 18, 2018 at 9:38 AM Satya Marivada 
wrote:

> Hi, We are using solr 6.3.0 and a collection has 3 of 4 replicas down and 1
> is up and serving.
>
> I see a single line error repeating in logs as below. nothing else specific
> exception apart from it. Wondering what this below message is saying, is it
> the cause of nodes being down, but saw that this happened even before the
> repllicas went down.
>
> 2018-06-18 04:45:51.818 ERROR (qtp1528637575-27215) [c:poi s:shard1
> r:core_node5 x:poi_shard1_replica3] o.a.s.s.PKIAuthenticationPlugin Invalid
> key request timestamp: 1529297138215 , received timestamp: 1529297151817 ,
> TTL: 5000
>
> Thanks,
> Satya
>


Re: How to exclude certain values in multi-value field filter query

2018-06-19 Thread Mikhail Khludnev
You need to index the number of values (numVals)
in a separate field, and then: *:* -(V:(A AND B) AND numVals:2) -(V:(A OR
B) AND numVals:1)
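A hedged sketch of the resulting request (collection name, field names and the
way numVals gets populated at index time are assumptions):

# Exclude docs whose V field contains only A, only B, or exactly {A, B}
curl -G 'http://localhost:8983/solr/mycollection/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'fq=*:* -(V:(A AND B) AND numVals:2) -(V:(A OR B) AND numVals:1)' \
  --data-urlencode 'wt=json'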


On Tue, Jun 19, 2018 at 9:20 AM Wei  wrote:

> Hi,
>
> I have a multi-value field,  and there is a limited set of values for the
> field: A, B, C, D.
> Is there a way to filter out documents that has only A or B values in the
> multi-value field?
>
> Basically I want to  exclude document that has:
>
> A
>
> B
>
> A B
>
> and get documents that has:
>
>
> C
>
> D
>
> C D
>
> A C
>
> B C
>
> A D
>
> B D
>
> A B C
>
> A B D
>
> A C D
>
> B C D
>
> A B C D
>
>
> Thanks,
>
> Wei
>


-- 
Sincerely yours
Mikhail Khludnev


Re: How to exclude certain values in multi-value field filter query

2018-06-19 Thread Alessandro Benedetti
The first idea that comes to my mind is to build a single-valued copy field
which concatenates them.
In this way you will have very specific values to filter on :

query1 -(copyfield:(A B AB))

To concatenate you can use this update request processor :
https://lucene.apache.org/solr/6_6_0//solr-core/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html

Regards




-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SOLR migration

2018-06-19 Thread Ana Mercan (RO)
Hello guys,

I would appreciate if you could kindly treat this topic with priority as
the lack of documentation is kind of a blocker for us.

Thanks in advance,
Ana


On Mon, Jun 18, 2018 at 4:56 PM, Ana Mercan (RO)  wrote:

> Hi,
>
> I have the following scenario, I'm having a shared cluster solr
> installation environment (app server 1-app server 2 load balanced) which
> has 4 solr instances.
>
> After reviewing the space audit we have noticed that the partition where
> the installation resides is too big versus what is used in term of space.
>
> Therefore we have installed a new drive which is smaller and now we want
> to migrate from the old drive (E:) to the new drive (F).
>
> Can you please provide an official answer whether this is a supported
> scenario?
>
> If yes, will you please share the steps with us?
>
> Thanks,
>
> Ana
>



Re: Solrj does not support ltr ?

2018-06-19 Thread Alessandro Benedetti
Pretty sure you can't.
As far as I know there is no client-side implementation to help with managed
resources in general.
Any contribution is welcome!



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Streaming Expressions: Merge array values? Inverse of cartesianProduct()

2018-06-19 Thread Christian Spitzlay



> Am 18.06.2018 um 15:30 schrieb Joel Bernstein :
> 
> You are doing things correctly. I was incorrect about the behavior of the
> group() operation.
> 
> I think the behavior you are looking for should be done using reduce() but
> we'll need to create a reduce operation that does this. If you want to
> create a ticket we can work through exactly how the operation would work.
> 


I'll create an issue tonight at the latest.
Should we take further discussions off the user list
or is it acceptable here?


Christian Spitzlay


--  

Christian Spitzlay
Diplom-Physiker,
Senior Software-Entwickler

Tel: +49 69 / 348739116
E-Mail: christian.spitz...@biologis.com

bio.logis Genetic Information Management GmbH
Altenhöferallee 3
60438 Frankfurt am Main

Geschäftsführung: Prof. Dr. med. Daniela Steinberger, Dipl.Betriebswirt Enrico 
Just
Firmensitz Frankfurt am Main, Registergericht Frankfurt am Main, HRB 97945
Umsatzsteuer-Identifikationsnummer DE293587677








Solr cloud with different JVM size nodes

2018-06-19 Thread Rishikant Snigh
Hello everyone,

I am planning to create a Solr cloud with 16GB and 32GB nodes.
Somewhat like creating an underlying pseudo-cluster -
32G to hold historical data (got high field cache).
16G to hold regular collections.

NOTE - Shards of collection placed on 16G will never be placed on 32G and
vice versa.

Do you guys see an impact ?

Thanks, Rishi


Re: Solrj does not support ltr ?

2018-06-19 Thread shreck
I am not sure if I can do this "curl -XPUT
'http://localhost:8983/solr/techproducts/schema/model-store' --data-binary
"@/path/myModel.json" -H 'Content-type:application/json'" by solrj.




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


How to exclude certain values in multi-value field filter query

2018-06-19 Thread Wei
Hi,

I have a multi-value field,  and there is a limited set of values for the
field: A, B, C, D.
Is there a way to filter out documents that have only A or B values in the
multi-value field?

Basically I want to exclude documents that have:

A

B

A B

and get documents that has:


C

D

C D

A C

B C

A D

B D

A B C

A B D

A C D

B C D

A B C D


Thanks,

Wei


Re: Solr Odbc for Parallel Sql integration with Tableau

2018-06-19 Thread Aroop Ganguly
Hi Joel

Yes I was able to make the ODBC bridge work very easily (using steps mentioned 
here https://github.com/risdenk/solrj-jdbc-testing/blob/master/odbc/README.md 
 ), 

But the actual Tableau integration has not been fruitful yet due to 2 reasons:

1. Tableau inherently writes inner queries : select a as A, b as B from (select 
* from c)
 — this fails immediately, as parallel sql in my experience does not like 
inner queries.

2. The default Tableau view, which is awesome for dragging and dropping the entire table, 
does not work for parallel sql since we need to specify a “limit”; otherwise it 
keeps giving the error about “score”. So I defaulted to the custom query option 
on Tableau, but it failed because of the inherent inner-queryness of Tableau as 
mentioned in 1. :) 
 — I will keep at it tomorrow and maybe I will be able to figure a way out.
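For anyone following along, the kind of statement the /sql handler accepts
directly looks roughly like this (a hedged sketch; collection and field names
are placeholders):

# A flat SELECT with a LIMIT avoids the unlimited (export-style) code path
# that tends to produce the "score" error
curl --data-urlencode 'stmt=SELECT fieldA, fieldB FROM mycollection LIMIT 10' \
  'http://localhost:8983/solr/mycollection/sql?aggregationMode=facet'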


> On Jun 18, 2018, at 7:55 PM, Joel Bernstein  wrote:
> 
> That's interesting that you were able to setup OpenLink. At Alfresco we've
> done quite a bit of work on the Solr's JDBC driver to integrate it with the
> Alfresco repository, which uses Solr. But we haven't yet tackled the ODBC
> setup. That will come very soon. To really take advantage of Tableau's
> capabilities we will need to add joins to Solr's parallel SQL. Solr already
> uses Apache Calcite, which has a join optimizer, so mainly this would
> involve hooking up the various Streaming Expression joins.
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> On Mon, Jun 18, 2018 at 6:37 PM, Aroop Ganguly 
> wrote:
> 
>> Ok I was able to setup the odic bridge (using OpenLink) and I see the
>> collections popping up in Tableau too.
>> But I am unable to actually get data flowing into Tableau reports because,
>> Tableau keeps creating inner queries and Solr seems to hate inner queries.
>> Is there a way to do inner queries in Solr Parallel Sql ?
>> 
>>> On Jun 18, 2018, at 12:30 PM, Aroop Ganguly 
>> wrote:
>>> 
>>> 
>>> Hi Everyone
>>> 
>>> I am not sure if something has been done on this yet, though I did see a
>> JIRA with links to the parallel sql documentation, but I do not think that
>> answers the question.
>>> 
>>> I love the jdbc driver and it works well for many UIs but there are
>> other systems that need an ODBC driver.
>>> 
>>> Can anyone share any guidance as to how this can be done or has been
>> done by others.
>>> 
>>> Thanks
>>> Aroop
>> 
>>