Re: fl rename of unique key in solrcloud

2014-11-14 Thread Jeon Woosung
Could you let me know the version of Solr?

On Sat, Nov 15, 2014 at 5:05 AM, Suchi Amalapurapu 
wrote:

> Hi
> Getting the following exception when using fl renaming with unique key in
> the schema.
> http:///solr//select?q=dress&fl=a1:p1
>
> where p1 is the unique key for 
> For collections with single shard, this works flawlessly but results in the
> following exception in case of multiple shards.
>
> How do we fix this? Stack trace below.
> Suchi
>
> error": {"trace": "java.lang.NullPointerException\n\tat
>
> org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:998)\n\tat
>
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:653)\n\tat
>
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:628)\n\tat
>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)\n\tat
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat
>
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:368)\n\tat
>
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\n\tat
>
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat
>
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\n\tat
>
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\n\tat
> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\n\tat
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\n\tat
>
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\n\tat
>
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\n\tat
>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\n\tat
>
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\n\tat
> java.lang.Thread.run(Thread.java:662)\n","code": 500
>



-- 
*God bless U*


Re: solr IRC

2014-11-14 Thread Anurag Sharma
Also, I'd like to know: is this the only IRC channel, or are there others as well, like solr
dev, lucene dev, etc.?

On Sat, Nov 15, 2014 at 10:59 AM, Anurag Sharma  wrote:

> I tried couple of weeks earlier as well. As suggested, will try again
> after mid next week.
>
> On Sat, Nov 15, 2014 at 10:53 AM, Alexandre Rafalovitch <
> arafa...@gmail.com> wrote:
>
>> If you tried this week it is because everybody was at the conference. Try
>> again mid next week.
>>
>> Regards,
>>  Alex
>> On 14/11/2014 11:35 pm, "Anurag Sharma"  wrote:
>>
>> > Is this correct link to Solr IRC -
>> > http://webchat.freenode.net/?channels=#solr
>> > I tried couple of times using the IRC, the list of online users are
>> always
>> > good but never get any response on the query and also don't see any
>> > communication/discussion.
>> >
>>
>
>


Re: solr IRC

2014-11-14 Thread Anurag Sharma
I tried a couple of weeks earlier as well. As suggested, I will try again after
mid next week.

On Sat, Nov 15, 2014 at 10:53 AM, Alexandre Rafalovitch 
wrote:

> If you tried this week it is because everybody was at the conference. Try
> again mid next week.
>
> Regards,
>  Alex
> On 14/11/2014 11:35 pm, "Anurag Sharma"  wrote:
>
> > Is this correct link to Solr IRC -
> > http://webchat.freenode.net/?channels=#solr
> > I tried couple of times using the IRC, the list of online users are
> always
> > good but never get any response on the query and also don't see any
> > communication/discussion.
> >
>


Re: solr IRC

2014-11-14 Thread Alexandre Rafalovitch
If you tried this week, it is because everybody was at the conference. Try
again mid next week.

Regards,
 Alex
On 14/11/2014 11:35 pm, "Anurag Sharma"  wrote:

> Is this correct link to Solr IRC -
> http://webchat.freenode.net/?channels=#solr
> I tried couple of times using the IRC, the list of online users are always
> good but never get any response on the query and also don't see any
> communication/discussion.
>


Re: Hierarchical faceting

2014-11-14 Thread Evan Pease
Hi Rashmi,

Here are some more details on how to use the PathHierarchyTokenizer that Oleg
linked to.

If this is your document:

> *Sample document*
> 
> name=Pbook1
> category=NonFic/Sci/Phy/Quantum
> author=ABC
> price=20.00
> 

Then, in your schema.xml:
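
A minimal field type along these lines should work (the names "text_path" and
"category" are illustrative, and you may want different index- and query-time
analysis depending on how you query):

<fieldType name="text_path" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="category" type="text_path" indexed="true" stored="true"/>

With this, indexing "NonFic/Sci/Phy/Quantum" produces the terms "NonFic",
"NonFic/Sci", "NonFic/Sci/Phy" and "NonFic/Sci/Phy/Quantum", which is what
makes the per-level facet counts below possible.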

Then, in your Solr query, you can simply add:

&facet=true
&facet.field=category

You should see a facet that contains each level of the taxonomy with counts.

To navigate the taxonomy, you add filter queries using the part of the path
you want to narrow the results down to (values from the category facet).

So, for example a user clicks on "NonFic"

&facet=true
&facet.field=category
&fq={!term f=category}NonFic

Then "NonFic/Sci"

&fq={!term f=category}NonFic/Sci

Then "NonFic/Sci/Phy"

&fq={!term f=category}NonFic/Sci/Phy

etc..

If you only want to display the leaf-level category and indent child
categories, you can easily do this in your UI by splitting the facet value
on your separator, "/" in this case.


Thanks,
Evan



On Nov 14, 2014 8:06 PM, "Oleg Savrasov"  wrote:

> Hi Rashmi,
>
> I believe you are looking for PathHierarchyTokenizer,
> see
>
> https://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizer.html
>
> Oleg
>
> 2014-11-14 17:53 GMT-05:00 rashmy1 :
>
> > Hello,
> > I'm trying to setup Solr for fetching hierarchical facets.
> > Please advice which of the below approaches should be followed for my
> > scenario.
> > *Scenario:
> > *
> > NonFic
> > Hist
> > HistBook1
> > HistBook2
> > Sci
> > Phy
> > Quantum
> > Pbook1
> > Pbook2
> > Thermodynamics
> > Pbook3
> > Pbook4
> > Chem
> > Cbook1
> > Math
> > Mbook1
> > Fic
> > Mystery
> > Mybook1
> > Childrens
> > Chbook1
> > Chbook2
> >
> > *Sample document*
> > 
> > name=Pbook1
> > category=NonFic/Sci/Phy/Quantum
> > author=ABC
> > price=20.00
> > 
> >
> > *Requirements:*
> > -Show drill down facets
> > -If user searched for "*", the initial set of facets to be shown are
> > 'NonFic' and 'Fic'
> > -If user selects facet 'NonFic', we then show the facets 'Hist' and 'Sci'
> > only.
> >
> > *Option1:*
> > /Solr schema:/
> >  > stored="true" type="string"/>
> > /Document supplied for indexing:/
> > 
> > name=Pbook1
> > category=0/NonFic
> > category=1/NonFic/Sci
> > category=2/NonFic/Sci/Phy
> > category=3/NonFic/Sci/Phy/Quantum
> > category=0/Other (a book can belong to multiple categories)
> > author=ABC
> > price=20.00
> > 
> > With Option2, we can do a drill down facet query.
> > For example, if we give facet.prefix=NonFic/Sci/, the facet results are:
> > NonFic/Sci/Phy
> > NonFic/Sci/Chem
> > NonFic/Sci/Math
> > The only issue is that I have to take care of generating all possible
> path
> > information for 'category'
> >
> > *Option2:*
> > /Solr schema:/
> > 
> >   
> >  > delimiter="/"/>
> >   
> > 
> >  > stored="true" type="path"/>
> > /Document supplied for indexing:/
> > 
> > name=Pbook1
> > category=NonFic/Sci/Phy/Quantum
> > author=ABC
> > price=20.00
> > 
> > With Option2, we can do facet query but it returns all possible
> combination
> > of paths.
> > For example, if we give facet.prefix=Fic, the facet results are:
> > Fic (3)
> > Fic/Mystery (1)
> > Fic/Childrens (2)
> >
> >
> > I'm looking to supply a doc with just a single entry (like
> > 'category=NonFic/Sci/Phy/Quantum' ) and be able to do a drill down query.
> > Is
> > there some existing Solr tokernizer which takes care of generating all
> > possibly combinations which indexing instead of having to generating them
> > as
> > part of  creation?
> >
> > Thanks
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Hierarchical-faceting-tp4169263.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


solr IRC

2014-11-14 Thread Anurag Sharma
Is this the correct link to the Solr IRC channel -
http://webchat.freenode.net/?channels=#solr ?
I have tried the IRC a couple of times; the number of online users is always
good, but I never get any response to my query and also don't see any
communication/discussion.


Re: Hierarchical faceting

2014-11-14 Thread Oleg Savrasov
Hi Rashmi,

I believe you are looking for PathHierarchyTokenizer,
see
https://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizer.html

Oleg

2014-11-14 17:53 GMT-05:00 rashmy1 :

> Hello,
> I'm trying to setup Solr for fetching hierarchical facets.
> Please advice which of the below approaches should be followed for my
> scenario.
> *Scenario:
> *
> NonFic
> Hist
> HistBook1
> HistBook2
> Sci
> Phy
> Quantum
> Pbook1
> Pbook2
> Thermodynamics
> Pbook3
> Pbook4
> Chem
> Cbook1
> Math
> Mbook1
> Fic
> Mystery
> Mybook1
> Childrens
> Chbook1
> Chbook2
>
> *Sample document*
> 
> name=Pbook1
> category=NonFic/Sci/Phy/Quantum
> author=ABC
> price=20.00
> 
>
> *Requirements:*
> -Show drill down facets
> -If user searched for "*", the initial set of facets to be shown are
> 'NonFic' and 'Fic'
> -If user selects facet 'NonFic', we then show the facets 'Hist' and 'Sci'
> only.
>
> *Option1:*
> /Solr schema:/
>  stored="true" type="string"/>
> /Document supplied for indexing:/
> 
> name=Pbook1
> category=0/NonFic
> category=1/NonFic/Sci
> category=2/NonFic/Sci/Phy
> category=3/NonFic/Sci/Phy/Quantum
> category=0/Other (a book can belong to multiple categories)
> author=ABC
> price=20.00
> 
> With Option2, we can do a drill down facet query.
> For example, if we give facet.prefix=NonFic/Sci/, the facet results are:
> NonFic/Sci/Phy
> NonFic/Sci/Chem
> NonFic/Sci/Math
> The only issue is that I have to take care of generating all possible path
> information for 'category'
>
> *Option2:*
> /Solr schema:/
> 
>   
>  delimiter="/"/>
>   
> 
>  stored="true" type="path"/>
> /Document supplied for indexing:/
> 
> name=Pbook1
> category=NonFic/Sci/Phy/Quantum
> author=ABC
> price=20.00
> 
> With Option2, we can do facet query but it returns all possible combination
> of paths.
> For example, if we give facet.prefix=Fic, the facet results are:
> Fic (3)
> Fic/Mystery (1)
> Fic/Childrens (2)
>
>
> I'm looking to supply a doc with just a single entry (like
> 'category=NonFic/Sci/Phy/Quantum' ) and be able to do a drill down query.
> Is
> there some existing Solr tokernizer which takes care of generating all
> possibly combinations which indexing instead of having to generating them
> as
> part of  creation?
>
> Thanks
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Hierarchical-faceting-tp4169263.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Restrict search to subset (a list of aprrox 40,000 ids from an external service) of corpus

2014-11-14 Thread Shawn Heisey
On 11/14/2014 9:51 AM, henry cleland wrote:
> How do I search only a subset of my corpus based on a large list of non
> consecutive unique key ids (cannot do a range query).
> Is there a way around doing this  q=id:(id1 OR id2 OR id3 OR id4 ... OR
> id4 ) AND name:*
>
> Also what is the limit of "OR"s i can apply on the query if that is the
> only way out, i don't suppose it is infinity.

Very large boolean queries can be slow.  You might want to use the
TermsQuery parser instead:

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser
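
For example (a sketch - the field name "id" and the values are placeholders),
the whole list becomes a single filter query instead of a giant boolean clause:

q=name:*&fq={!terms f=id}id1,id2,id3,id4

Sending the request as a POST keeps a 40,000-entry list from running into the
URL length limits described below.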

For what you are actually trying to do, the number of OR clauses you can
have is determined by two things:

1) The maxBooleanClauses value in solrconfig.xml, which defaults to 1024.

2) The size of the query string allowed.

2a) If the request is a GET, the max http header size controls how long
your query (and the other URL parameters) can be.  This almost always
defaults to 8K, or 8192 bytes, and includes other things, like the "GET"
command, the URL path, and the protocol (usually HTTP/1.1).  This header
size is configurable in the servlet container config.

2b) If the request is a POST, the size limit in any recent Solr version
for the request will usually be 2MB.  That limit is configurable in
solrconfig.xml.
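
Both knobs live in solrconfig.xml; a sketch with the default values (other
attributes omitted):

<query>
  <maxBooleanClauses>1024</maxBooleanClauses>
</query>

<requestDispatcher>
  <requestParsers formdataUploadLimitInKB="2048"
                  multipartUploadLimitInKB="2048000"/>
</requestDispatcher>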

I'm trying to get the maxBooleanClauses limit removed:

https://issues.apache.org/jira/browse/SOLR-4586

Thanks,
Shawn



Hierarchical faceting

2014-11-14 Thread rashmy1
Hello,
I'm trying to set up Solr for fetching hierarchical facets.
Please advise which of the below approaches should be followed for my
scenario.
*Scenario:
*
NonFic
Hist
HistBook1
HistBook2
Sci
Phy
Quantum
Pbook1
Pbook2
Thermodynamics
Pbook3
Pbook4
Chem
Cbook1
Math
Mbook1
Fic
Mystery
Mybook1
Childrens
Chbook1
Chbook2

*Sample document*

name=Pbook1
category=NonFic/Sci/Phy/Quantum
author=ABC
price=20.00


*Requirements:*
-Show drill down facets
-If user searched for "*", the initial set of facets to be shown are
'NonFic' and 'Fic'
-If user selects facet 'NonFic', we then show the facets 'Hist' and 'Sci'
only.

*Option1:*
/Solr schema:/

/Document supplied for indexing:/

name=Pbook1
category=0/NonFic
category=1/NonFic/Sci
category=2/NonFic/Sci/Phy
category=3/NonFic/Sci/Phy/Quantum
category=0/Other (a book can belong to multiple categories)
author=ABC
price=20.00

With Option2, we can do a drill down facet query.
For example, if we give facet.prefix=NonFic/Sci/, the facet results are:
NonFic/Sci/Phy
NonFic/Sci/Chem
NonFic/Sci/Math
The only issue is that I have to take care of generating all possible path
information for 'category'

*Option2:*
/Solr schema:/

  

  


/Document supplied for indexing:/

name=Pbook1
category=NonFic/Sci/Phy/Quantum
author=ABC
price=20.00

With Option2, we can do facet query but it returns all possible combination
of paths.
For example, if we give facet.prefix=Fic, the facet results are:
Fic (3)
Fic/Mystery (1)
Fic/Childrens (2)


I'm looking to supply a doc with just a single entry (like
'category=NonFic/Sci/Phy/Quantum') and be able to do a drill-down query. Is
there some existing Solr tokenizer which takes care of generating all
possible combinations while indexing, instead of having to generate them as
part of document creation?

Thanks





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hierarchical-faceting-tp4169263.html
Sent from the Solr - User mailing list archive at Nabble.com.


Duplicate scoring situation in DelegatingCollector

2014-11-14 Thread Andy Crossen
Hi folks,

I have a DelegatingCollector installed via a PostFilter (kind of like an
AnalyticsQuery) that needs the document score to a) add to a collection of
score-based stats, and b) decide whether to keep the document based on the
score.

If I keep the document, I call super.collect() (where super is a
TopScoreDocCollector), which re-scores the document in its collect method.
The scoring is custom and reasonably expensive.

Is there an easy way to avoid this?  Or do I have to stop calling
super.collect(), manage my own bitset/PQ, and pass the filtered results in
the DelegatingCollector's finish() method?

There's a thread out there ("Configurable collectors for custom ranking")
that kind of talks about the above.  Seems cumbersome.

Thanks for any direction!


Re: Suggest dictionaries not rebuilding after restart

2014-11-14 Thread Michael Sokolov
Yeah - I would want it fixed as a default setting of some sort, maybe
built into the Suggester class, so you wouldn't be required to have
something in config to make it work in a reasonable way. Glad my
insomnia served some purpose.


-Mike

On 11/14/2014 02:12 PM, Walter Underwood wrote:

That fixed it.

I bet that would fix the problem with the very long startup that another user 
had. That’s a bug in the default solrconfig.xml, it should persist the 
dictionaries.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Nov 14, 2014, at 12:42 AM, Michael Sokolov  
wrote:


It looks like you have to define "storeDir", and if you don't then the rebuild 
no longer happens, as you said.  I think that goes in the config block you showed, but I 
haven't tested this (we use a different suggester with its own persistence strategy).

-Mike

On 11/14/14 2:01 AM, Walter Underwood wrote:

We get no suggestions until we force a build with suggest.build=true. Maybe we 
need to define a spellchecker component to get that behavior?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Nov 13, 2014, at 10:56 PM, Michael Sokolov  
wrote:


I believe the spellchecker component persists these indexes now and reloads 
them on restart rather than rebuilding.

-Mike

On 11/13/14 7:40 PM, Walter Underwood wrote:

We have to manually rebuild the suggest dictionaries after a restart. This 
seems odd, since someone else had a problem because they did rebuild after 
restart.

We’re running 4.7 and our dictionaries are configured like this. We do this for 
several fields.

 
   fieldName
   FuzzyLookupFactory
   DocumentDictionaryFactory
   fieldName
   qualityScore
   string
   true
 
  wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/






fl rename of unique key in solrcloud

2014-11-14 Thread Suchi Amalapurapu
Hi
Getting the following exception when using fl renaming with the unique key in
the schema.
http:///solr//select?q=dress&fl=a1:p1

where p1 is the unique key for 
For collections with a single shard this works flawlessly, but it results in the
following exception in the case of multiple shards.

How do we fix this? Stack trace below.
Suchi

error": {"trace": "java.lang.NullPointerException\n\tat
org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:998)\n\tat
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:653)\n\tat
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:628)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:368)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\n\tat
org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\n\tat
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\n\tat
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\n\tat
java.lang.Thread.run(Thread.java:662)\n","code": 500


Re: DIH Blob data

2014-11-14 Thread Anurag Sharma
Thanks Michael & Eric for the succinct response.

On Sat, Nov 15, 2014 at 12:13 AM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:

> There is a binary type
>
> -Mike
>
> On 11/14/2014 12:21 PM, Anurag Sharma wrote:
>
>> bq: We routinely store images and pdfs in Solr. There *is* a benefit,
>> since
>> you don't need to manage another storage system, you don't have to worry
>> about Solr getting out of sync with the other system, you can use Solr
>> replication for all your assets, etc.
>>
>> Do the same holds good for large Blobs like image, audio, video as well?
>> Tika supports multiple file formats (http://tika.apache.org/1.5/
>> formats.html)
>> but not sure how good is the Solr/Tika combination. Storing pdf and other
>> docs could be useful in Solr, tika can extract metadata from the docs and
>> make them discoverable.
>>
>> Considering all the above cases there should also be a support for File
>> field type in Solr like other types Date, Float, Int, Long, String etc.
>> but
>> looks like there are only two file types (
>> http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/
>> core/src/java/org/apache/solr/schema/)
>> and both re external file storage.
>>
>> - ExternalFileField.java
>> > core/src/java/org/apache/solr/schema/ExternalFileField.java>
>> - ExternalFileFieldReloader.java
>> > core/src/java/org/apache/solr/schema/ExternalFileFieldReloader.java>
>>
>> What type can be used in schema when storing the files internally?
>>
>>
>> On Thu, Nov 13, 2014 at 3:48 AM, Jeon Woosung 
>> wrote:
>>
>>  How about this?
>>>
>>> First, define a field for filter query. It should be multivalued.
>>>
>>> Second, implements transformer to extract json dynamic fields, and put
>>> the
>>> dynamic fields into the solr field.
>>>
>>> For example,
>>>
>>> 
>>>
>>> Data : {a:1,b:2,c:3}
>>>
>>> You can split the data to "a:1", "b:2", "c:3", and put them into terms.
>>>
>>> And then you can use filter query like "fq=terms:a:1"
>>> 2014. 11. 13. 오전 3:59에 "Michael Sokolov" >>
 님이

>>> 작성:
>>>
>>>  We routinely store images and pdfs in Solr. There *is* a benefit, since
 you don't need to manage another storage system, you don't have to worry
 about Solr getting out of sync with the other system, you can use Solr
 replication for all your assets, etc.

 I don't use DIH, so personally I don't care whether it handles blobs,
 but
 it does seem like a natural extension for a system that indexes data
 from
 SQL in Solr.

 -Mike


 On 11/12/2014 01:31 PM, Anurag Sharma wrote:

  BLOB is non-searchable field so there is no benefit of storing it into
> Solr. Any external key-value store can be used to store the blob and
> reference of this blob can be stored as a string field in Solr.
>
> On Wed, Nov 12, 2014 at 5:56 PM, stockii 
> wrote:
>
>   I had a similar problem and didnt find any solution to use the fields
>
 in
>>>
 JSON
>> Blob for a filter ... Not with DIH.
>>
>>
>>
>> --
>> View this message in context:
>>
>>  http://lucene.472066.n3.nabble.com/DIH-Blob-data-
>>> tp4168896p4168925.html
>>>
 Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>>
>


Re: Suggest dictionaries not rebuilding after restart

2014-11-14 Thread Walter Underwood
That fixed it.

I bet that would fix the problem with the very long startup that another user 
had. That’s a bug in the default solrconfig.xml; it should persist the
dictionaries.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Nov 14, 2014, at 12:42 AM, Michael Sokolov  
wrote:

> It looks like you have to define "storeDir", and if you don't then the 
> rebuild no longer happens, as you said.  I think that goes in the config 
> block you showed, but I haven't tested this (we use a different suggester 
> with its own persistence strategy).
> 
> -Mike
> 
> On 11/14/14 2:01 AM, Walter Underwood wrote:
>> We get no suggestions until we force a build with suggest.build=true. Maybe 
>> we need to define a spellchecker component to get that behavior?
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/
>> 
>> 
>> On Nov 13, 2014, at 10:56 PM, Michael Sokolov 
>>  wrote:
>> 
>>> I believe the spellchecker component persists these indexes now and reloads 
>>> them on restart rather than rebuilding.
>>> 
>>> -Mike
>>> 
>>> On 11/13/14 7:40 PM, Walter Underwood wrote:
 We have to manually rebuild the suggest dictionaries after a restart. This 
 seems odd, since someone else had a problem because they did rebuild after 
 restart.
 
 We’re running 4.7 and our dictionaries are configured like this. We do 
 this for several fields.
 
 
   fieldName
   FuzzyLookupFactory
   DocumentDictionaryFactory
   fieldName
   qualityScore
   string
   true
 
  wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/
 
 
> 



Re: DIH Blob data

2014-11-14 Thread Erick Erickson
Right, a more nuanced comment involves what _type_ of docs you're
storing, and what the ratio of searchable-to-overall size is. Consider
an image. The searchable data may be 0.01% of the file size. Or even
worse, a movie.

As always, "it depends". I guess that personally I'm not a fan of
using Solr as a file store when you have to be prepared to re-index
from scratch sometime _anyway_ (IMO), in which case you often might as
well serve the data from the system-of-record since it's there anyway.
IOW, I need to be convinced the use-case really merits it. And the
particular use-case may very well mean it's a fine solution.

So if the use-case merits it, storing files in Solr is fine; I just
wonder about docs with lots of non-searchable bytes and
relatively few searchable bytes.

Best,
Erick

On Fri, Nov 14, 2014 at 2:02 PM, Michael Sokolov
 wrote:
>
> On 11/14/2014 01:43 PM, Erick Erickson wrote:
>>
>> Just skimming, so maybe I misinterpreted.
>>
>> ExternalFileField and ExternalFileFieldReloader
>> refer to storing values for each doc in an external file, they have
>> nothing to do with storing _files_.
>>
>> The usual pattern is to have Solr store just enough data to have the
>> system-of-record return the actual file rather than have Solr
>> actually store the file. Solr isn't really built for this and while some
>> people do this it usually is a poor design if for no other reason than
>> as segments merge, the data gets copied again and again and again
>> to no good purpose.
>
> I was worried about this, and spent a bunch of time working on a custom
> codec that would store files externally (to avoid the merge penalty), while
> still living inside the Solr/Lucene ecosystem. It was a lot of complicated
> work, and after a while I thought I'd better do some careful performance
> measurements to make sure it was worthwhile.  What I found was that the
> merge cost was not very high relative to other indexing costs we were paying
> (indexing large full text documents with fairly complex analysis, but
> nothing unusual). So I don't think this particular performance argument
> against storage in Solr/Lucene is telling, at least for many ratios of
> stored doc size to indexed tokens size. It's also worth mentioning that my
> test involved reindexing every document once (basically a query-level
> replication of an existing index), so perhaps the amount of merging was less
> than it might be in other cases.
>
> I can see that there might be other reasons to store documents elsewhere,
> but in my experience, with our use case, it actually works pretty well to
> store them in Lucene indexes.  Consider, for example, that if you are
> highlighting, you are probably already storing the full text of each
> document anyway. In our case we also need to store a marked-up version of
> the full text (so we can highlight an html view of a document as well as
> deliver plain text snippets), so the incremental cost of storing pdfs was
> not crushing.  Of course these could all be stored externally, too. Maybe
> we'll try that and get massive performance increases :)
>
> -Mike


Re: DIH Blob data

2014-11-14 Thread Michael Sokolov


On 11/14/2014 01:43 PM, Erick Erickson wrote:

Just skimming, so maybe I misinterpreted.

ExternalFileField and ExternalFileFieldReloader
refer to storing values for each doc in an external file, they have
nothing to do with storing _files_.

The usual pattern is to have Solr store just enough data to have the
system-of-record return the actual file rather than have Solr
actually store the file. Solr isn't really built for this and while some
people do this it usually is a poor design if for no other reason than
as segments merge, the data gets copied again and again and again
to no good purpose.
I was worried about this, and spent a bunch of time working on a custom 
codec that would store files externally (to avoid the merge penalty), 
while still living inside the Solr/Lucene ecosystem. It was a lot of 
complicated work, and after a while I thought I'd better do some careful 
performance measurements to make sure it was worthwhile.  What I found 
was that the merge cost was not very high relative to other indexing 
costs we were paying (indexing large full text documents with fairly 
complex analysis, but nothing unusual). So I don't think this particular 
performance argument against storage in Solr/Lucene is telling, at least 
for many ratios of stored doc size to indexed tokens size. It's also 
worth mentioning that my test involved reindexing every document once 
(basically a query-level replication of an existing index), so perhaps 
the amount of merging was less than it might be in other cases.


I can see that there might be other reasons to store documents 
elsewhere, but in my experience, with our use case, it actually works 
pretty well to store them in Lucene indexes.  Consider, for example, 
that if you are highlighting, you are probably already storing the full 
text of each document anyway. In our case we also need to store a 
marked-up version of the full text (so we can highlight an html view of 
a document as well as deliver plain text snippets), so the incremental 
cost of storing pdfs was not crushing.  Of course these could all be 
stored externally, too. Maybe we'll try that and get massive performance 
increases :)


-Mike


Re: DIH Blob data

2014-11-14 Thread Michael Sokolov

There is a binary type
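
For example (a sketch; the field name is illustrative), the example schema
declares a type like this, and you add a stored, non-indexed field of it:

<fieldType name="binary" class="solr.BinaryField"/>
<field name="content_bin" type="binary" indexed="false" stored="true"/>

With XML or JSON updates the value has to be sent base64-encoded; SolrJ can
take a byte[] directly.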

-Mike

On 11/14/2014 12:21 PM, Anurag Sharma wrote:

bq: We routinely store images and pdfs in Solr. There *is* a benefit, since
you don't need to manage another storage system, you don't have to worry
about Solr getting out of sync with the other system, you can use Solr
replication for all your assets, etc.

Do the same holds good for large Blobs like image, audio, video as well?
Tika supports multiple file formats (http://tika.apache.org/1.5/formats.html)
but not sure how good is the Solr/Tika combination. Storing pdf and other
docs could be useful in Solr, tika can extract metadata from the docs and
make them discoverable.

Considering all the above cases there should also be a support for File
field type in Solr like other types Date, Float, Int, Long, String etc. but
looks like there are only two file types (
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/schema/)
and both re external file storage.

- ExternalFileField.java


- ExternalFileFieldReloader.java



What type can be used in schema when storing the files internally?


On Thu, Nov 13, 2014 at 3:48 AM, Jeon Woosung  wrote:


How about this?

First, define a field for filter query. It should be multivalued.

Second, implements transformer to extract json dynamic fields, and put the
dynamic fields into the solr field.

For example,



Data : {a:1,b:2,c:3}

You can split the data to "a:1", "b:2", "c:3", and put them into terms.

And then you can use filter query like "fq=terms:a:1"
2014. 11. 13. 오전 3:59에 "Michael Sokolov" 
님이

작성:


We routinely store images and pdfs in Solr. There *is* a benefit, since
you don't need to manage another storage system, you don't have to worry
about Solr getting out of sync with the other system, you can use Solr
replication for all your assets, etc.

I don't use DIH, so personally I don't care whether it handles blobs, but
it does seem like a natural extension for a system that indexes data from
SQL in Solr.

-Mike


On 11/12/2014 01:31 PM, Anurag Sharma wrote:


BLOB is non-searchable field so there is no benefit of storing it into
Solr. Any external key-value store can be used to store the blob and
reference of this blob can be stored as a string field in Solr.

On Wed, Nov 12, 2014 at 5:56 PM, stockii 
wrote:

  I had a similar problem and didnt find any solution to use the fields

in

JSON
Blob for a filter ... Not with DIH.



--
View this message in context:


http://lucene.472066.n3.nabble.com/DIH-Blob-data-tp4168896p4168925.html

Sent from the Solr - User mailing list archive at Nabble.com.






Re: DIH Blob data

2014-11-14 Thread Erick Erickson
Just skimming, so maybe I misinterpreted.

ExternalFileField and ExternalFileFieldReloader
refer to storing values for each doc in an external file, they have
nothing to do with storing _files_.

The usual pattern is to have Solr store just enough data to have the
system-of-record return the actual file rather than have Solr
actually store the file. Solr isn't really built for this and while some
people do this it usually is a poor design if for no other reason than
as segments merge, the data gets copied again and again and again
to no good purpose.

Best,
Erick

On Fri, Nov 14, 2014 at 12:21 PM, Anurag Sharma  wrote:
> bq: We routinely store images and pdfs in Solr. There *is* a benefit, since
> you don't need to manage another storage system, you don't have to worry
> about Solr getting out of sync with the other system, you can use Solr
> replication for all your assets, etc.
>
> Do the same holds good for large Blobs like image, audio, video as well?
> Tika supports multiple file formats (http://tika.apache.org/1.5/formats.html)
> but not sure how good is the Solr/Tika combination. Storing pdf and other
> docs could be useful in Solr, tika can extract metadata from the docs and
> make them discoverable.
>
> Considering all the above cases there should also be a support for File
> field type in Solr like other types Date, Float, Int, Long, String etc. but
> looks like there are only two file types (
> http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/schema/)
> and both re external file storage.
>
>- ExternalFileField.java
>
> 
>- ExternalFileFieldReloader.java
>
> 
>
> What type can be used in schema when storing the files internally?
>
>
> On Thu, Nov 13, 2014 at 3:48 AM, Jeon Woosung  wrote:
>
>> How about this?
>>
>> First, define a field for filter query. It should be multivalued.
>>
>> Second, implements transformer to extract json dynamic fields, and put the
>> dynamic fields into the solr field.
>>
>> For example,
>>
>> 
>>
>> Data : {a:1,b:2,c:3}
>>
>> You can split the data to "a:1", "b:2", "c:3", and put them into terms.
>>
>> And then you can use filter query like "fq=terms:a:1"
>> 2014. 11. 13. 오전 3:59에 "Michael Sokolov" > >님이
>> 작성:
>>
>> > We routinely store images and pdfs in Solr. There *is* a benefit, since
>> > you don't need to manage another storage system, you don't have to worry
>> > about Solr getting out of sync with the other system, you can use Solr
>> > replication for all your assets, etc.
>> >
>> > I don't use DIH, so personally I don't care whether it handles blobs, but
>> > it does seem like a natural extension for a system that indexes data from
>> > SQL in Solr.
>> >
>> > -Mike
>> >
>> >
>> > On 11/12/2014 01:31 PM, Anurag Sharma wrote:
>> >
>> >> BLOB is non-searchable field so there is no benefit of storing it into
>> >> Solr. Any external key-value store can be used to store the blob and
>> >> reference of this blob can be stored as a string field in Solr.
>> >>
>> >> On Wed, Nov 12, 2014 at 5:56 PM, stockii 
>> >> wrote:
>> >>
>> >>  I had a similar problem and didnt find any solution to use the fields
>> in
>> >>> JSON
>> >>> Blob for a filter ... Not with DIH.
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> View this message in context:
>> >>>
>> http://lucene.472066.n3.nabble.com/DIH-Blob-data-tp4168896p4168925.html
>> >>> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>>
>> >>>
>> >
>>


Re: Restrict search to subset (a list of aprrox 40,000 ids from an external service) of corpus

2014-11-14 Thread Jürgen Wagner (DVT)
Hi guy,
  there's not much of a search operation here. Why not store the
documents in a key/value store and simply fetch them by matching ids?

Another approach:  as there is no query, you could easily partition the
set of ids and fetch the results in multiple batches.

The maximum number of clauses should be 1024. You can set it to a higher
value using the respective method in
org.apache.lucene.search.BooleanQuery (I've never done that one before,
though).

Now, your mileage may vary. What is the idea behind this retrieval? You
really want to fetch objects by id? Check out MemcacheDB or Apache
Cassandra or Apache CouchDB, depending on your application and the type
of information you want to store.

Best regards,
--Jürgen

On 14.11.2014 17:51, henry cleland wrote:
> Hi guys,
> How do I search only a subset of my corpus based on a large list of non
> consecutive unique key ids (cannot do a range query).
> Is there a way around doing this  q=id:(id1 OR id2 OR id3 OR id4 ... OR
> id4 ) AND name:*
>
> Also what is the limit of "OR"s i can apply on the query if that is the
> only way out, i don't suppose it is infinity.
> Thanks
>


-- 

Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
уважением
*i.A. Jürgen Wagner*
Head of Competence Center "Intelligence"
& Senior Cloud Consultant

Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
E-Mail: juergen.wag...@devoteam.com
, URL: www.devoteam.de



Managing Board: Jürgen Hatzipantelis (CEO)
Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071




Re: Restrict search to subset (a list of aprrox 40,000 ids from an external service) of corpus

2014-11-14 Thread Anurag Sharma
Is it possible to add another integer dyanmicField to the selected doc ids?
If yes, further can add update incremental/same values to these docs now
search can be done to this subset using range/filter query.

On Fri, Nov 14, 2014 at 10:21 PM, henry cleland 
wrote:

> Hi guys,
> How do I search only a subset of my corpus based on a large list of non
> consecutive unique key ids (cannot do a range query).
> Is there a way around doing this  q=id:(id1 OR id2 OR id3 OR id4 ... OR
> id4 ) AND name:*
>
> Also what is the limit of "OR"s i can apply on the query if that is the
> only way out, i don't suppose it is infinity.
> Thanks
>


Re: DIH Blob data

2014-11-14 Thread Anurag Sharma
bq: We routinely store images and pdfs in Solr. There *is* a benefit, since
you don't need to manage another storage system, you don't have to worry
about Solr getting out of sync with the other system, you can use Solr
replication for all your assets, etc.

Does the same hold good for large blobs like image, audio and video as well?
Tika supports multiple file formats (http://tika.apache.org/1.5/formats.html),
but I'm not sure how good the Solr/Tika combination is. Storing PDFs and other
docs could be useful in Solr; Tika can extract metadata from the docs and
make them discoverable.

Considering all the above cases, there should also be support for a File
field type in Solr like the other types (Date, Float, Int, Long, String, etc.), but
it looks like there are only two file types (
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/schema/)
and both are for external file storage.

   - ExternalFileField.java
   http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/schema/ExternalFileField.java
   - ExternalFileFieldReloader.java
   http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/schema/ExternalFileFieldReloader.java

What type can be used in schema when storing the files internally?


On Thu, Nov 13, 2014 at 3:48 AM, Jeon Woosung  wrote:

> How about this?
>
> First, define a field for filter query. It should be multivalued.
>
> Second, implements transformer to extract json dynamic fields, and put the
> dynamic fields into the solr field.
>
> For example,
>
> 
>
> Data : {a:1,b:2,c:3}
>
> You can split the data to "a:1", "b:2", "c:3", and put them into terms.
>
> And then you can use filter query like "fq=terms:a:1"
> 2014. 11. 13. 오전 3:59에 "Michael Sokolov"  >님이
> 작성:
>
> > We routinely store images and pdfs in Solr. There *is* a benefit, since
> > you don't need to manage another storage system, you don't have to worry
> > about Solr getting out of sync with the other system, you can use Solr
> > replication for all your assets, etc.
> >
> > I don't use DIH, so personally I don't care whether it handles blobs, but
> > it does seem like a natural extension for a system that indexes data from
> > SQL in Solr.
> >
> > -Mike
> >
> >
> > On 11/12/2014 01:31 PM, Anurag Sharma wrote:
> >
> >> BLOB is non-searchable field so there is no benefit of storing it into
> >> Solr. Any external key-value store can be used to store the blob and
> >> reference of this blob can be stored as a string field in Solr.
> >>
> >> On Wed, Nov 12, 2014 at 5:56 PM, stockii 
> >> wrote:
> >>
> >>  I had a similar problem and didnt find any solution to use the fields
> in
> >>> JSON
> >>> Blob for a filter ... Not with DIH.
> >>>
> >>>
> >>>
> >>> --
> >>> View this message in context:
> >>>
> http://lucene.472066.n3.nabble.com/DIH-Blob-data-tp4168896p4168925.html
> >>> Sent from the Solr - User mailing list archive at Nabble.com.
> >>>
> >>>
> >
>


Restrict search to subset (a list of aprrox 40,000 ids from an external service) of corpus

2014-11-14 Thread henry cleland
Hi guys,
How do I search only a subset of my corpus based on a large list of non
consecutive unique key ids (cannot do a range query).
Is there a way around doing this  q=id:(id1 OR id2 OR id3 OR id4 ... OR
id4 ) AND name:*

Also, what is the limit on the number of "OR"s I can apply to the query if that is the
only way out? I don't suppose it is infinite.
Thanks


Re: One ZooKeeper and many Solr clouds

2014-11-14 Thread Enrico Trucco
Thank you very much, Jürgen.


2014-11-14 13:51 GMT+01:00 "Jürgen Wagner (DVT)" <
juergen.wag...@devoteam.com>:

>  Hello Enrico,
>   you may use the chroot feature of Zookeeper to root the different
> SolrCloud instances differently. Instead of zoohost1:2181, you can use
> zoohost1:2181/cluster1 as the Zookeeper location. Unless there is a load
> issue with high rates of updates and other data traffic, a single Zookeeper
> ensemble can very well handle multiple SolrCloud instances.
>
> https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot
>
> Best regards,
> --Jürgen
>
>
>
> On 14.11.2014 13:41, Enrico Trucco wrote:
>
> Hello
>
> I am considering to start using Solr Cloud and to share a single ZooKeeper
> between different Solr clouds and eventually other software.
>
> In all the examples I see online, the configuration of a Solr cloud is
> stored in the root node of ZooKeeper.
> I was wandering if it is possible to specify the node under which Solr
> stores its configuration.
> For example, let's suppose that i have 2 Solr clouds (solr1 and solr2)
> and another software sharing the same zookeeper instance and storing files
> under other1.
>
> I would like to have 3 nodes like
> /solr1
> /solr2
> /other1
> Each of them containing files of a single entity.
> I will manage isolation through ZooKeeper ACL functionalities.
>
> Is there a way to achieve this?
>
> Regards
> Enrico
>
>
>
>
> --
>
> Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
> уважением
> *i.A. Jürgen Wagner*
> Head of Competence Center "Intelligence"
> & Senior Cloud Consultant
>
> Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
> Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
> E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de
> --
> Managing Board: Jürgen Hatzipantelis (CEO)
> Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
> Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071
>
>
>


Re: One ZooKeeper and many Solr clouds

2014-11-14 Thread Jürgen Wagner (DVT)
Hello Enrico,
  you may use the chroot feature of Zookeeper to root the different
SolrCloud instances differently. Instead of zoohost1:2181, you can use
zoohost1:2181/cluster1 as the Zookeeper location. Unless there is a load
issue with high rates of updates and other data traffic, a single
Zookeeper ensemble can very well handle multiple SolrCloud instances.

https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot
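
For example (host names are placeholders), each cluster gets its own suffix at
the end of the full ensemble string:

-DzkHost=zoohost1:2181,zoohost2:2181,zoohost3:2181/cluster1

The chroot goes once at the end of the whole string, not after each host, and
the path has to exist before the first start (Solr's zkcli.sh has a makepath
command for that).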

Best regards,
--Jürgen


On 14.11.2014 13:41, Enrico Trucco wrote:
> Hello
>
> I am considering to start using Solr Cloud and to share a single ZooKeeper
> between different Solr clouds and eventually other software.
>
> In all the examples I see online, the configuration of a Solr cloud is
> stored in the root node of ZooKeeper.
> I was wandering if it is possible to specify the node under which Solr
> stores its configuration.
> For example, let's suppose that i have 2 Solr clouds (solr1 and solr2)
> and another software sharing the same zookeeper instance and storing files
> under other1.
>
> I would like to have 3 nodes like
> /solr1
> /solr2
> /other1
> Each of them containing files of a single entity.
> I will manage isolation through ZooKeeper ACL functionalities.
>
> Is there a way to achieve this?
>
> Regards
> Enrico
>


-- 

Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
уважением
*i.A. Jürgen Wagner*
Head of Competence Center "Intelligence"
& Senior Cloud Consultant

Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
E-Mail: juergen.wag...@devoteam.com
, URL: www.devoteam.de



Managing Board: Jürgen Hatzipantelis (CEO)
Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071




One ZooKeeper and many Solr clouds

2014-11-14 Thread Enrico Trucco
Hello

I am considering starting to use SolrCloud and sharing a single ZooKeeper
between different Solr clouds and eventually other software.

In all the examples I see online, the configuration of a Solr cloud is
stored in the root node of ZooKeeper.
I was wondering whether it is possible to specify the node under which Solr
stores its configuration.
For example, let's suppose that I have 2 Solr clouds (solr1 and solr2)
and another piece of software sharing the same ZooKeeper instance and storing
files under other1.

I would like to have 3 nodes like
/solr1
/solr2
/other1
Each of them containing files of a single entity.
I will manage isolation through ZooKeeper ACL functionalities.

Is there a way to achieve this?

Regards
Enrico


RE: Handling growth

2014-11-14 Thread Toke Eskildsen
Patrick Henry [patricktheawesomeg...@gmail.com] wrote:

>I am working with a Solr collection that is several terabytes in size over
> several hundred millions of documents.  Each document is very rich, and
> over the past few years we have consistently quadrupled the size our
> collection annually.  Unfortunately, this sits on a single node with only a
> few hundred megabytes of memory - so our performance is less than ideal.

I assume you mean gigabytes of memory. If you have not already done so, 
switching to SSDs for storage should buy you some more time.

> [Going for SolrCloud]  We are in a continuous adding documents and never 
> change
> existing ones.  Based on that, one individual recommended for me to
> implement custom hashing and route the latest documents to the shard with
> the least documents, and when that shard fills up add a new shard and index
> on the new shard, rinse and repeat.

We have quite a similar setup, where we produce a never-changing shard once 
every 8 days and add it to our cloud. One could also combine this setup with a 
single live shard, for keeping the full index constantly up to date. The memory 
overhead of running an immutable shard is smaller than a mutable one and easier 
to fine-tune. It also allows you to optimize the index down to a single 
segment, which requires a bit less processing power and saves memory when 
faceting. There's a description of our setup at 
http://sbdevel.wordpress.com/net-archive-search/
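
One way to wire up that kind of shard-per-period layout (a sketch with
placeholder names, not our exact setup) is a collection on the implicit
router, where a new shard is added explicitly for each period via the
Collections API:

/admin/collections?action=CREATE&name=archive&router.name=implicit&shards=shard_2014_45&router.field=shard_label

/admin/collections?action=CREATESHARD&collection=archive&shard=shard_2014_46

Documents then carry a shard_label value (or a _route_ parameter) naming the
shard they should land in.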

From an administrative point of view, we like having complete control over
each shard. We keep track of what goes in it and in case of schema or analysis
chain changes, we can re-build each shard one at a time and deploy them
continuously, instead of having to re-build everything in one go on a parallel
setup. Of course, fundamental changes to the schema would require a complete
re-build before deploy, so we hope to avoid that.

- Toke Eskildsen


Re: Suggest dictionaries not rebuilding after restart

2014-11-14 Thread Michael Sokolov
It looks like you have to define "storeDir", and if you don't then the 
rebuild no longer happens, as you said.  I think that goes in the config 
block you showed, but I haven't tested this (we use a different 
suggester with its own persistence strategy).
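
That is, something like this inside the suggester block (an untested sketch;
the directory name is arbitrary, and I believe a relative path is resolved
against the core's data dir):

<str name="storeDir">suggester_fieldName</str>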


-Mike

On 11/14/14 2:01 AM, Walter Underwood wrote:

We get no suggestions until we force a build with suggest.build=true. Maybe we 
need to define a spellchecker component to get that behavior?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Nov 13, 2014, at 10:56 PM, Michael Sokolov  
wrote:


I believe the spellchecker component persists these indexes now and reloads 
them on restart rather than rebuilding.

-Mike

On 11/13/14 7:40 PM, Walter Underwood wrote:

We have to manually rebuild the suggest dictionaries after a restart. This 
seems odd, since someone else had a problem because they did rebuild after 
restart.

We’re running 4.7 and our dictionaries are configured like this. We do this for 
several fields.

 
   fieldName
   FuzzyLookupFactory
   DocumentDictionaryFactory
   fieldName
   qualityScore
   string
   true
 
  wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/