Re: Solr Auto-Complete

2015-12-04 Thread Alexandre Rafalovitch
You can see an example of similar use at:
http://www.solr-start.com/javadoc/solr-lucene/index.html (search box).

The corresponding schema is here:
https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24
. It does have some extra special-case stuff to allow searching by
fragments, but the general use case is the same.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 4 December 2015 at 10:11, Salman Ansari  wrote:
> Thanks Alan, Alessandro and Andrea for your great explanations. I will
> follow the path of adding edge ngrams to the field type for my use case.
>
> Regards,
> Salman
>
> On Thu, Dec 3, 2015 at 12:23 PM, Alessandro Benedetti wrote:
>
>> "Sounds good but I heard "/suggest" component is the recommended way of
>> doing auto-complete"
>>
>> This sounds fantastic :)
>> We "heard" that as well; we know what the suggest component does.
>> The point is that you would like to retrieve the suggestions + some
>> consistent payload in different fields.
>> The current suggest component offers some support for providing a payload,
>> but almost all the suggester implementations are based on an FST approach,
>> which aims to be as fast and memory-efficient as possible.
>> Honestly, you could experiment and even contribute a customisation if you
>> want to add a new feature enabling the suggest component to return complex
>> payloads together with the suggestions.
>> Apart from that, it strictly depends on how you want to provide the
>> autocompletion; there are plenty of different lookup implementations and
>> plenty of tokenizers/token filters to combine.
>> So I would confirm what we already said and what Andrea confirmed.
>>
>> If anyone has played with the suggester's suggestion payloads, their
>> feedback is welcome!
>>
>> Cheers
>>
>>
>> On 3 December 2015 at 06:21, Andrea Gazzarini 
>> wrote:
>>
>> > Hi Salman,
>> > A few months ago I was involved in a project similar to
>> > map.geoadmin.ch,
>> > and there I had the same need (I also sent an email to this list).
>> >
>> > From my side I can further confirm what Alan and Alessandro already
>> > explained; I followed that approach.
>> >
>> > IMHO, that is the "recommended way" if the component's features meet your
>> > needs (i.e. do not reinvent the wheel) but it seems you're out of those
>> > bounds.
>> >
>> > Best,
>> > Andrea
>> > On 2 Dec 2015 21:51, "Salman Ansari"  wrote:
>> >
>> > > Sounds good but I heard "/suggest" component is the recommended way of
>> > > doing auto-complete in the new versions of Solr. Something along the
>> > lines
>> > > of this article
>> > > https://cwiki.apache.org/confluence/display/solr/Suggester
>> > >
>> > > <searchComponent name="suggest" class="solr.SuggestComponent">
>> > >   <lst name="suggester">
>> > >     <str name="name">mySuggester</str>
>> > >     <str name="lookupImpl">FuzzyLookupFactory</str>
>> > >     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>> > >     <str name="field">cat</str>
>> > >     <str name="weightField">price</str>
>> > >     <str name="suggestAnalyzerFieldType">string</str>
>> > >     <str name="buildOnStartup">false</str>
>> > >   </lst>
>> > > </searchComponent>
>> > >
>> > > Can someone confirm this?
>> > >
>> > > Regards,
>> > > Salman
>> > >
>> > >
>> > > On Wed, Dec 2, 2015 at 1:14 PM, Alessandro Benedetti <
>> > > abenede...@apache.org>
>> > > wrote:
>> > >
>> > > > Hi Salman,
>> > > > I agree with Alan.
>> > > > Just configure your schema with the proper analysers .
>> > > > For the field you want to use for suggestions you are likely to need
>> > > simply
>> > > > this fieldType :
>> > > >
>> > > > <fieldType name="text_suggest" class="solr.TextField"
>> > > >            positionIncrementGap="100">
>> > > >   <analyzer type="index">
>> > > >     <tokenizer class="solr.StandardTokenizerFactory"/>
>> > > >     <filter class="solr.LowerCaseFilterFactory"/>
>> > > >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>> > > >             maxGramSize="20"/>
>> > > >   </analyzer>
>> > > >   <analyzer type="query">
>> > > >     <tokenizer class="solr.StandardTokenizerFactory"/>
>> > > >     <filter class="solr.LowerCaseFilterFactory"/>
>> > > >   </analyzer>
>> > > > </fieldType>
>> > > >
>> > > > This is a very simple example, please adapt it to your use case.
>> > > >
>> > > > Cheers
>> > > >
>> > > > On 2 December 2015 at 09:41, Alan Woodward  wrote:
>> > > >
>> > > > > Hi Salman,
>> > > > >
>> > > > > It sounds as though you want to do a normal search against a
>> special
>> > > > > 'suggest' field, that's been indexed with edge ngrams.
>> > > > >
>> > > > > Alan Woodward
>> > > > > www.flax.co.uk
>> > > > >
>> > > > >
>> > > > > On 2 Dec 2015, at 09:31, Salman Ansari wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > I am looking for auto-complete in Solr but on top of just auto
>> > > > complete I
>> > > > > > want as well to return the data completely (not just
>> suggestions),
>> > > so I
>> > > > > > want to get back the ids, and other fields in the whole
>> document. I
>> > > > tried
>> > > > > > the following 2 approaches but each had issues
>> > > > > >
>> > > > > > 1) Used the /suggest component but that returns a very specific
>> > > format
>> > > > > > which looks like I cannot customize. I want to return the whole
>> > > > document
>> > > > > > that has a matching field and not only the suggestion list. So
>> for
>> > > > > example,
>> > > > 

Re: Highlighting large documents

2015-12-04 Thread Zheng Lin Edwin Yeo
Hi Andrea,

I'm using the original highlighter.

Below is my configuration for the highlighter in solrconfig.xml

  <requestHandler name="/highlight" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="wt">json</str>
      <str name="indent">true</str>
      <str name="df">text</str>
      <str name="fl">id, title, content_type, last_modified, url, score</str>

      <str name="hl">on</str>
      <str name="hl.fl">id, title, content, author</str>
      <str name="hl.highlightMultiTerm">true</str>
      <str name="hl.usePhraseHighlighter">true</str>
      <str name="hl.encoder">html</str>
      <str name="hl.fragsize">200</str>
      <str name="hl.snippets">100</str>

      <str name="group">true</str>
      <str name="group.field">signature</str>
      <str name="group.main">true</str>
      <str name="group.limit">100</str>
    </lst>
  </requestHandler>


Have you managed to solve the problem?

Regards,
Edwin


On 4 December 2015 at 23:54, Andrea Gazzarini  wrote:

> Hi Zheng,
> just curiosity, because shortly I will have to deal with a similar
> scenario (Solr 5.3.1 + large documents + highlighting).
> Which highlighter are you using?
>
> Andrea
>
> 2015-12-04 16:51 GMT+01:00 Zheng Lin Edwin Yeo :
>
> > Hi,
> >
> > I'm using Solr 5.3.0
> >
> > I found that in large documents, sometimes when I do a highlight query,
> > the result set that is returned does not contain the highlighted terms.
> > There are actually matches in the documents, but they are located
> > further back in the documents.
> >
> > I have tried increasing the value of hl.maxAnalyzedChars, as the default
> > value is 51200 and I have documents that are much larger than 51200
> > characters. Although this method works, when I increase this value the
> > performance of search and highlighting drops. It can drop from less
> > than 0.5 seconds to more than 10 seconds.
> >
> > Would like to check: is this method of increasing hl.maxAnalyzedChars
> > the best one to use, or are there other ways that serve the same
> > purpose, but without affecting the performance much?
> >
> > Regards,
> > Edwin
> >
>


Re: Highlighting large documents

2015-12-04 Thread Andrea Gazzarini
Hi Zheng,
just curiosity, because shortly I will have to deal with a similar
scenario (Solr 5.3.1 + large documents + highlighting).
Which highlighter are you using?

Andrea

2015-12-04 16:51 GMT+01:00 Zheng Lin Edwin Yeo :

> Hi,
>
> I'm using Solr 5.3.0
>
> I found that in large documents, sometimes when I do a highlight query,
> the result set that is returned does not contain the highlighted terms.
> There are actually matches in the documents, but they are located further
> back in the documents.
>
> I have tried increasing the value of hl.maxAnalyzedChars, as the default
> value is 51200 and I have documents that are much larger than 51200
> characters. Although this method works, when I increase this value the
> performance of search and highlighting drops. It can drop from less than
> 0.5 seconds to more than 10 seconds.
>
> Would like to check: is this method of increasing hl.maxAnalyzedChars the
> best one to use, or are there other ways that serve the same purpose, but
> without affecting the performance much?
>
> Regards,
> Edwin
>


Some errors migrating to solr cloud

2015-12-04 Thread tedsolr
I had a fairly simple plan for migrating my single solr instance with
multiple cores, to a solrcloud implementation where core => collection. My
testing locally (windows) worked fine, but the first linux (development)
environment I tried to migrate had some failures. This is v5.2.1.

The setup: Single linux box for two solr nodes: ports 8983 & 8984. Both are
children of the only SOLRHOME folder. Just using the embedded ZK. Single
shard collections with replication factor of 2.
SOLRHOME/server/solr (port 8983 with embedded ZK on 9983)
- solr start -c
SOLRHOME/server/solr2 (port 8984)
- solr start -c -p 8984 -s solr2 -z localhost:9983
 
The plan: Start solr in cloud mode; upload config to ZK; create collections
via the collections API; stop solr; copy the "data" folders from the old
cores into the new collections on 8983 (/solr); start solr again

The first symptom of the problem was trying to stop all nodes with "solr
stop -all". It only shut down node 8983. When I then tried "solr stop -p
8984" it had to kill it. Then I noticed the errors in the log: "Error while
trying to recover. Server refused connection at:
http://10.0.5.213:8984/solr" and "Error while trying to recover. No registered
leader was found after waiting for 4000ms".

All the indexes I moved were very small - less than 1MB. Only 2 out of 5
collections replicated when solr restarted. The only "clue" (unless it's
just coincidence) is that the two that worked had 8983 as their leader
node. The other 3 collections had 8984 - which doesn't have a ZK. It's
confusing because this same plan worked on my local machine - even when a
collection had 8984 as the leader. Is there a flaw in my plan? Maybe I have
to force the leaders to be on the same node as the ZK? Why didn't "solr
stop -all" work?





Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-04 Thread Rick Leir
On Fri, Dec 4, 2015 at 12:59 AM, 
wrote:

>
> >Just wondering if folks have any suggestions on using Schema.xml vs.
> >Managed Schema going forward.
> >


We are using loosely typed languages (Perl and Javascript), and a loosely
typed DB (CouchDB). This is consistent with running Solr in Schemaless
mode, and doing more unit tests. When you post a doc into Solr containing a
field which has not been seen before, Solr chooses the most appropriate
Type. There is no Java exception and the field data is searchable. You can
discover the Type by looking at the Solr console. We can probably log it
too.

The new field might be due to us intentionally adding it, though we should
be methodical and systematic about adding new fields.

Or it could be due to unexpected input to the ingest scripts, (but I
believe these scripts should clean their inputs).

Or it could be due to a bug in the ingest scripts. In the spirit of TDD,
the ingest scripts should have tests so we can claim they are bug free.


However, I brought up this topic with my colleagues here, and they are sure
we should stick with Schema.xml. ".. some level of control and expectation
of exactly what kind of data is in our search system wouldn't be helpful
.." So be it.
Cheers -- Rick


Re: Solr Auto-Complete

2015-12-04 Thread Salman Ansari
Thanks Alan, Alessandro and Andrea for your great explanations. I will
follow the path of adding edge ngrams to the field type for my use case.

Regards,
Salman

On Thu, Dec 3, 2015 at 12:23 PM, Alessandro Benedetti  wrote:

> "Sounds good but I heard "/suggest" component is the recommended way of
> doing auto-complete"
>
> This sounds fantastic :)
> We "heard" that as well; we know what the suggest component does.
> The point is that you would like to retrieve the suggestions + some
> consistent payload in different fields.
> The current suggest component offers some support for providing a payload,
> but almost all the suggester implementations are based on an FST approach,
> which aims to be as fast and memory-efficient as possible.
> Honestly, you could experiment and even contribute a customisation if you
> want to add a new feature enabling the suggest component to return complex
> payloads together with the suggestions.
> Apart from that, it strictly depends on how you want to provide the
> autocompletion; there are plenty of different lookup implementations and
> plenty of tokenizers/token filters to combine.
> So I would confirm what we already said and what Andrea confirmed.
>
> If anyone has played with the suggester's suggestion payloads, their
> feedback is welcome!
>
> Cheers
>
>
> On 3 December 2015 at 06:21, Andrea Gazzarini 
> wrote:
>
> > Hi Salman,
> > A few months ago I was involved in a project similar to
> > map.geoadmin.ch,
> > and there I had the same need (I also sent an email to this list).
> >
> > From my side I can further confirm what Alan and Alessandro already
> > explained; I followed that approach.
> >
> > IMHO, that is the "recommended way" if the component's features meet your
> > needs (i.e. do not reinvent the wheel) but it seems you're out of those
> > bounds.
> >
> > Best,
> > Andrea
> > On 2 Dec 2015 21:51, "Salman Ansari"  wrote:
> >
> > > Sounds good but I heard "/suggest" component is the recommended way of
> > > doing auto-complete in the new versions of Solr. Something along the
> > lines
> > > of this article
> > > https://cwiki.apache.org/confluence/display/solr/Suggester
> > >
> > > <searchComponent name="suggest" class="solr.SuggestComponent">
> > >   <lst name="suggester">
> > >     <str name="name">mySuggester</str>
> > >     <str name="lookupImpl">FuzzyLookupFactory</str>
> > >     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
> > >     <str name="field">cat</str>
> > >     <str name="weightField">price</str>
> > >     <str name="suggestAnalyzerFieldType">string</str>
> > >     <str name="buildOnStartup">false</str>
> > >   </lst>
> > > </searchComponent>
> > >
> > > Can someone confirm this?
> > >
> > > Regards,
> > > Salman
> > >
> > >
> > > On Wed, Dec 2, 2015 at 1:14 PM, Alessandro Benedetti <
> > > abenede...@apache.org>
> > > wrote:
> > >
> > > > Hi Salman,
> > > > I agree with Alan.
> > > > Just configure your schema with the proper analysers .
> > > > For the field you want to use for suggestions you are likely to need
> > > simply
> > > > this fieldType :
> > > >
> > > > <fieldType name="text_suggest" class="solr.TextField"
> > > >            positionIncrementGap="100">
> > > >   <analyzer type="index">
> > > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> > > >             maxGramSize="20"/>
> > > >   </analyzer>
> > > >   <analyzer type="query">
> > > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >   </analyzer>
> > > > </fieldType>
> > > >
> > > > This is a very simple example, please adapt it to your use case.
> > > >
> > > > Cheers
> > > >
> > > > On 2 December 2015 at 09:41, Alan Woodward  wrote:
> > > >
> > > > > Hi Salman,
> > > > >
> > > > > It sounds as though you want to do a normal search against a
> special
> > > > > 'suggest' field, that's been indexed with edge ngrams.
> > > > >
> > > > > Alan Woodward
> > > > > www.flax.co.uk
> > > > >
> > > > >
> > > > > On 2 Dec 2015, at 09:31, Salman Ansari wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am looking for auto-complete in Solr but on top of just auto
> > > > complete I
> > > > > > want as well to return the data completely (not just
> suggestions),
> > > so I
> > > > > > want to get back the ids, and other fields in the whole
> document. I
> > > > tried
> > > > > > the following 2 approaches but each had issues
> > > > > >
> > > > > > 1) Used the /suggest component but that returns a very specific
> > > format
> > > > > > which looks like I cannot customize. I want to return the whole
> > > > document
> > > > > > that has a matching field and not only the suggestion list. So
> for
> > > > > example,
> > > > > > if I write "hard" it returns the results in a specific format as
> > > > follows
> > > > > >
> > > > > >   hard drive
> > > > > > hard disk
> > > > > >
> > > > > > Is there a way to get back additional fields with suggestions?
> > > > > >
> > > > > > 2) Tried the normal /select component but that does not do
> > > > auto-complete
> > > > > on
> > > > > > portion of the word. So, for example, if I write the query as
> > "bara"
> > > it
> > > > > > DOES NOT return "barack obama". Any suggestions how to solve
> this?
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Salman
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > --
> > > >
> > > > 

Highlighting large documents

2015-12-04 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr 5.3.0

I found that in large documents, sometimes when I do a highlight query, the
result set that is returned does not contain the highlighted terms. There
are actually matches in the documents, but they are located further back in
the documents.

I have tried increasing the value of hl.maxAnalyzedChars, as the default
value is 51200 and I have documents that are much larger than 51200
characters. Although this method works, when I increase this value the
performance of search and highlighting drops. It can drop from less than
0.5 seconds to more than 10 seconds.
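
For reference, hl.maxAnalyzedChars can also be raised per request rather
than in solrconfig.xml; a minimal sketch, with an illustrative field name:

/select?q=...&hl=true&hl.fl=content&hl.maxAnalyzedChars=500000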

Would like to check: is this method of increasing hl.maxAnalyzedChars the
best one to use, or are there other ways that serve the same purpose, but
without affecting the performance much?

Regards,
Edwin


Re: Solrcloud - adding a node as a replica?

2015-12-04 Thread Mugeesh Husain
Hi kamaci,

I have 3 servers: solr1, solr2 and solr3.

I want to create 3 cores, on servers solr1 and solr2, with no
core/collection on solr3.

I want to create 3 replicas for the above cores; each replica would be on
server solr3.

When I create one core using bin/solr create -c abc -shard 1 -replicaFactor
1, it creates the core on server 3, not on server 1 or 2.

I want to create the replicas in the following way:

core1 --- shard1 --- replica on server solr1
                 --- replica on server solr3

core2 --- shard1 --- replica on server solr2
                 --- replica on server solr3

core3 --- shard1 --- replica on server solr1
                 --- replica on server solr3
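
One way to get that placement is to name the target nodes at creation time;
a sketch, assuming the Collections API's createNodeSet parameter and
illustrative host:port values:

http://solr1:8983/solr/admin/collections?action=CREATE&name=core1&numShards=1&replicationFactor=2&createNodeSet=solr1:8983_solr,solr3:8983_solr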









Solrcloud: 1 server, 1 configset, multiple collections, multiple schemas

2015-12-04 Thread bengates
Hello,

I'm having usage issues with *Solrcloud*.

What I want to do:
- Manage a solr server *only with the API* (create / reload / delete
collections, create / replace / delete fields, etc).
- A new collection should *start with pre-defined default fields, fieldTypes
and copyFields* (let's say, field1 and field2 for fields).
- Each collection must *have its own schema*.

What I've setup yet:
- Installed a *Solr 5.3.1* in /opt/solr/ on an Ubuntu 14.04 server
- Installed *Zookeeper 3.4.6* in /opt/zookeeper/ as described in the solr
wiki
- Added line "server.1=127.0.0.1:2888:3888" in /opt/zookeeper/conf/zoo.cfg
- Added line "127.0.0.1:2181" in /var/solr/data/solr.xml
- Told solr or zookeeper somewhere (don't remember where I setup this) to
use /home/me/configSet/managed-schema.xml and
/home/me/configSet/solrconfig.xml for the configSet
- Ran solr on port 8983

My /home/me/configSet/managed-schema.xml contains *field1* and *field2*.

Now let's create a collection:
http://my.remote.addr:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=1
- *collection1 *is created, with *field1 *and *field2*. Perfect.

Let's create another collection:
http://my.remote.addr:8983/solr/admin/collections?action=CREATE&name=collection2&numShards=1
- *collection2 *is created, with *field1 *and *field2*. Perfect.

Now, if I *add some fields* on *collection1* by POSTing to
http://my.remote.addr:8983/solr/collection1/schema the following:
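
Presumably that was a Schema API add-field command along these lines (the
field types here are guesses):

{
  "add-field": [
    { "name": "field3", "type": "string", "stored": true },
    { "name": "field4", "type": "string", "stored": true }
  ]
}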


- *field3 *and *field4 *are successfully added to *collection1*
- ... but they are *also added* to *collection2* (verified by GETting
http://my.remote.addr:8983/solr/collection2/schema/fields)

How can I prevent this behavior, since my collections hold *different kinds
of data*, and may have the same field names but not the same types?

Thanks,
Ben





"OnException" extension to SearchComponents / finalize on search components

2015-12-04 Thread deansg
I was recently writing a SearchComponent that performs a certain action in
the prepare method when in distributed context, and then must perform
another action after the query finished running to clean up (one of the
finishStage calls). I realized that if the SearchHandler on the server
throws an exception (because of another SearchComponent, for example) before
reaching finishStage, then I have a problem, since my component didn't get
to clean up after itself. I thought of two solutions to the problem and I
would like to know if they're any good:

1. I would write a class extending SearchComponent that has an "onException"
type of method. Then, I would write another class extending SearchHandler
that would surround handleRequestBody with a try-catch block and call
onException on the special SearchComponents, giving my component another
chance to clean up. This solution works for me, but extending a class such
as SearchHandler is kind of going against the grain of the Solr
community's usual practice.

2. I can put my cleaning-up logic in some finalizable object, and attach
that object to the ResponseBuilder. Then, I know that my component will
clean-up after itself eventually. However, I don't fully like this solution
either, first because attaching an object to the response builder for this
reason is an abuse of the API, and also because I know that finalize() should always be
avoided as much as possible.

What do you guys think?





Re: migrate(or copy) data from one core1(node2) to another core2(node1)

2015-12-04 Thread Mugeesh Husain
Hello Erick,

I did shut down all the nodes in SolrCloud and copied the data directory
from the non-SolrCloud instance; after that I started all the nodes, but
the new data is not reflected.


One doubt:
>> Do I need to shut down the ZooKeeper node also? Or also clean the data
from the ZK directory?





Reloading the collection timed out

2015-12-04 Thread Troy Edwards
After running Solr on a linux box for about 15 days; today when I tried to
reload collections I got the following error


reload the collection time out:180s

org.apache.solr.common.SolrException: reload the collection time out:180s
at
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:740)
at
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:692)
at
org.apache.solr.handler.admin.CollectionsHandler.handleReloadAction(CollectionsHandler.java:762)
at
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:195)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:783)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:282)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368) at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)


Any suggestions on what to do about this issue?

thanks


RE: Use multiple instances simultaneously

2015-12-04 Thread Gian Maria Ricci - aka Alkampfer
Many thanks for your response.

I worked with Solr until early version 4.0, then switched to ElasticSearch
for a variety of reasons. I've used replication in the past with Solr, but
with Elasticsearch I basically had no problem because it works similarly to
SolrCloud by default and with almost zero configuration.

Now I have a customer that wants to use Solr, and he wants the simplest
possible setup to maintain in production. Since most of the work will be
done by the Data Import Handler, having multiple parallel and independent
machines is easy to maintain. If one machine fails, it is enough to set up
another machine, configure the cores, and restart the DIH.

I'd like to know if other people have gone down this path in the past.

--
Gian Maria Ricci
Cell: +39 320 0136949


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: giovedì 3 dicembre 2015 10:15
To: solr-user@lucene.apache.org
Subject: Re: Use multiple instances simultaneously

On 12/3/2015 1:25 AM, Gian Maria Ricci - aka Alkampfer wrote:
> In such a scenario could it be feasible to simply configure 2 or 3 
> identical instance of Solr and configure the application that transfer 
> data to solr to all the instances simultaneously (the approach will be 
> a DIH incremental for some core and an external application that push 
> data continuously for other cores)? Which could be the drawback of 
> using this approach?

When I first set up Solr, I used replication.  Then version 3.1.0 was
released, including a non-backward-compatible upgrade to javabin, and it was
not possible to replicate between 1.x and 3.x.

This incompatibility meant that it would not be possible to do a gradual
upgrade to 3.x, where the slaves are upgraded first and then the master.

To get around the problem, I basically did exactly what you've described.
I turned off replication and configured a second copy of my build program to
update what used to be slave servers.

Later, when I moved to a SolrJ program for index maintenance, I made one
copy of the maintenance program capable of updating multiple copies of the
index in parallel.

I have stuck with this architecture through 4.x and moving into 5.x, even
though I could go back to replication or switch to SolrCloud.
Having completely independent indexes allows a great deal of flexibility
with upgrades and testing new configurations, flexibility that isn't
available with SolrCloud or master-slave replication.

Thanks,
Shawn



Re: Solrcloud: 1 server, 1 configset, multiple collections, multiple schemas

2015-12-04 Thread Jeff Wartes

If you want two different collections to have two different schemas, those
collections need to reference two different configsets.
So you need another copy of your config available using a different name,
and to reference that other name when you create the second collection.
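
Roughly, with illustrative zkhost, confdir and config names:

server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
    -cmd upconfig -confdir /home/me/configSet2 -confname conf2

http://my.remote.addr:8983/solr/admin/collections?action=CREATE&name=collection2&numShards=1&collection.configName=conf2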


On 12/4/15, 6:26 AM, "bengates"  wrote:

>Hello,
>
>I'm having usage issues with *Solrcloud*.
>
>What I want to do:
>- Manage a solr server *only with the API* (create / reload / delete
>collections, create / replace / delete fields, etc).
>- A new collection should *start with pre-defined default fields,
>fieldTypes
>and copyFields* (let's say, field1 and field2 for fields).
>- Each collection must *have its own schema*.
>
>What I've setup yet:
>- Installed a *Solr 5.3.1* in /opt/solr/ on an Ubuntu 14.04 server
>- Installed *Zookeeper 3.4.6* in /opt/zookeeper/ as described in the solr
>wiki
>- Added line "server.1=127.0.0.1:2888:3888" in
>/opt/zookeeper/conf/zoo.cfg
>- Added line "127.0.0.1:2181" in
>/var/solr/data/solr.xml
>- Told solr or zookeeper somewhere (don't remember where I setup this) to
>use /home/me/configSet/managed-schema.xml and
>/home/me/configSet/solrconfig.xml for the configSet
>- Ran solr on port 8983
>
>My /home/me/configSet/managed-schema.xml contains *field1* and *field2*.
>
>Now let's create a collection:
>http://my.remote.addr:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=1
>- *collection1 *is created, with *field1 *and *field2*. Perfect.
>
>Let's create another collection:
>http://my.remote.addr:8983/solr/admin/collections?action=CREATE&name=collection2&numShards=1
>- *collection2 *is created, with *field1 *and *field2*. Perfect.
>
>Now, if I *add some fields* on *collection1* by POSTing to
>http://my.remote.addr:8983/solr/collection1/schema the following:
>
>
>- *field3 *and *field4 *are successfully added to *collection1*
>- ... but they are *also added* to *collection2* (verified by GETting
>http://my.remote.addr:8983/solr/collection2/schema/fields)
>
>How can I prevent this behavior, since my collections hold *different
>kinds of data*, and may have the same field names but not the same types?
>
>Thanks,
>Ben
>
>
>



Re: Highlighting large documents

2015-12-04 Thread Andrea Gazzarini
No no, sorry, the project has not started yet, so I haven't run into your
issue, but I'll be a careful listener of this thread.

Best,
Andrea

2015-12-04 17:04 GMT+01:00 Zheng Lin Edwin Yeo :

> Hi Andrea,
>
> I'm using the original highlighter.
>
> Below is my configuration for the highlighter in solrconfig.xml
>
>   <requestHandler name="/highlight" class="solr.SearchHandler">
>     <lst name="defaults">
>       <str name="echoParams">explicit</str>
>       <int name="rows">10</int>
>       <str name="wt">json</str>
>       <str name="indent">true</str>
>       <str name="df">text</str>
>       <str name="fl">id, title, content_type, last_modified, url, score</str>
>
>       <str name="hl">on</str>
>       <str name="hl.fl">id, title, content, author</str>
>       <str name="hl.highlightMultiTerm">true</str>
>       <str name="hl.usePhraseHighlighter">true</str>
>       <str name="hl.encoder">html</str>
>       <str name="hl.fragsize">200</str>
>       <str name="hl.snippets">100</str>
>
>       <str name="group">true</str>
>       <str name="group.field">signature</str>
>       <str name="group.main">true</str>
>       <str name="group.limit">100</str>
>     </lst>
>   </requestHandler>
>
>
> Have you managed to solve the problem?
>
> Regards,
> Edwin
>
>
> On 4 December 2015 at 23:54, Andrea Gazzarini 
> wrote:
>
> > Hi Zheng,
> > just curiosity, because shortly I will have to deal with a similar
> > scenario (Solr 5.3.1 + large documents + highlighting).
> > Which highlighter are you using?
> >
> > Andrea
> >
> > 2015-12-04 16:51 GMT+01:00 Zheng Lin Edwin Yeo :
> >
> > > Hi,
> > >
> > > I'm using Solr 5.3.0
> > >
> > > I found that in large documents, sometimes when I do a highlight
> > > query, the result set that is returned does not contain the
> > > highlighted terms. There are actually matches in the documents, but
> > > they are located further back in the documents.
> > >
> > > I have tried increasing the value of hl.maxAnalyzedChars, as the
> > > default value is 51200 and I have documents that are much larger than
> > > 51200 characters. Although this method works, when I increase this
> > > value the performance of search and highlighting drops. It can drop
> > > from less than 0.5 seconds to more than 10 seconds.
> > >
> > > Would like to check: is this method of increasing hl.maxAnalyzedChars
> > > the best one to use, or are there other ways that serve the same
> > > purpose, but without affecting the performance much?
> > >
> > > Regards,
> > > Edwin
> > >
> >
>


Re: schema fields and Typefield in solr-5.3.1

2015-12-04 Thread kostali hassan
Thank you Erick, I followed your advice and took a look at configuring
Apache Tika. I have modified my request handler /update/extract:

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">ignored_</str>

    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>

    <str name="tika.config">D:\solr\solr-5.3.1\server\solr\tika-data-config.xml</str>
  </lst>
</requestHandler>

and the Tika config:

<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            dataSource="null" rootEntity="false"
            baseDir="D:\Lucene\document"
            fileName=".*.(doc)|(pdf)|(docx)"
            onError="skip"
            recursive="true">
      <field column="fileAbsolutePath" name="id"/>
      <field column="file" name="fileName"/>
      <field column="fileLastModified" name="last_modified"/>

      <entity name="documentImport"
              processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}"
              format="text">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>

and schema.xml:





but the problem is the same: the title of the indexed files is wrong for MS Word files.


Re: schema fields and Typefield in solr-5.3.1

2015-12-04 Thread Erik Hatcher
Kostali -

See if the "Introspect rich document parsing and extraction” section of 
http://lucidworks.com/blog/2015/08/04/solr-5-new-binpost-utility/ helps*.  
You’ll be able to see the output of /update/extract (aka Tika) and adjust your 
mappings and configurations accordingly.

* And apologies that bin/post isn’t Windows savvy at this point, but you’ve got 
the hang of the Windows-compatible command-line it looks like.

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com



> On Dec 4, 2015, at 11:44 AM, kostali hassan  wrote:
> 
> Thank you Erick, I followed your advice and took a look at configuring
> Apache Tika. I have modified my request handler /update/extract:
>
> <requestHandler name="/update/extract"
>                 startup="lazy"
>                 class="solr.extraction.ExtractingRequestHandler">
>   <lst name="defaults">
>     <str name="fmap.Last-Modified">last_modified</str>
>     <str name="uprefix">ignored_</str>
>
>     <str name="captureAttr">true</str>
>     <str name="fmap.a">links</str>
>     <str name="fmap.div">ignored_</str>
>
>     <str name="tika.config">D:\solr\solr-5.3.1\server\solr\tika-data-config.xml</str>
>   </lst>
> </requestHandler>
> and the Tika config:
>
> <dataConfig>
>   <dataSource type="BinFileDataSource"/>
>   <document>
>     <entity name="files" processor="FileListEntityProcessor"
>             dataSource="null" rootEntity="false"
>             baseDir="D:\Lucene\document"
>             fileName=".*.(doc)|(pdf)|(docx)"
>             onError="skip"
>             recursive="true">
>       <field column="fileAbsolutePath" name="id"/>
>       <field column="file" name="fileName"/>
>       <field column="fileLastModified" name="last_modified"/>
>
>       <entity name="documentImport"
>               processor="TikaEntityProcessor"
>               url="${files.fileAbsolutePath}"
>               format="text">
>         <field column="Author" name="author" meta="true"/>
>         <field column="title" name="title" meta="true"/>
>         <field column="text" name="content"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
> 
> and schema.xml:
> 
> 
> 
> 
> 
> but the problem is the same: the title of the indexed files is wrong for MS Word files.



Re: schema fields and Typefield in solr-5.3.1

2015-12-04 Thread kostali hassan
Thank you. That's why I chose to add the exact value using the Solarium PHP
client, but the timeout stops indexing after 30 seconds:

$dir = new Folder($dossier);
$files = $dir->find('.*\.*');
foreach ($files as $file) {
    $file = new File($dir->pwd() . DS . $file);

    $query = $client->createExtract();
    $query->setFile($file->pwd());
    $query->setCommit(true);
    $query->setOmitHeader(false);

    $doc = $query->createDocument();
    $doc->id = $file->pwd();
    $doc->name = $file->name;
    $doc->title = $file->name();

    $query->setDocument($doc);

    // send the extract request for this file; note that PHP's default
    // max_execution_time of 30s is what cuts a long loop short, so raising
    // it (e.g. set_time_limit(0)) may be needed as well
    $client->extract($query);
}

2015-12-04 16:50 GMT+00:00 Erik Hatcher :

> Kostali -
>
> See if the "Introspect rich document parsing and extraction” section of
> http://lucidworks.com/blog/2015/08/04/solr-5-new-binpost-utility/
> helps*.  You’ll be able to see the output of /update/extract (aka Tika) and
> adjust your mappings and configurations accordingly.
>
> * And apologies that bin/post isn’t Windows savvy at this point, but
> you’ve got the hang of the Windows-compatible command-line it looks like.
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com
>
>
>
> > On Dec 4, 2015, at 11:44 AM, kostali hassan 
> wrote:
> >
> > Thank you Erick, I followed your advice and took a look at configuring
> > Apache Tika. I have modified my request handler /update/extract:
> >
> > <requestHandler name="/update/extract"
> >                 startup="lazy"
> >                 class="solr.extraction.ExtractingRequestHandler">
> >   <lst name="defaults">
> >     <str name="fmap.Last-Modified">last_modified</str>
> >     <str name="uprefix">ignored_</str>
> >
> >     <str name="captureAttr">true</str>
> >     <str name="fmap.a">links</str>
> >     <str name="fmap.div">ignored_</str>
> >
> >     <str name="tika.config">D:\solr\solr-5.3.1\server\solr\tika-data-config.xml</str>
> >   </lst>
> > </requestHandler>
> >
> > and the Tika config:
> >
> > <dataConfig>
> >   <dataSource type="BinFileDataSource"/>
> >   <document>
> >     <entity name="files" processor="FileListEntityProcessor"
> >             dataSource="null" rootEntity="false"
> >             baseDir="D:\Lucene\document"
> >             fileName=".*.(doc)|(pdf)|(docx)"
> >             onError="skip"
> >             recursive="true">
> >       <field column="fileAbsolutePath" name="id"/>
> >       <field column="file" name="fileName"/>
> >       <field column="fileLastModified" name="last_modified"/>
> >
> >       <entity name="documentImport"
> >               processor="TikaEntityProcessor"
> >               url="${files.fileAbsolutePath}"
> >               format="text">
> >         <field column="Author" name="author" meta="true"/>
> >         <field column="title" name="title" meta="true"/>
> >         <field column="text" name="content"/>
> >       </entity>
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > and schema.xml:
> >
> > 
> >
> >
> >
> > but the problem is the same: the title of the indexed files is wrong for MS Word files.
>
>


Re: migrate(or copy) data from one core1(node2) to another core2(node1)

2015-12-04 Thread Erick Erickson
Really, this is probably not a great path to go down. If you are saying you
want a leader and follower situation, you should just copy to the leader,
bring it up, then use the ADDREPLICA command to add the replica.
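
Something like this, with illustrative collection, shard and node values:

http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard1&node=10.0.5.213:8984_solr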

That said, I'd seriously consider just defining a new cluster in your SolrCloud
setup the way you want it, then re-indexing.

Best,
Erick

On Fri, Dec 4, 2015 at 6:01 AM, Mugeesh Husain  wrote:
> Hello Erick,
>
> I did shut down all the nodes in SolrCloud and copied the data directory
> from the non-SolrCloud instance; after that I started all the nodes, but
> the new data is not reflected.
>
>
> One doubt:
> >> Do I need to shut down the ZooKeeper node also? Or also clean the data
> from the ZK directory?
>
>
>


RE: Data Import Handler - Multivalued fields - splitBy

2015-12-04 Thread Dyer, James
Brian,

Be sure to have...

transformer="RegexTransformer"

...in your <entity> tag.  It’s the RegexTransformer class that looks for 
"splitBy".

See https://wiki.apache.org/solr/DataImportHandler#RegexTransformer for more 
information.
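
A minimal sketch of that shape (entity and column names here are
placeholders; note the pipe must be escaped, since splitBy is a regular
expression):

<entity name="item" transformer="RegexTransformer" query="select ...">
  <field column="tags" splitBy="\|"/>
</entity>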

James Dyer
Ingram Content Group


-Original Message-
From: Brian Narsi [mailto:bnars...@gmail.com] 
Sent: Friday, December 04, 2015 3:10 PM
To: solr-user@lucene.apache.org
Subject: Data Import Handler - Multivalued fields - splitBy

I have the following:





I believe I had the following working (splitting on pipe delimited)



But it does not work now.



In fact, now I have even tried



But I cannot get the values to split into an array.

Any thoughts/suggestions on what may be wrong?

Thanks,


Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-04 Thread Erick Erickson
Actually, I rather agree with your colleagues, but then I'm something
of a curmudgeon.

More accurately, unless you _strictly_ control the input documents,
you never know what you have in your index. I'd rather have docs fail
indexing than be indexed with, say, typos in the field names

FWIW,
Erick

On Fri, Dec 4, 2015 at 6:51 AM, Rick Leir  wrote:
> On Fri, Dec 4, 2015 at 12:59 AM, 
> wrote:
>
>>
>> >Just wondering if folks have any suggestions on using Schema.xml vs.
>> >Managed Schema going forward.
>> >
>
>
> We are using loosely typed languages (Perl and Javascript), and a loosely
> typed DB (CouchDB). This is consistent with running Solr in Schemaless
> mode, and doing more unit tests. When you post a doc into Solr containing a
> field which has not been seen before, Solr chooses the most appropriate
> Type. There is no Java exception and the field data is searchable. You can
> discover the Type by looking at the Solr console. We can probably log it
> too.
>
> The new field might be due to us intentionally adding it, though we should
> be methodical and systematic about adding new fields.
>
> Or it could be due to unexpected input to the ingest scripts, (but I
> believe these scripts should clean their inputs).
>
> Or it could be due to a bug in the ingest scripts. In the spirit of TDD,
> the ingest scripts should have tests so we can claim they are bug free.
>
>
> However, I brought up this topic with my colleagues here, and they are sure
> we should stick with Schema.xml. ".. some level of control and expectation
> of exactly what kind of data is in our search system wouldn't be helpful
> .." So be it.
> Cheers -- Rick


Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-04 Thread Alexandre Rafalovitch
Not that hard to setup a cron and diff job and email when the diff is
not-empty. A sort-of "is that what you expected" report.

But, for myself, I also prefer schema and then managed. I do not like
schemaless mode, even for development. Instead, I prefer to do
"dynamicField *".

P.s. I am thinking of doing a video/webinar show-casing the RAD method
based on the dynamicField *, as I see many people really do not get
the workflow around it. If that's something people are interested in,
let me know directly and/or subscribe to the newsletter at
http://www.solr-start.com/ for an announcement. I'll treat the
subscriptions over the next 24 hours as a vote :-)


Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 4 December 2015 at 15:15, Erick Erickson  wrote:
> Actually, I rather agree with your colleagues, but then I'm something
> of a curmudgeon.
>
> More accurately, unless you _strictly_ control the input documents,
> you never know what you have in your index. I'd rather have docs fail
> indexing than be indexed with, say, typos in the field names
>
> FWIW,
> Erick
>
> On Fri, Dec 4, 2015 at 6:51 AM, Rick Leir  wrote:
>> On Fri, Dec 4, 2015 at 12:59 AM, 
>> wrote:
>>
>>>
>>> >Just wondering if folks have any suggestions on using Schema.xml vs.
>>> >Managed Schema going forward.
>>> >
>>
>>
>> We are using loosely typed languages (Perl and Javascript), and a loosely
>> typed DB (CouchDB). This is consistent with running Solr in Schemaless
>> mode, and doing more unit tests. When you post a doc into Solr containing a
>> field which has not been seen before, Solr chooses the most appropriate
>> Type. There is no Java exception and the field data is searchable. You can
>> discover the Type by looking at the Solr console. We can probably log it
>> too.
>>
>> The new field might be due to us intentionally adding it, though we should
>> be methodical and systematic about adding new fields.
>>
>> Or it could be due to unexpected input to the ingest scripts, (but I
>> believe these scripts should clean their inputs).
>>
>> Or it could be due to a bug in the ingest scripts. In the spirit of TDD,
>> the ingest scripts should have tests so we can claim they are bug free.
>>
>>
>> However, I brought up this topic with my colleagues here, and they are sure
>> we should stick with Schema.xml. ".. some level of control and expectation
>> of exactly what kind of data is in our search system wouldn't be helpful
>> .." So be it.
>> Cheers -- Rick


RE: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-04 Thread Davis, Daniel (NIH/NLM) [C]
So, I actually went to an Elastic Search one day conference.   One person spoke 
about having to re-index everything because they had their field mappings 
wrong.   I've also worked on Linked Data, RDF, where the fact that everything 
is a triple is supposed to make SQL schemas unneeded.

The theme with Elastic Search was:
 - spend some time on your field mappings (which are a schema) up front.
 - if you don't, you are either going to be wasting space, or experiencing slow 
search, or both.

The theme with RDF was:
 - First model your vocabulary and make sure it answers the questions you want 
to answer.

So, we can be "schemaless", but with both Linked Data and ES, it is a way to 
get started quickly - there are still advantages to using a schema.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, December 04, 2015 3:16 PM
To: solr-user 
Subject: Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

Actually, I rather agree with your colleagues, but then I'm something of a 
curmudgeon.

More accurately, unless you _strictly_ control the input documents, you never 
know what you have in your index. I'd rather have docs fail indexing than be 
indexed with, say, typos in the field names

FWIW,
Erick

On Fri, Dec 4, 2015 at 6:51 AM, Rick Leir  wrote:
> On Fri, Dec 4, 2015 at 12:59 AM, 
> 
> wrote:
>
>>
>> >Just wondering if folks have any suggestions on using Schema.xml vs.
>> >Managed Schema going forward.
>> >
>
>
> We are using loosely typed languages (Perl and Javascript), and a 
> loosely typed DB (CouchDB). This is consistent with running Solr in 
> Schemaless mode, and doing more unit tests. When you post a doc into 
> Solr containing a field which has not been seen before, Solr chooses 
> the most appropriate Type. There is no Java exception and the field 
> data is searchable. You can discover the Type by looking at the Solr 
> console. We can probably log it too.
>
> The new field might be due to us intentionally adding it, though we 
> should be methodical and systematic about adding new fields.
>
> Or it could be due to unexpected input to the ingest scripts, (but I 
> believe these scripts should clean their inputs).
>
> Or it could be due to a bug in the ingest scripts. In the spirit of 
> TDD, the ingest scripts should have tests so we can claim they are bug free.
>
>
> However, I brought up this topic with my colleagues here, and they are 
> sure we should stick with Schema.xml. ".. some level of control and 
> expectation of exactly what kind of data is in our search system 
> wouldn't be helpful .." So be it.
> Cheers -- Rick


Data Import Handler - Multivalued fields - splitBy

2015-12-04 Thread Brian Narsi
I have the following:





I believe I had the following working (splitting on pipe delimited)



But it does not work now.



In fact, now I have even tried



But I cannot get the values to split into an array.

Any thoughts/suggestions on what may be wrong?

Thanks,


Indexing Wikipedia

2015-12-04 Thread Kate Kas
Hi,

I tried to index .xml files from Wikipedia articles
(https://dumps.wikimedia.org/enwiki/20150702/) using the method which is
proposed by the Solr tutorial
(https://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia).

I think that some fields are not indexed, because when I use q equal to *:*
and fl equal to *
(http://localhost:8983/solr/wikipedia/select?q=*%3A*&fl=*&wt=json&indent=true),
I receive results only for "id" and "_version_".

Any idea which could be the problem?

Thank you in advance.

Best,
Kate


Re: Using properties placeholder ${someProperty} for xml node attribute in solrconfig

2015-12-04 Thread Pushkar Raste
Thanks Erick, I verified that we can use property placeholders for
attributes on an XML node. One last question: I was reading through
CommitTracker, and it looks like setting maxTime for 'autoCommit' or
'autoSoftCommit' will disable commits. Is my understanding right?
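
For reference, the stock 5.x solrconfig.xml uses this pattern for the
commit trackers (property name as in that example); a maxTime resolving to
a value <= 0, with no maxDocs set, leaves the tracker disabled:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>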

On 3 December 2015 at 15:40, Erick Erickson  wrote:

> Hmmm, never tried it. You can check by looking at the admin
> UI>>plugins/stats>>caches>>filterCache with a property defined like
> you want.
>
> And assuming that works, yes. the filterCache is turned off if its size is
> zero.
>
> Another option might be to add {!cache=false} to your fq clauses on
> the client in this case if that is possible/convenient.
>
> Best,
> Erick
>
> On Thu, Dec 3, 2015 at 11:19 AM, Pushkar Raste 
> wrote:
> > Hi,
> > I want to make turning filter cache on/off configurable (I really have a
> > use case to turn off filter cache), can I use properties placeholders
> like
> > ${someProperty} in the filter cache config. i.e.
> >
> > <filterCache size="${solr.filterCacheSize:4096}"
> >              initialSize="${solr.filterCacheInitialSize:2048}"
> >              autowarmCount="0"/>
> >
> > In short, can I use properties placeholders for attributes for xml node
> in
> > solrconfig. Follow up question is, provided I can do that, to turn off
> > filterCache can I simply set values 0 (zero) for 'solr.filterCacheSize'
> and
> > 'solr.filterCacheInitialSize'
>


Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-04 Thread Upayavira
This is exactly right. Schemaless can be a great discovery tool, but not
something it is useful to use in production, I'd say.

On Fri, Dec 4, 2015, at 08:21 PM, Davis, Daniel (NIH/NLM) [C] wrote:
> So, I actually went to an Elastic Search one day conference.   One person
> spoke about having to re-index everything because they had their field
> mappings wrong.   I've also worked on Linked Data, RDF, where the fact
> that everything is a triple is supposed to make SQL schemas unneeded.
> 
> The theme with Elastic Search was:
>  - spend some time on your field mappings (which are a schema) up front.
>  - if you don't, you are either going to be wasting space, or
>  experiencing slow search, or both.
> 
> The theme with RDF was:
>  - First model your vocabulary and make sure it answers the questions you
>  want to answer.
> 
> So, we can be "schemaless", but with both Linked Data and ES, it is a way
> to get started quickly - there are still advantages to using a schema.
> 
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com] 
> Sent: Friday, December 04, 2015 3:16 PM
> To: solr-user 
> Subject: Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?
> 
> Actually, I rather agree with your colleagues, but then I'm something of
> a curmudgeon.
> 
> More accurately, unless you _strictly_ control the input documents, you
> never know what you have in your index. I'd rather have docs fail
> indexing than be indexed with, say, typos in the field names
> 
> FWIW,
> Erick
> 
> On Fri, Dec 4, 2015 at 6:51 AM, Rick Leir 
> wrote:
> > On Fri, Dec 4, 2015 at 12:59 AM, 
> > 
> > wrote:
> >
> >>
> >> >Just wondering if folks have any suggestions on using Schema.xml vs.
> >> >Managed Schema going forward.
> >> >
> >
> >
> > We are using loosely typed languages (Perl and Javascript), and a 
> > loosely typed DB (CouchDB). This is consistent with running Solr in 
> > Schemaless mode, and doing more unit tests. When you post a doc into 
> > Solr containing a field which has not been seen before, Solr chooses 
> > the most appropriate Type. There is no Java exception and the field 
> > data is searchable. You can discover the Type by looking at the Solr 
> > console. We can probably log it too.
> >
> > The new field might be due to us intentionally adding it, though we 
> > should be methodical and systematic about adding new fields.
> >
> > Or it could be due to unexpected input to the ingest scripts, (but I 
> > believe these scripts should clean their inputs).
> >
> > Or it could be due to a bug in the ingest scripts. In the spirit of 
> > TDD, the ingest scripts should have tests so we can claim they are bug free.
> >
> >
> > However, I brought up this topic with my colleagues here, and they are 
> > sure we should stick with Schema.xml. ".. some level of control and 
> > expectation of exactly what kind of data is in our search system 
> > wouldn't be helpful .." So be it.
> > Cheers -- Rick


Authorization API versus zkcli.sh

2015-12-04 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Looking through 
cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins
 one notices that security.json is initially created by zkcli.sh, and then 
modified by means of the Authentication API and the Authorization API. By and 
large, this sounds like a good way to accomplish such tasks, assuming that 
these APIs do some error checking to prevent corruption of security.json

I was wondering about cases where one is cloning an existing Solr instance, 
such as when creating an instance in Amazon Cloud. If one has a security.json 
that has been thoroughly tried and successfully tested on another Solr 
instance, is it possible / safe / not-un-recommended to use zkcli.sh to load 
the full security.json (as extracted via zkcli.sh from the Zookeeper of the 
thoroughly tested existing instance)? Or would the official verdict be that the 
only acceptable way to create security.json is to load a minimal version with 
zkcli.sh and then to build the remaining components with the Authentication API 
and the Authorization API (in a script, if one wants to automate the process: 
although such a script would have to include plain-text passwords)?
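
The mechanical part of such cloning would just be something like the
following (zkhosts and paths are illustrative):

server/scripts/cloud-scripts/zkcli.sh -zkhost tested-zk:2181 -cmd getfile /security.json security.json
server/scripts/cloud-scripts/zkcli.sh -zkhost new-zk:2181 -cmd putfile /security.json security.json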

I figured there is no harm in asking.


Re: Stop adding content in Solr through /update URL

2015-12-04 Thread Chris Hostetter

: You could add 'enable' flag in the solrconfig.xml and then
: enable/disable it differently on different servers:

Off the top of my head, i'm not certain if enable="false" on a "/update" 
handler will actually do what the user wants -- it might prevent a handler 
from existing at that path; but i *think* the logic that creates implicit 
UpdateHandlers may still cause an implicit /update handler to be created
in that case? not certain.

The NotFoundRequestHandler, on the other hand, was created explicitly for 
the purpose of being able to register it to paths where you just want to 
reject requests and override any implicit handler that might otherwise 
exist...
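
Registering it is a one-liner in solrconfig.xml; a sketch, using the
/update path from the question:

<requestHandler name="/update" class="solr.NotFoundRequestHandler"/>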






-Hoss
http://www.lucidworks.com/


Max indexing threads & RamBuffered size

2015-12-04 Thread KNitin
Hi,

The max indexing threads setting in solrconfig.xml is set to 8 by default.
Does this mean only 8 concurrent indexing threads will be allowed per
collection, or per core?

RAM buffer size: this seems to be set at 64 MB. If we have a beefier
machine that can take more load, can we set this to a higher limit, say 1
or 2 GB? What will be the downside of doing so (apart from commits taking
longer)?
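
For reference, both knobs live in the <indexConfig> section of
solrconfig.xml; a sketch with the default values mentioned above:

<indexConfig>
  <maxIndexingThreads>8</maxIndexingThreads>
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexConfig>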

Thanks in advance!
Nitin


Re: Stop adding content in Solr through /update URL

2015-12-04 Thread Alexandre Rafalovitch
On 4 December 2015 at 19:23, Chris Hostetter  wrote:
> NotFoundRequestHandler

Totally not in either Wiki or Reference Guide. :-(

Must be part of the secret committer's lore. Thank you for sharing it
with us, pure plebs :-)


Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


Re: Data Import Handler - Multivalued fields - splitBy

2015-12-04 Thread Brian Narsi
That was it! Thank you!

On Fri, Dec 4, 2015 at 3:13 PM, Dyer, James 
wrote:

> Brian,
>
> Be sure to have...
>
> transformer="RegexTransformer"
>
> ...in your <entity> tag.  It’s the RegexTransformer class that looks
> for "splitBy".
>
> See https://wiki.apache.org/solr/DataImportHandler#RegexTransformer for
> more information.
>
> James Dyer
> Ingram Content Group
>
>
> -Original Message-
> From: Brian Narsi [mailto:bnars...@gmail.com]
> Sent: Friday, December 04, 2015 3:10 PM
> To: solr-user@lucene.apache.org
> Subject: Data Import Handler - Multivalued fields - splitBy
>
> I have the following:
>
>  required="true" multiValued="true" />
>
>
>
> I believe I had the following working (splitting on pipe delimited)
>
> 
>
> But it does not work now.
>
>
>
> In fact, now I have even tried
>
> 
>
> But I cannot get the values to split into an array.
>
> Any thoughts/suggestions on what may be wrong?
>
> Thanks,
>


Re: Stop adding content in Solr through /update URL

2015-12-04 Thread Jack Krupansky
Never made it into CHANGES.txt either. Not part of any patch either.
Appears to have been secretly committed as a part of SOLR-6787 (Blob API) via
revision 1650448 in Solr 5.1.


-- Jack Krupansky

On Fri, Dec 4, 2015 at 10:54 PM, Alexandre Rafalovitch 
wrote:

> On 4 December 2015 at 19:23, Chris Hostetter 
> wrote:
> > NotFoundRequestHandler
>
> Totally not in either Wiki or Reference Guide. :-(
>
> Must be part of the secret committer's lore. Thank you for sharing it
> with us, pure plebs :-)
>
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>


Re: Indexing Wikipedia

2015-12-04 Thread Paul Libbrecht
Simply... some fields are not stored, so they are only searched through
(being indexed) but not given back?
(title and text in the tutorial you refer to). Are these the missing fields?
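
That is, only fields marked stored="true" can be returned in results; the
implied schema change is along these lines (the type is illustrative):

<field name="title" type="text_general" indexed="true" stored="true"/>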

Paul
> Kate Kas
> 5 December 2015 00:23
> Hi,
>
> I tried to index .xml files from Wikipedia articles
> (https://dumps.wikimedia.org/enwiki/20150702/) using the method which is
> proposed by the Solr tutorial
> (https://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia).
>
> I think that some fields are not indexed, because when I use q equal
> to *:*
> and fl equal to *
> (http://localhost:8983/solr/wikipedia/select?q=*%3A*&fl=*&wt=json&indent=true),
> I receive results only for "id" and "_version_".
>
> Any idea which could be the problem?
>
> Thank you in advance.
>
> Best,
> Kate
>