Re: SolrJ appears to have problems with Docker Toolbox

2017-04-11 Thread Mike Thomsen
Thanks. I think I'll take a look at that. I decided to just build a big
vagrant-managed desktop VM to let me run Ubuntu on my company machine, so I
expect that this pain point may be largely gone soon.

On Mon, Apr 10, 2017 at 12:31 PM, Vincenzo D'Amore 
wrote:

> Hi Mike
>
> disclaimer I'm the author of https://github.com/freedev/
> solrcloud-zookeeper-docker
>
> I had the same problem when I tried to create a SolrCloud cluster with
> Docker: the Docker instances were referred to by IP addresses that I
> could not reach with SolrJ.
>
> I avoided this problem by referring to each Docker instance via a hostname
> instead of an IP address.
>
> Docker Compose is a great help here: it gives you a network where your
> Docker instances can be resolved by name.
>
> I'd suggest taking a look at my project, in particular at the
> docker-compose.yml used to start a SolrCloud cluster (3 Solr nodes with a
> zookeeper ensemble of 3):
>
> https://raw.githubusercontent.com/freedev/solrcloud-
> zookeeper-docker/master/
> solrcloud-3-nodes-zookeeper-ensemble/docker-compose.yml
>
> OK, I know, it sounds like overkill to create a SolrCloud cluster inside a
> single VM; I did it just to understand how Solr works... :)
>
> Once you've built your SolrCloud Docker network, you can map the names of
> your Docker instances externally, for example on your private network or in
> your hosts file.
>
> In other words, given a Docker Solr instance named solr-1, inside the
> Docker network that instance has a Docker IP address that cannot be used
> outside the VM.
>
> So when you use the SolrJ client on your computer, you must have an entry
> for solr-1 in /etc/hosts that points to the IP address of your VM (the
> public network interface where the Docker instance is mapped).
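>
> For example, on the host OS an /etc/hosts line like this (the address and
> names are just an example) makes the instances reachable by the same names
> SolrJ sees from ZooKeeper:
>
> 192.168.99.100   solr-1 solr-2 solr-3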
>
> Hope you understand... :)
>
> Cheers,
> Vincenzo
>
>
> On Sun, Apr 9, 2017 at 2:42 AM, Mike Thomsen 
> wrote:
>
> > I'm running two nodes of SolrCloud in Docker on Windows using Docker
> > Toolbox.  The problem I am having is that Docker Toolbox runs inside of a
> > VM and so it has an internal network inside the VM that is not accessible
> > to the Docker Toolbox VM's host OS. If I go to the VM's IP which is
> > 192.168.99.100, I can load the admin UI and do basic operations that are
> > written to go against that IP and port (like querying, schema editor,
> > manually adding documents, etc.)
> >
> > However, when I try to run code that uses SolrJ to add documents, it fails
> > because the ZK configuration has the IPs for the internal Docker network
> > which is 172.X.Y.Z. If I log into the toolbox VM and run the Java code
> > from there, it works just fine. From the host OS, it doesn't.
> >
> > Anyone have any ideas on how to get around this? If I rewrite the indexing
> > code to do a manual JSON POST to the update handler on one of the nodes, it
> > does work just fine, but that leaves me not using SolrJ.
> >
> > Thanks,
> >
> > Mike
> >
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


Re: SolrJ appears to have problems with Docker Toolbox

2017-04-08 Thread Mike Thomsen
Hi Rick,

No, I just used the "official" one on Docker Hub (
https://hub.docker.com/_/solr/) and followed the instructions for linking
and working with ZooKeeper to get SolrCloud up and running.

I may have to go to the Docker forum in the end, but I thought I'd ask here
first since the only thing that seems to be broken is the Java client API,
not the servers, in this environment/configuration.

Thanks,

Mike

On Sat, Apr 8, 2017 at 9:41 PM, Rick Leir  wrote:

> Hi Mike
> Did you dockerize Solr yourself? I have some knowledge of Docker, and
> think that this question would get better help in a Docker forum.
> Cheers -- Rick
>
> On April 8, 2017 8:42:13 PM EDT, Mike Thomsen 
> wrote:
> >I'm running two nodes of SolrCloud in Docker on Windows using Docker
> >Toolbox.  The problem I am having is that Docker Toolbox runs inside of
> >a
> >VM and so it has an internal network inside the VM that is not
> >accessible
> >to the Docker Toolbox VM's host OS. If I go to the VM's IP which is
> >192.168.99.100, I can load the admin UI and do basic operations that
> >are
> >written to go against that IP and port (like querying, schema editor,
> >manually adding documents, etc.)
> >
> >However, when I try to run code that uses SolrJ to add documents, it
> >fails
> >because the ZK configuration has the IPs for the internal Docker
> >network
> >which is 172.X.Y.Z. If I log into the toolbox VM and run the Java code
> >from there, it works just fine. From the host OS, it doesn't.
> >
> >Anyone have any ideas on how to get around this? If I rewrite the
> >indexing
> >code to do a manual JSON POST to the update handler on one of the
> >nodes, it
> >does work just fine, but that leaves me not using SolrJ.
> >
> >Thanks,
> >
> >Mike
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


SolrJ appears to have problems with Docker Toolbox

2017-04-08 Thread Mike Thomsen
I'm running two nodes of SolrCloud in Docker on Windows using Docker
Toolbox.  The problem I am having is that Docker Toolbox runs inside of a
VM and so it has an internal network inside the VM that is not accessible
to the Docker Toolbox VM's host OS. If I go to the VM's IP which is
192.168.99.100, I can load the admin UI and do basic operations that are
written to go against that IP and port (like querying, schema editor,
manually adding documents, etc.)

However, when I try to run code that uses SolrJ to add documents, it fails
because the ZK configuration has the IPs for the internal Docker network
which is 172.X.Y.Z. If I log into the toolbox VM and run the Java code
from there, it works just fine. From the host OS, it doesn't.

Anyone have any ideas on how to get around this? If I rewrite the indexing
code to do a manual JSON POST to the update handler on one of the nodes, it
does work just fine, but that leaves me not using SolrJ.

Thanks,

Mike


Re: Data Import

2017-03-17 Thread Mike Thomsen
If Solr is down, then adding through SolrJ would fail as well. Kafka's new
API has some great features for this sort of thing. The new client API is
designed to be run in a long-running loop where you poll for new messages
with a defined timeout (e.g., consumer.poll(1000) for a 1s timeout). So if
Solr becomes unstable or goes down, it's easy to have the consumer
just stop and either wait until Solr comes back up or save the data to
disk/commit the Kafka offsets to ZK and stop running.
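For illustration, a minimal sketch of such a loop (the topic, group id, and
the indexIntoSolr() helper are placeholders of mine; this is untested
sketch code, not a full implementation):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SolrKafkaLoop {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "solr-indexer");
        // Commit offsets manually, only after Solr has accepted the batch.
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("docs"));
            while (true) {
                // Poll with a 1s timeout, i.e. consumer.poll(1000).
                ConsumerRecords<String, String> records = consumer.poll(1000);
                try {
                    for (ConsumerRecord<String, String> record : records) {
                        indexIntoSolr(record.value());
                    }
                    consumer.commitSync(); // offsets advance only on success
                } catch (Exception e) {
                    // Solr is unstable or down; offsets were not committed,
                    // so a restart resumes from the last good batch (an
                    // in-process retry would need consumer.seek()).
                    Thread.sleep(30000);
                }
            }
        }
    }

    // Placeholder: send the message to Solr, e.g. via SolrJ.
    private static void indexIntoSolr(String message) { /* ... */ }
}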

On Fri, Mar 17, 2017 at 1:24 PM, OTH  wrote:

> Are Kafka and SQS interchangeable?  (The latter does not seem to be free.)
>
> @Wunder:
> I'm assuming that updating Solr would fail if Solr is unavailable, not
> just when posting via, say, a DB trigger, but probably also when posting
> through SolrJ?  (Which is what I'm using for now.)  So, even if using
> SolrJ, it would be a good idea to use queuing software?
>
> Thanks
>
> On Fri, Mar 17, 2017 at 10:12 PM, vishal jain  wrote:
>
> > Streaming the data through kafka would be a good option if near real time
> > data indexing is the key requirement.
> > In our application the RDBMS data is populated by an ETL job periodically
> > so we don't need real time data indexing for now.
> >
> > Cheers,
> > Vishal
> >
> > On Fri, Mar 17, 2017 at 10:30 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> > > Or set a trigger on your RDBMS's main table to put the relevant
> > > information in a different table (call it EVENTS) and have your SolrJ
> > > consult the EVENTS table periodically. Essentially you're using the
> > > EVENTS table as a queue where the trigger is the producer and the
> > > SolrJ program is the consumer.
> > >
> > > It's a polling solution though, so not event-driven. There's no
> > > mechanism that I know of to have, say, your RDBMS push an event to DIH,
> > > for instance.
> > >
> > > Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
> > > for this kind of problem..
> > >
> > > Best,
> > > Erick
> > >
> > > On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
> > >  wrote:
> > > > One assumes by hooking into the same code that updates RDBMS, as
> > > > opposed to reverse engineering the changes from looking at the DB
> > > > content. This would be especially the case for Delete changes.
> > > >
> > > > Regards,
> > > >Alex.
> > > > 
> > > > http://www.solr-start.com/ - Resources for Solr users, new and
> > > experienced
> > > >
> > > >
> > > > On 17 March 2017 at 11:37, OTH  wrote:
> > > >>>
> > > >>> Also, SolrJ is good when you want your RDBMS updates made
> > > >>> immediately available in Solr.
> > > >>
> > > >> How can SolrJ be used to make RDBMS updates immediately available?
> > > >> Thanks
> > > >>
> > > >> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar <
> > > sujaybawas...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Hi Vishal,
> > > >>>
> > > >>> In my experience, DIH is the best option for getting RDBMS data into
> > > >>> a Solr index. DIH with caching has the best performance, and DIH
> > > >>> nested entities allow you to define simple queries.
> > > >>> Also, SolrJ is good when you want your RDBMS updates made immediately
> > > >>> available in Solr. A DIH full import can be used to index all the data
> > > >>> the first time, or to restore the index in case it is corrupted.
> > > >>>
> > > >>> Thanks,
> > > >>> Sujay
> > > >>>
> > > >>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain 
> > > wrote:
> > > >>>
> > > >>> > Hi,
> > > >>> >
> > > >>> >
> > > >>> > I am new to Solr and am trying to move data from my RDBMS to
> Solr.
> > I
> > > know
> > > >>> > the available options are:
> > > >>> > 1) Post Tool
> > > >>> > 2) DIH
> > > >>> > 3) SolrJ (as ours is a J2EE application).
> > > >>> >
> > > >>> > I want to know what is the recommended way for Data import in
> > > production
> > > >>> > environment.
> > > >>> > Will sending data via SolrJ in batches be faster than posting a
> csv
> > > using
> > > >>> > POST tool?
> > > >>> >
> > > >>> >
> > > >>> > Thanks,
> > > >>> > Vishal
> > > >>> >
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Thanks,
> > > >>> Sujay P Bawaskar
> > > >>> M:+91-77091 53669
> > > >>>
> > >
> >
>


Re: SOLR Data Locality

2017-03-17 Thread Mike Thomsen
I've only ever used the HDFS support with Cloudera's build, but my
experience turned me off of HDFS for Solr; I'd much rather use the native
file system.

On Tue, Mar 14, 2017 at 10:19 AM, Muhammad Imad Qureshi <
imadgr...@yahoo.com.invalid> wrote:

> We have a 30 node Hadoop cluster and each data node has a SOLR instance
> also running. Data is stored in HDFS. We are adding 10 nodes to the
> cluster. After adding nodes, we'll run HDFS balancer and also create SOLR
> replicas on new nodes. This will affect data locality. Does this impact how
> Solr works (I mean performance) if the data is on a remote node? Thanks, Imad
>


How to expose new Lucene field type to Solr

2017-03-02 Thread Mike Thomsen
Found this project and I'd like to know what would be involved with
exposing its RestrictedField type through Solr for indexing and querying as
a Solr field type.

https://github.com/roshanp/lucure-core

Thanks,

Mike


Re: solr warning - filling logs

2017-02-27 Thread Mike Thomsen
It's a brittle ZK configuration. A typical ZK quorum is three nodes for
most production systems. One is fine, though, for development provided the
system it's on is not overloaded.

On Mon, Feb 27, 2017 at 6:43 PM, Rick Leir  wrote:

> Hi Mike
> We are using a single ZK node, I think. What problems should we expect?
> Thanks -- Rick
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Index Segments not Merging

2017-02-27 Thread Mike Thomsen
Just barely skimmed the documentation, but it looks like the tool generates
its own shards and pushes them into the collection by manipulating the
configuration of the cluster.

https://www.cloudera.com/documentation/enterprise/5-8-x/topics/search_mapreduceindexertool.html

If that reading is correct, it would stand to reason that Solr (at least as
of Solr 4.10 which is what CDH ships) would not be doing the periodic
cleanup it normally does when building shards through its APIs.

On Thu, Feb 23, 2017 at 10:01 PM, Jordan Drake 
wrote:

> We have solr with the index stored in HDFS. We are running MapReduce jobs
> to build the index using the MapReduceIndexerTool from Cloudera with the
> go-live option to merge into our live index.
>
> We are seeing an issue where the number of segments in the index never
> reduces. It continues to grow until we manually do an optimize.
>
> We are using the following solr config for merge policy
>
>
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">10</int>
>   <int name="segmentsPerTier">10</int>
> </mergePolicy>
> <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>   <int name="maxThreadCount">1</int>
>   <int name="maxMergeCount">6</int>
> </mergeScheduler>
>
> If we add documents into solr without using MapReduce the segments merge
> properly as expected.
>
> Any ideas on why we see this behavior? Does the solr index merge prevent
> the segments from merging?
>
>
> Thanks,
> Jordan
>


Re: solr warning - filling logs

2017-02-27 Thread Mike Thomsen
When you transition to an external zookeeper, you'll need at least 3 ZK
nodes. One is insufficient outside of a development environment. That's a
general requirement for any system that uses ZK.
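
For reference, a bare-bones zoo.cfg for a three-node ensemble looks something
like this (hostnames and paths are placeholders; each server also needs a
myid file containing its number):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888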

On Sun, Feb 26, 2017 at 7:14 PM, Satya Marivada 
wrote:

> May I ask about the port scanner running? Can you please elaborate?
> Sure, will try to move out to external zookeeper
>
> On Sun, Feb 26, 2017 at 7:07 PM Dave  wrote:
>
> > You shouldn't use the embedded zookeeper with solr; it's just for
> > development, not anywhere near worthy of being put in production.
> > Otherwise, it looks like you may have a port scanner running. In any
> > case, don't use the zk that comes with solr.
> >
> > > On Feb 26, 2017, at 6:52 PM, Satya Marivada  >
> > wrote:
> > >
> > > Hi All,
> > >
> > > I have configured solr with SSL and enabled http authentication. It is all
> > > working fine on the solr admin page, and the indexing and querying process.
> > > One bothersome thing is that it is filling up the logs every second saying
> > > "No Authority"; I have configured the host name, port, and authentication
> > > parameters correctly in all config files. Not sure where it is coming from.
> > > Any suggestions, please? I'd really appreciate it. This is with solr-6.3.0
> > > cloud with embedded zookeeper. Could it be a bug in solr-6.3.0, or am I
> > > missing some configuration?
> > >
> > > 2017-02-26 23:32:43.660 WARN (qtp606548741-18) [c:plog s:shard1
> > > r:core_node2 x:plog_shard1_replica1] o.e.j.h.HttpParser parse exception:
> > > java.lang.IllegalArgumentException: No Authority for
> > > HttpChannelOverHttp@6dac689d{r=0,c=false,a=IDLE,uri=null}
> > > java.lang.IllegalArgumentException: No Authority
> > > at org.eclipse.jetty.http.HostPortHttpField.<init>(HostPortHttpField.java:43)
> > > at org.eclipse.jetty.http.HttpParser.parsedHeader(HttpParser.java:877)
> > > at org.eclipse.jetty.http.HttpParser.parseHeaders(HttpParser.java:1050)
> > > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1266)
> > > at org.eclipse.jetty.server.HttpConnection.parseRequestBuffer(HttpConnection.java:344)
> > > at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:227)
> > > at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> > > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> > > at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:186)
> > > at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> > > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> > > at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
> > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
> > > at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
> > > at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
> > > at java.lang.Thread.run(Thread.java:745)
> >
>


Re: Fwd: Solr dynamic field blowing up the index size

2017-02-21 Thread Mike Thomsen
Correct me if I'm wrong, but heavy use of doc values should actually blow
up the size of your index considerably if they are in fields that get sent
a lot of data.

On Tue, Feb 21, 2017 at 10:50 AM, Pratik Patel  wrote:

> Thanks for the reply. I can see that in solr 6, more than 50% of the index
> directory is occupied by the ".nvd" file extension, which is related to
> norms and doc values.
>
> On Tue, Feb 21, 2017 at 10:27 AM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> wrote:
>
> > Did you look in the data directories to check what index file extensions
> > contribute most to the difference? That could give a hint.
> >
> > Regards,
> > Alex
> >
> > On 21 Feb 2017 9:47 AM, "Pratik Patel"  wrote:
> >
> > > Here is the same question in stackOverflow for better format.
> > >
> > > http://stackoverflow.com/questions/42370231/solr-
> > > dynamic-field-blowing-up-the-index-size
> > >
> > > Recently, I upgraded from solr 5.0 to solr 6.4.1. I can run my app fine
> > but
> > > the problem is that index size with solr 6 is way too large. In solr 5,
> > > index size was about 15GB and in solr 6, for the same data, the index
> > size
> > > is 300GB! I am not able to understand what contributes to such huge
> > > difference in solr 6.
> > >
> > > I have been able to identify a field which is blowing up the size of
> > index.
> > > It is as follows.
> > >
> > > <dynamicField name="..." type="text_general" indexed="true"
> > > stored="true" multiValued="true"  />
> > >
> > > <dynamicField name="..." type="text_general" indexed="true"
> > > stored="false" multiValued="true"  />
> > >
> > > When this field is commented out, the index size reduces to less than
> > 10GB.
> > >
> > > This field is of type text_general. Following is the definition of this
> > > type.
> > >
> > > <fieldType name="text_general" class="solr.TextField"
> > > positionIncrementGap="100">
> > >   <analyzer type="index">
> > >     <tokenizer class="..."/>
> > >     <filter class="..."/>
> > >     <filter class="solr.PatternReplaceFilterFactory"
> > > pattern="((?m)[a-z]+)'s" replacement="$1s" />
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > > protected="protwords.txt" generateWordParts="1"
> > > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > > catenateAll="0" splitOnCaseChange="0"/>
> > >     <filter class="..."/>
> > >     <filter class="solr.StopFilterFactory"
> > > words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt"
> > > />
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <tokenizer class="..."/>
> > >     <filter class="..."/>
> > >     <filter class="solr.PatternReplaceFilterFactory"
> > > pattern="((?m)[a-z]+)'s" replacement="$1s" />
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > > protected="protwords.txt" generateWordParts="1"
> > > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > > catenateAll="0" splitOnCaseChange="0"/>
> > >     <filter class="..."/>
> > >     <filter class="solr.StopFilterFactory"
> > > words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt"
> > > />
> > >   </analyzer>
> > > </fieldType>
> > >
> > > Few things which I did to debug this issue:
> > >
> > >- I have ensured that field type definition is same as what I was
> > using
> > >in solr 5 and it is also valid in version 6. This field type
> > considers a
> > >list of "stopwords" to be ignored during indexing. I have supplied
> the
> > > same
> > >list of stopwords which we were using in solr 5. I have verified
> that
> > > path
> > >of this file is correct and it is being loaded fine in solr admin
> UI.
> > > When
> > >I analyse these fields using "Analysis" tab of the solr admin UI, I
> > can
> > > see
> > >that stopwords are being filtered out. However, when I query with
> some
> > > of
> > >these stopwords, I do get the results back which makes me think that
> > >probably stopwords are being indexed.
> > >
> > > Any idea what could increase the size of index by so much in solr 6?
> > >
> >
>


Re: Solr partial update

2017-02-09 Thread Mike Thomsen
Set the fl parameter to the fields you want, and then query for
id:(SOME_ID OR SOME_ID OR SOME_ID).
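
For example (the collection and ids are made up; the spaces in the query
need URL encoding):

http://localhost:8983/solr/mycollection/select?q=id:(id1+OR+id2+OR+id3)&fl=id,title&wt=json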

On Thu, Feb 9, 2017 at 5:37 AM, Midas A  wrote:

> Hi,
>
> I want to update a Solr doc partially if the unique id exists; otherwise
> we do not want to do anything.
>
> How can I achieve this?
>
> Regards,
> Midas
>


Re: Solr Kafka DIH

2017-01-31 Thread Mike Thomsen
Probably not, but writing your own little Java process to do it would be
trivial with Kafka 0.9.X or 0.10.X. You can also look at the Confluent
Platform as they have tons of connectors for Kafka to directly feed into
other systems.
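
For illustration, the skeleton of such a process might look like this (the
topic, ZK hosts, collection, and field names are made up; the single-string
CloudSolrClient constructor matches the SolrJ of that era, and this is a
sketch rather than tested code):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class KafkaToSolr {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "kafka-to-solr");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        CloudSolrClient solr = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("mycollection");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("mytopic"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(1000)) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", record.key());
                    doc.addField("body_txt", record.value());
                    solr.add(doc); // buffered until the next commit
                }
                solr.commit(); // or rely on autoCommit in solrconfig.xml
            }
        }
    }
}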

On Mon, Jan 30, 2017 at 3:05 AM, Mahmoud Almokadem 
wrote:

> Hello,
>
> Is there a way to get SolrCloud to pull data from a topic in Kafka
> periodically using the DataImportHandler?
>
> Thanks
> Mahmoud


Re: Is it possible to rewrite part of the solr response?

2017-01-18 Thread Mike Thomsen
I finally got a chance to deep dive into this and have a preliminary
working plugin. I'm starting to look at optimization strategies for how to
speed processing up and am wondering if you can give me some more
information about your "bailout" strategy.
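
For context, here is the rough shape of what I have so far (the class name,
the ACL check, and the bailout limit below are placeholders of mine, not
anything from Solr itself):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class AclPostFilter extends ExtendedQueryBase implements PostFilter {
    private static final int BAILOUT_LIMIT = 10000; // placeholder cap

    @Override
    public boolean getCache() { return false; }  // post filters are not cached

    @Override
    public int getCost() { return 100; }  // cost >= 100 runs it as a post filter

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            private int passed = 0;

            @Override
            public void collect(int doc) throws IOException {
                if (!allowedByAcl(doc)) {
                    return; // drop the doc, just like any other fq clause
                }
                if (++passed > BAILOUT_LIMIT) {
                    // The "bailout": an expensive ACL check on a *:* query
                    // would otherwise run unbounded, so give up and tell the
                    // user to narrow the search.
                    throw new RuntimeException(
                        "Too many results; please narrow down your search");
                }
                super.collect(doc);
            }
        };
    }

    // Placeholder for the call into the business-logic/ACL service.
    private boolean allowedByAcl(int docId) {
        return true;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof AclPostFilter; // no state in this sketch
    }

    @Override
    public int hashCode() {
        return AclPostFilter.class.hashCode();
    }
}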

Thanks,

Mike

On Wed, Dec 21, 2016 at 9:08 PM, Erick Erickson 
wrote:

> "grab the response" is a bit ambiguous here in Solr terms. Sure,
> a SearchComponent (you can write a plugin) gets the response,
> but it only sees the final list being returned to the user, i.e. if you
> have rows=15 it sees only 15 docs. Not sure that's adequate,
> in the case above you could easily not be allowed to see any of
> the top N docs. Plus, doing anything like this would give very
> skewed things like facets, grouping, etc. Say the facets were
> calculated over 534 hits but the user was only allowed to see 10 docs...
> Very confusing.
>
> The most robust solution would be a "post filter", another bit
> of custom code that you write (plugin). See:
> http://yonik.com/advanced-filter-caching-in-solr/
> A post filter sees _all_ the documents that satisfy the query,
> and makes an include/exclude decision on each one (just
> like any other fq clause). So facets, grouping and all the rest
> "just work". Do be aware that if the ACL calculations are  expensive
> you need to be prepared for the system administrator doing a
> *:* query. I usually build in a bailout and stop passing documents
> after some number and pass back a result about "please narrow
> down your search". Of course if your business logic is such that
> you can calculate them all "fast enough", you're golden.
>
> All that said, if there's any way you can build this into tokens in the
> doc and use a standard fq clause it's usually much easier. That may
> take some creative work at indexing time if it's even possible.
>
> Best,
> Erick
>
> On Wed, Dec 21, 2016 at 5:56 PM, Mike Thomsen 
> wrote:
> > We're trying out some ideas on locking down solr and would like to know
> if
> > there is a public API that allows you to grab the response before it is
> > sent and inspect it. What we're trying to do is something for which a
> > filter query is not a good option to really get where we want to be.
> > Basically, it's an integration with some business logic to make a final
> > pass at ensuring that certain business rules are followed in the event a
> > query returns documents a user is not authorized to see.
> >
> > Thanks,
> >
> > Mike
>


Re: Solr ACL Plugin Windows

2017-01-04 Thread Mike Thomsen
I didn't see a real Java project there, but directions to compile on
Linux are almost always applicable to Windows with Java. If you find a
project that says it uses Ant or Maven, all you need to do is download Ant
or Maven plus the Java Development Kit and put both of them on the Windows
path. Then it's either "ant package" (IIRC, most of the time) or "mvn
install" from within the folder that has the project.

FWIW, creating a simple ACL doesn't even require a custom plugin. This is
roughly how you would do it w/ an application that your team has written
that works with solr:

1. Add a multivalue string field called ACL or privileges
2. Write something for your app that can pull a list of
attributes/privileges from a database for the current user.
3. Append a filter query to the query that matches those attributes. Ex:

fq=privileges:(DEVELOPER AND DEVOPS)


If you are using a role-based system that bundles groups of permissions
into a role, all you need to do is decompose the role into a list of
permissions for the user and put all of the required permissions into that
multivalue field.
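
For example, step 3 with SolrJ might look roughly like this (the field name
and the privilege lookup are placeholders your app would supply):

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;

public class SecuredQueries {
    // privileges come from the per-user lookup in step 2.
    public static SolrQuery secure(String userQuery, List<String> privileges) {
        SolrQuery q = new SolrQuery(userQuery);
        // Require every privilege value in the multivalue ACL field; real
        // code should escape the values before splicing them into the query.
        q.addFilterQuery("privileges:(" + String.join(" AND ", privileges) + ")");
        return q;
    }
}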

Mike

On Wed, Jan 4, 2017 at 2:55 AM,  wrote:

> I am searching for a Solr ACL plugin. I found this
> https://lucidworks.com/blog/2015/05/15/custom-security-filtering-solr-5/
>
> but I don't know how to compile the Java into a jar - all the info I
> found was about how to compile it on Linux, which doesn't help.
>
> I am running Solr version 6.3.0 on Windows Server 2003.
>
> So I am searching for info about compiling a plugin under Windows.
>
> Thanks in advance :D
>
> 
> This message was sent using IMP, the Internet Messaging Program.
>
>


Re: HDFS support maturity

2017-01-03 Thread Mike Thomsen
Cloudera defaults their Hadoop installation to use HDFS w/ their bundle of
Solr (4.10.3) if that is any indication.

On Tue, Jan 3, 2017 at 7:40 AM, Hendrik Haddorp 
wrote:

> Hi,
>
> is the HDFS support in Solr 6.3 considered production ready?
> Any idea how many setups might be using this?
>
> thanks,
> Hendrik
>


Re: Is it possible to rewrite part of the solr response?

2016-12-21 Thread Mike Thomsen
Thanks. I'll look into that stuff. The counts issue is really not a serious
problem for us far as I know.

On Wed, Dec 21, 2016 at 9:08 PM, Erick Erickson 
wrote:

> "grab the response" is a bit ambiguous here in Solr terms. Sure,
> a SearchComponent (you can write a plugin) gets the response,
> but it only sees the final list being returned to the user, i.e. if you
> have rows=15 it sees only 15 docs. Not sure that's adequate,
> in the case above you could easily not be allowed to see any of
> the top N docs. Plus, doing anything like this would give very
> skewed things like facets, grouping, etc. Say the facets were
> calculated over 534 hits but the user was only allowed to see 10 docs...
> Very confusing.
>
> The most robust solution would be a "post filter", another bit
> of custom code that you write (plugin). See:
> http://yonik.com/advanced-filter-caching-in-solr/
> A post filter sees _all_ the documents that satisfy the query,
> and makes an include/exclude decision on each one (just
> like any other fq clause). So facets, grouping and all the rest
> "just work". Do be aware that if the ACL calculations are  expensive
> you need to be prepared for the system administrator doing a
> *:* query. I usually build in a bailout and stop passing documents
> after some number and pass back a result about "please narrow
> down your search". Of course if your business logic is such that
> you can calculate them all "fast enough", you're golden.
>
> All that said, if there's any way you can build this into tokens in the
> doc and use a standard fq clause it's usually much easier. That may
> take some creative work at indexing time if it's even possible.
>
> Best,
> Erick
>
> On Wed, Dec 21, 2016 at 5:56 PM, Mike Thomsen 
> wrote:
> > We're trying out some ideas on locking down solr and would like to know
> if
> > there is a public API that allows you to grab the response before it is
> > sent and inspect it. What we're trying to do is something for which a
> > filter query is not a good option to really get where we want to be.
> > Basically, it's an integration with some business logic to make a final
> > pass at ensuring that certain business rules are followed in the event a
> > query returns documents a user is not authorized to see.
> >
> > Thanks,
> >
> > Mike
>


Is it possible to rewrite part of the solr response?

2016-12-21 Thread Mike Thomsen
We're trying out some ideas on locking down solr and would like to know if
there is a public API that allows you to grab the response before it is
sent and inspect it. What we're trying to do is something for which a
filter query is not a good option to really get where we want to be.
Basically, it's an integration with some business logic to make a final
pass at ensuring that certain business rules are followed in the event a
query returns documents a user is not authorized to see.

Thanks,

Mike


Replica document counts out of sync

2016-11-30 Thread Mike Thomsen
In one of our environments, we have an issue where one shard has two
replicas with smaller document counts than the third one. This is on Solr
4.10.3 (Cloudera's build). We've found that shutting down the smaller
replicas, deleting their data folders and restarting one by one will do the
trick of forcing them to get the bigger and fresher index from the third
one.

We aren't doing anything different with the document router configuration
or anything like that. It's a really simple and straight forward
installation of Solr that is largely based on defaults for everything. Any
suggestions on what might be getting us into this situation? Also, is there
a SolrCloud API for forcing those two replicas to sync with the third or do
we have to continue using that manual process?

Thanks,

Mike


Detecting schema errors while adding documents

2016-11-16 Thread Mike Thomsen
We're stuck on Solr 4.10.3 (Cloudera bundle). Is there any way to detect
with SolrJ when a document added to the index violated the schema? All we
see when we look at the stacktrace for the SolrException that comes back is
that it contains messages about an IOException when talking to the solr
nodes. Solr is up and running, and the documents are only invalid because I
added a Java statement to make a field invalid for testing purposes. When I
remove that statement, the indexing happens just fine.

Any way to do this? I seem to recall that at least in newer versions of
Solr it would tell you more about the specific error.

Thanks,

Mike


Re: Rolling backups of a collection

2016-11-09 Thread Mike Thomsen
Thanks. If we write such a process, I'll see if I can get permission to
release it. It might be a moot point because I found out we're stuck on
4.10.3 for the time being. Haven't used that version in a while and forgot
it didn't even have the collection backup API.

On Wed, Nov 9, 2016 at 2:18 PM, Hrishikesh Gadre 
wrote:

> Hi Mike,
>
> I filed SOLR-9744 <https://issues.apache.org/jira/browse/SOLR-9744> to
> track this work. Please comment on this jira if you have any suggestions.
>
> Thanks
> Hrishikesh
>
>
> On Wed, Nov 9, 2016 at 11:07 AM, Hrishikesh Gadre 
> wrote:
>
> > Hi Mike,
> >
> > Currently we don't have capability to take rolling backups for the Solr
> > collections. I think it should be fairly straightforward to write a
> script
> > that implements this functionality outside of Solr. If you post that
> > script, may be we can even ship it as part of Solr itself (for the
> benefit
> > of the community).
> >
> > Thanks
> > Hrishikesh
> >
> >
> >
> > On Wed, Nov 9, 2016 at 9:17 AM, Mike Thomsen 
> > wrote:
> >
> >> I read over the docs (
> >> https://cwiki.apache.org/confluence/display/solr/Making+and+
> >> Restoring+Backups)
> >> and am not quite sure what route to take. My team is looking for a way
> to
> >> backup the entire index of a SolrCloud collection with regular rotation
> >> similar to the backup option available in a single node deployment.
> >>
> >> We have plenty of space in our HDFS cluster. Resources are not an issue
> in
> >> the least to have a rolling back up of say, the last seven days. Is
> there
> >> a
> >> good way to implement this sort of rolling backup with the APIs or will
> we
> >> have to roll some of the functionality ourselves?
> >>
> >> I'm not averse to using the API to dump a copy of each shard to HDFS.
> >> Something like this:
> >>
> >> /solr/collection/replication?command=backup&name=shard_1_1&
> numberToKeep=7
> >>
> >> Is that a viable route to achieve this or do we need to do something
> else?
> >>
> >> Thanks,
> >>
> >> Mike
> >>
> >
> >
>


Rolling backups of a collection

2016-11-09 Thread Mike Thomsen
I read over the docs (
https://cwiki.apache.org/confluence/display/solr/Making+and+Restoring+Backups)
and am not quite sure what route to take. My team is looking for a way to
backup the entire index of a SolrCloud collection with regular rotation
similar to the backup option available in a single node deployment.

We have plenty of space in our HDFS cluster. Resources are not an issue in
the least to have a rolling back up of say, the last seven days. Is there a
good way to implement this sort of rolling backup with the APIs or will we
have to roll some of the functionality ourselves?

I'm not averse to using the API to dump a copy of each shard to HDFS.
Something like this:

/solr/collection/replication?command=backup&name=shard_1_1&numberToKeep=7

Is that a viable route to achieve this or do we need to do something else?

Thanks,

Mike


Backup to HDFS while running cluster on local disk

2016-11-08 Thread Mike Thomsen
We have SolrCloud running on bare metal but want the nightly snapshots to
be written to HDFS. Can someone give me some help on configuring the
HdfsBackupRepository?



<backup>
  <repository name="hdfs"
              class="org.apache.solr.core.backup.repository.HdfsBackupRepository"
              default="false">
    <str name="location">${solr.hdfs.default.backup.path}</str>
    <str name="solr.hdfs.home">${solr.hdfs.home:}</str>
    <str name="solr.hdfs.confdir">${solr.hdfs.confdir:}</str>
  </repository>
</backup>



Not sure how to proceed on configuring this because the documentation is a
bit sparse on what some of those values mean in this context. The example
looked geared toward someone using HDFS both to store the index and do
backup/restore.
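
Presumably, once the repository is defined, the nightly snapshot would be
triggered with something like this (the collection, snapshot name, and
location are made up):

/solr/admin/collections?action=BACKUP&name=nightly&collection=mycollection&repository=hdfs&location=/backups/solr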

Thanks,

Mike


Best way to generate multivalue fields from streaming API

2016-09-16 Thread Mike Thomsen
Read this article and thought it could be interesting as a way to do
ingestion:

https://dzone.com/articles/solr-streaming-expressions-for-collection-auto-upd-1

Example from the article:

daemon(id="12345",

 runInterval="6",

 update(users,

 batchSize=10,

 jdbc(connection="jdbc:mysql://localhost/users?user=root&password=solr",
sql="SELECT id, name FROM users", sort="id asc",
driver="com.mysql.jdbc.Driver")

)

What's the best way to handle a multivalue field using this API? Is
there a way to tokenize something returned in a database field?

Thanks,

Mike


Update command not working

2016-02-26 Thread Mike Thomsen
I posted this to http://localhost:8983/solr/default-collection/update and
it treated it like I was adding a whole document, not a partial update:

{
    "id": "0be0daa1-a6ee-46d0-ba05-717a9c6ae283",
    "tags": {
        "add": [ "news article" ]
    }
}

In the logs, I found this:

2016-02-26 14:07:50.831 ERROR (qtp2096057945-17) [c:default-collection
s:shard1_1 r:core_node21 x:default-collection] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException:
[doc=0be0daa1-a6ee-46d0-ba05-717a9c6ae283] missing required field: data_type
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:198)
at
org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:83)
at
org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:273)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:207)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)

Does this make any sense? I sent updates just fine a day or two ago like
that; now it is acting like the update request is a whole new document.

Thanks,

Mike


Re: /select changes between 4 and 5

2016-02-24 Thread Mike Thomsen
Yeah, it was a problem on my end. Not just the content-type as you
suggested, but I had to wrap that whole JSON body so it looked like this:

{
    "params": { /* the original JSON body pasted here */ }
}
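
In other words, a request like this works (the collection name is just an
example):

curl -XPOST -H 'Content-Type: application/json' \
    'http://localhost:8983/solr/mycollection/select' -d '{
    "params": {
        "q": "*:*",
        "rows": 100,
        "wt": "json"
    }
}'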

On Wed, Feb 24, 2016 at 11:05 AM, Yonik Seeley  wrote:

> POST in general still works for queries... I just verified it:
>
> curl -XPOST "http://localhost:8983/solr/techproducts/select" -d "q=*:*"
>
> Maybe it's your content-type (since it seems like you are posting
> Python)... Were you using some sort of custom code that could
> read/accept other content types?
>
> -Yonik
>
>
> On Wed, Feb 24, 2016 at 8:48 AM, Mike Thomsen 
> wrote:
> > With 4.10, we used to post JSON like this example (part of it is Python)
> to
> > /select:
> >
> > {
> > "q": "LONG_QUERY_HERE",
> > "fq": fq,
> > "fl": ["id", "title", "date_of_information", "link", "search_text"],
> > "rows": 100,
> > "wt": "json",
> > "indent": "true",
> > "_": int(time.time())
> > }
> >
> > We just upgraded to 5.4.1, and now we can't seem to POST anything to
> > /select. I tried it out in the admin tool, and it only does GET
> operations
> > against /select (tried changing it to POST and moving query string to the
> > body with Firefox dev tools, but that failed).
> >
> > Is there a way to keep doing something like what we were doing or do we
> > need to limit ourselves to GETs? I think our queries are all small enough
> > now for that, but it would helpful to know for planning.
> >
> > Thanks,
> >
> > Mike
>


/select changes between 4 and 5

2016-02-24 Thread Mike Thomsen
With 4.10, we used to post JSON like this example (part of it is Python) to
/select:

{
"q": "LONG_QUERY_HERE",
"fq": fq,
"fl": ["id", "title", "date_of_information", "link", "search_text"],
"rows": 100,
"wt": "json",
"indent": "true",
"_": int(time.time())
}

We just upgraded to 5.4.1, and now we can't seem to POST anything to
/select. I tried it out in the admin tool, and it only does GET operations
against /select (tried changing it to POST and moving query string to the
body with Firefox dev tools, but that failed).

Is there a way to keep doing something like what we were doing or do we
need to limit ourselves to GETs? I think our queries are all small enough
now for that, but it would helpful to know for planning.

Thanks,

Mike


Leader election issues after upgrade from 4.10.4 to 5.4.1

2016-02-08 Thread Mike Thomsen
We get this error on one of our nodes:

Caused by: org.apache.solr.common.SolrException: There is conflicting
information about the leader of shard: shard2 our state says:
http://server01:8983/solr/collection/ but zookeeper says:
http://server02:8983/collection/


Then I noticed this in the log:

] o.a.s.c.c.ZkStateReader Load collection config
from:/collections/collection
2016-02-09 00:09:56.763 INFO  (qtp1037197792-12) [   ]
o.a.s.c.c.ZkStateReader path=/collections/collection configName=collection
specified config exists in ZooKeeper

We have a clusterstate.json file left over from 4.X. I read this thread and
the first comment or two suggested that clusterstate.json is now broken up
and refactored into the collections' configuration:

http://grokbase.com/t/lucene/solr-user/152v8bab2z/solr-cloud-does-not-start-with-many-collections

So should we get rid of the clusterstate.json file or keep it? We have 4
Solr VMs in our devops environment. They have 2 CPUs and 4GB of RAM. There
are about 7 collections shared between them, but all are negligible (like a
few hundred KB each) except for one, which is about 22GB.

Thanks,

Mike


zkCli.sh not in solr 5.4?

2016-01-19 Thread Mike Thomsen
I downloaded a build of 5.4.0 to install in some VMs and noticed that
zkCli.sh is not there. I need it in order to upload a configuration set to
ZooKeeper before I create the collection. What's the preferred way of doing
that?

Specifically, I need to specify a configuration like this because it's in a
Vagrant-managed set of VMs and I need to tell it to use the private network
IP addresses not my host's IP address:

/admin/collections?action=CREATE&name=default-collection2&numShards=4&replicationFactor=1&maxShardsPerNode=1&createNodeSet=192.168.56.20:8983_solr,192.168.56.21:8983_solr,192.168.56.22:8983_solr,192.168.56.23:8983_solr&collection.configName=default-collection

Thanks,

Mike


Phrase query not matching exact tokens in some cases

2015-07-14 Thread Mike Thomsen
For the query "police office" our users are getting back highlighted
results for "police office*r*" (and "police office*rs*"). I get why a search
for police officers would include just "office" since the stemmer would
cause that behavior. However I don't understand why "office" is matching
"officer" here when no fuzzy matching is being done. Is that also a result
of our stemmer?

Here's the text field we're using:

[the field type XML was stripped by the mailing-list archive]
Thanks,

Mike


Re: Exact phrase search on very large text

2015-06-26 Thread Mike Thomsen
I tried creating a simplified new text field type that only did lower
casing and exact phrasing worked this time. I'm not sure what the problem
was. Perhaps it was a case of copypasta gone bad because I could have sworn
that I tried exact phrase matching against a simple text field with bad
results. Thanks for the help. In case anyone sees this and wonders what the
field I created looks like here it is (with phonetic matching)















On Fri, Jun 26, 2015 at 7:24 AM, Jack Krupansky 
wrote:

> Lucene, the underlying search engine library, imposes this 32K limit for
> individual terms. Use tokenized text instead.
>
> -- Jack Krupansky
>
> On Thu, Jun 25, 2015 at 8:36 PM, Mike Thomsen 
> wrote:
>
> > I need to be able to do exact phrase searching on some documents that
> are a
> > few hundred kb when treated as a single block of text. I'm on 4.10.4 and
> it
> > complains when I try to put something larger than 32kb in using a
> textfield
> > with the keyword tokenizer as the tokenizer. Is there any way I can index
> > say a 500kb block of text like this?
> >
> > Thanks,
> >
> > Mike
> >
>


Exact phrase search on very large text

2015-06-25 Thread Mike Thomsen
I need to be able to do exact phrase searching on some documents that are a
few hundred kb when treated as a single block of text. I'm on 4.10.4 and it
complains when I try to put something larger than 32kb in using a textfield
with the keyword tokenizer as the tokenizer. Is there any way I can index
say a 500kb block of text like this?

Thanks,

Mike


ManagedStopFilterFactory not accepting ignoreCase

2015-06-17 Thread Mike Thomsen
We're running Solr 4.10.4 and getting this...

Caused by: java.lang.IllegalArgumentException: Unknown parameters:
{ignoreCase=true}
at
org.apache.solr.rest.schema.analysis.BaseManagedTokenFilterFactory.<init>(BaseManagedTokenFilterFactory.java:46)
at
org.apache.solr.rest.schema.analysis.ManagedStopFilterFactory.<init>(ManagedStopFilterFactory.java:47)

This is the filter definition I used:

<filter class="solr.ManagedStopFilterFactory" managed="..." ignoreCase="true"/>
Any ideas?

Thanks,

Mike


Exact phrase search not working

2015-06-11 Thread Mike Thomsen
This is my field definition:

[the field definition XML was stripped by the mailing-list archive]
Then I query for this exact phrase (which I can see in various documents)
and get no results...

my_field: "baltimore police force"

This is the output of the debugQuery part of the result set.

"rawquerystring": "\"baltimore police force\"",
"querystring": "\"baltimore police force\"",
"parsedquery": "PhraseQuery(search_text:\"baltimore ? police ? ? force\")",
"parsedquery_toString": "search_text:\"baltimore ? police ? ? force\"",
"QParser": "LuceneQParser",

Thanks,

Mike


Re: Shard still around after calling splitshard

2015-06-04 Thread Mike Thomsen
Thanks. I thought it worked like that, but didn't want to jump to
conclusions.

On Thu, Jun 4, 2015 at 1:42 PM, Anshum Gupta  wrote:

> Hi Mike,
>
> Once the SPLITSHARD call completes, it just marks the original shard as
> Inactive i.e. it no longer accepts requests. So yes, you would have to use
> DELETESHARD (
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api7
> )
> to clean it up.
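>
> For example, to clean up the now-inactive parent shard in your case:
>
> /admin/collections?action=DELETESHARD&collection=default-collection&shard=shard1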
>
> As far as what you see on the admin UI, that information is wrong i.e. the
> UI does not respect the state of the shard while displaying them. So,
> though the parent shard might be inactive, you still would end up seeing it
> as just another active shard. There's an open issue for this one.
>
> One way to confirm the shard state is by looking at the shard state in
> clusterstate.json (or state.json, depending upon the version of Solr you're
> using).
>
>
> On Thu, Jun 4, 2015 at 10:35 AM, Mike Thomsen 
> wrote:
>
> > I thought splitshard was supposed to get rid of the original shard,
> > shard1, in this case. Am I missing something? I was expecting the only
> two
> > remaining shards to be shard1_0 and shard1_1.
> >
> > The REST call I used was
> >
> /admin/collections?collection=default-collection&shard=shard1&action=SPLITSHARD
> > if that helps.
> >
> > Attached is a screenshot of the Cloud view in the admin console after
> > running splitshard.
> >
> > Should it look like that? Do I need to delete shard1 now?
> >
> > Thanks,
> >
> > Mike
> >
>
>
>
> --
> Anshum Gupta
>


Shard still around after calling splitshard

2015-06-04 Thread Mike Thomsen
I thought splitshard was supposed to get rid of the original shard, shard1,
in this case. Am I missing something? I was expecting the only two
remaining shards to be shard1_0 and shard1_1.

The REST call I used was
/admin/collections?collection=default-collection&shard=shard1&action=SPLITSHARD
if that helps.

Attached is a screenshot of the Cloud view in the admin console after
running splitshard.

Should it look like that? Do I need to delete shard1 now?

Thanks,

Mike


Managed synonyms and Solr Java API

2015-04-29 Thread Mike Thomsen
Is there a way to manage synonyms through Solr's Java API? Google doesn't
turn up any good results, and I didn't see anything in the javadocs that
looked promising.

Thanks,

Mike


Can't find result of autophrase filter

2015-04-20 Thread Mike Thomsen
This is the content of my autophrases.txt file:

al qaeda in the arabian peninsula
seat belt

I've attached a screenshot showing the analysis view of the index. When I
query for al_qaeda_in_the_arabian_peninsula or
alqaedainthearabianpeninsula, nothing comes back even though at least the
latter appears to be a token that makes it all the way through the index
filter chain.

I'm just using this to find it:

search_text:alqaedainthearabianpeninsula

Any ideas about why this isn't returning anything?

This is the field type declaration:

[the field type XML was stripped by the mailing-list archive]


Re: Using synonyms API

2015-04-15 Thread Mike Thomsen
I also tried the 4.10.4 default example and set up the synonym list like
this:

{
  "responseHeader":{
"status":0,
"QTime":2},
  "synonymMappings":{
"initArgs":{
  "ignoreCase":true,
  "format":"solr"},
"initializedOn":"2015-04-15T20:26:02.072Z",
"managedMap":{
  "Battery":["Deadweight"],
  "GB":["GiB",
"Gigabyte"],
  "TV":["Television"],
  "happy":["glad",
"joyful"]}}}

I added a dynamicField called my_syntext with a type of
managed_english per the example.

Then I indexed an example from the ipod data set with my_syntext set
to "Full Battery for you" as the text.

Finally, I did a search on my_syntext for "Deadweight" and nothing came
back. I reloaded the core and even restarted Solr. Nothing seemed to
work.



On Wed, Apr 15, 2015 at 3:04 PM, Yonik Seeley  wrote:

> I just tried this quickly on trunk and it still works.
>
> /opt/code/lusolr_trunk$ curl
> http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":234},
>   "synonymMappings":{
> "initArgs":{
>   "ignoreCase":true,
>   "format":"solr"},
> "initializedOn":"2015-04-14T19:39:55.157Z",
> "managedMap":{
>   "GB":["GiB",
> "Gigabyte"],
>   "TV":["Television"],
>   "happy":["glad",
> "joyful"]}}}
>
>
> Verify that your URL has the correct port number (your example below
> doesn't), and that "default-collection" is actually the name of your
> default collection (and not "collection1" which is the default for the
> 4x series).
>
> -Yonik
>
>
> On Wed, Apr 15, 2015 at 11:11 AM, Mike Thomsen 
> wrote:
> > We recently upgraded from 4.5.0 to 4.10.4. I tried getting a list of our
> > synonyms like this:
> >
> >
> http://localhost/solr/default-collection/schema/analysis/synonyms/english
> >
> > I got a not found error. I found this page on new features in 4.8
> >
> > http://yonik.com/solr-4-8-features/
> >
> > Do we have to do something like this with our schema to even get the
> > synonyms API working?
> >
> > <fieldType name="managed_en" class="solr.TextField"
> > positionIncrementGap="100">
> >   <analyzer>
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.ManagedStopFilterFactory" managed="english" />
> >     <filter class="solr.ManagedSynonymFilterFactory" managed="english" />
> >   </analyzer>
> > </fieldType>
> >
> > I wanted to ask before changing our schema.
> >
> > Thanks,
> >
> > Mike
>


Re: Using synonyms API

2015-04-15 Thread Mike Thomsen
Thanks. It turned out to be caused by me not using the
ManagedSynonymFilterFactory.

I added the dummy managed_en field type:

<fieldType name="managed_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ManagedStopFilterFactory" managed="..." />
    <filter class="solr.ManagedSynonymFilterFactory" managed="..." />
  </analyzer>
</fieldType>

and defined a field that uses it in the schema block like so:

[the field definitions were stripped by the mailing-list archive]
Here is the output of the managed synonym listing:

{
  "responseHeader":{
"status":0,
"QTime":343},
  "synonymMappings":{
"initArgs":{"ignoreCase":false},
"initializedOn":"2015-04-15T19:13:15.072Z",
"managedMap":{"Crota":["Crouton"]}}}



I posted this document successfully and can find it when I search for it
with this: *dummy_text: Crota*

{
"id": "stupidtestmessage",
"label": "Crota, Son of Oryx, lives!",
"dummy_stuff": [
"Crota, Son of Oryx, and pretty important dude in the Hive was
discovered alive and well in the hellmouth today!"
]
}

When I use *dummy_text: Crouton*, nothing comes back. I am pretty confident
that I am missing something here. Any ideas?

Thanks,

Mike

On Wed, Apr 15, 2015 at 3:04 PM, Yonik Seeley  wrote:

> I just tried this quickly on trunk and it still works.
>
> /opt/code/lusolr_trunk$ curl
> http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":234},
>   "synonymMappings":{
> "initArgs":{
>   "ignoreCase":true,
>   "format":"solr"},
> "initializedOn":"2015-04-14T19:39:55.157Z",
> "managedMap":{
>   "GB":["GiB",
> "Gigabyte"],
>   "TV":["Television"],
>   "happy":["glad",
> "joyful"]}}}
>
>
> Verify that your URL has the correct port number (your example below
> doesn't), and that "default-collection" is actually the name of your
> default collection (and not "collection1" which is the default for the
> 4x series).
>
> -Yonik
>
>
> On Wed, Apr 15, 2015 at 11:11 AM, Mike Thomsen 
> wrote:
> > We recently upgraded from 4.5.0 to 4.10.4. I tried getting a list of our
> > synonyms like this:
> >
> >
> http://localhost/solr/default-collection/schema/analysis/synonyms/english
> >
> > I got a not found error. I found this page on new features in 4.8
> >
> > http://yonik.com/solr-4-8-features/
> >
> > Do we have to do something like this with our schema to even get the
> > synonyms API working?
> >
> > <fieldType name="managed_en" class="solr.TextField"
> > positionIncrementGap="100">
> >   <analyzer>
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.ManagedStopFilterFactory" managed="english" />
> >     <filter class="solr.ManagedSynonymFilterFactory" managed="english" />
> >   </analyzer>
> > </fieldType>
> >
> > I wanted to ask before changing our schema.
> >
> > Thanks,
> >
> > Mike
>


Using synonyms API

2015-04-15 Thread Mike Thomsen
We recently upgraded from 4.5.0 to 4.10.4. I tried getting a list of our
synonyms like this:

http://localhost/solr/default-collection/schema/analysis/synonyms/english

I got a not found error. I found this page on new features in 4.8

http://yonik.com/solr-4-8-features/

Do we have to do something like this with our schema to even get the
synonyms API working?

<fieldType name="managed_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ManagedStopFilterFactory" managed="english" />
    <filter class="solr.ManagedSynonymFilterFactory" managed="english" />
  </analyzer>
</fieldType>
I wanted to ask before changing our schema.

Thanks,

Mike


Re: Using the collections API to create a new collection

2015-03-15 Thread Mike Thomsen
Thanks. I think I found the final problem that I was facing in this ticket:

https://issues.apache.org/jira/browse/SOLR-5306

On Sun, Mar 15, 2015 at 11:53 AM, Erick Erickson 
wrote:

> Yes, "configs" is the same as configsets, I tend to use them
> interchangeably.
>
> You're still missing the point. Once the files are up in Zookeeper, that's
> where
> they live. They do NOT then live on the nodes hosting the replicas. So,
> assuming that when you write
> bq: Our ZK configuration data is under /dev-local-solr/configs
> you mean that's the directory you specified with the upconfig command; it's
> now totally irrelevant to your replicas. When you create your
> collection, you give it the name of one of your config sets that you've
> uploaded
> to ZK. It doesn't matter in the least _how_ they go there.
>
> Now whenever one of the replicas for that collection starts up, it contacts
> ZK and reads the config files. The replica does _not_
> copy the files locally.
>
> HTH,
> Erick
>
> On Sun, Mar 15, 2015 at 6:16 AM, Mike Thomsen 
> wrote:
> > I tried that with upconfig, and it created it under /configs. Our ZK
> > configuration data is under /dev-local-solr/configs. Not sure how to
> > specify that. Also, is "configs" the same thing as  "configsets" for the
> > version of solr that I'm using?
> >
> > Thanks,
> >
> > Mike
> >
> > On Sat, Mar 14, 2015 at 6:38 PM, Anshum Gupta 
> > wrote:
> >
> >> Hi Mike,
> >>
> >> Here's what you want to do:
> >> 1. Create or use an existing config set.
> >> 2. Upload it to ZooKeeper (
> >> https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities
> )
> >> 3. Use the config name when you create the collection. This would link
> the
> >> config set in zk with your collection.
> >>
> >> I think it would make a lot of sense for you to go through the getting
> >> started with SolrCloud section in the Solr Reference Guide  for 4.5.
> >>
> >> On Sat, Mar 14, 2015 at 12:02 PM, Mike Thomsen 
> >> wrote:
> >>
> >> > I looked in the tree view and I have only a node called "configs."
> >> Nothing
> >> > called "configsets." That's a serious problem, right? So if I'm
> reading
> >> > this correctly, I should be able to create a configset based on an
> >> existing
> >> > collection and load it into zookeeper once I find the right location
> to
> >> put
> >> > it on our system. Sound right?
> >> >
> >> > https://cwiki.apache.org/confluence/display/solr/Config+Sets
> >> >
> >> > Thanks,
> >> >
> >> > Mike
> >> >
> >> > On Sat, Mar 14, 2015 at 2:27 PM, Erick Erickson <
> erickerick...@gmail.com
> >> >
> >> > wrote:
> >> >
> >> > > I bet you did not push the configuration to Zookeeper before
> creating
> >> > > the collection.
> >> > > If you look in your admin UI, the Cloud link and the "tree" version,
> >> > > you'll find
> >> > > a "configsets" directory that'll show you what you _have_ put in ZK,
> >> and
> >> > > I'll
> >> > > bet you find nothing like  a config set (containing schema.xml etc)
> >> named
> >> > > what
> >> > > you specified for someExistingCollection. It's not the _collection_
> >> > > that should be
> >> > > existing, it should be the configset.
> >> > >
> >> > > It's often a bit confusing because if the configName is not
> specified,
> >> > > the default
> >> > > is to look for a config set of the same name as the collection being
> >> > > created.
> >> > >
> >> > > Best,
> >> > > Erick
> >> > >
> >> > > On Sat, Mar 14, 2015 at 10:26 AM, Mike Thomsen <
> mikerthom...@gmail.com
> >> >
> >> > > wrote:
> >> > > > We're running SolrCloud 4.5.0. It's just a standard version of
> >> > SolrCloud
> >> > > > deployed in Tomcat, not something like the Cloudera distribution
> (I
> >> > note
> >> > > > that because I can't seem to find solrctl and other things
> referenced
>

Re: Using the collections API to create a new collection

2015-03-15 Thread Mike Thomsen
I tried that with upconfig, and it created it under /configs. Our ZK
configuration data is under /dev-local-solr/configs. Not sure how to
specify that. Also, is "configs" the same thing as  "configsets" for the
version of solr that I'm using?

Thanks,

Mike

On Sat, Mar 14, 2015 at 6:38 PM, Anshum Gupta 
wrote:

> Hi Mike,
>
> Here's what you want to do:
> 1. Create or use an existing config set.
> 2. Upload it to ZooKeeper (
> https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities)
> 3. Use the config name when you create the collection. This would link the
> config set in zk with your collection.
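>
> For example, with the zkcli script that ships with Solr (the zkhost, paths,
> and config name here are placeholders):
>
> zkcli.sh -zkhost localhost:2181 -cmd upconfig \
>     -confdir /path/to/config/conf -confname myconfig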
>
> I think it would make a lot of sense for you to go through the getting
> started with SolrCloud section in the Solr Reference Guide  for 4.5.
>
> On Sat, Mar 14, 2015 at 12:02 PM, Mike Thomsen 
> wrote:
>
> > I looked in the tree view and I have only a node called "configs."
> Nothing
> > called "configsets." That's a serious problem, right? So if I'm reading
> > this correctly, I should be able to create a configset based on an
> existing
> > collection and load it into zookeeper once I find the right location to
> put
> > it on our system. Sound right?
> >
> > https://cwiki.apache.org/confluence/display/solr/Config+Sets
> >
> > Thanks,
> >
> > Mike
> >
> > On Sat, Mar 14, 2015 at 2:27 PM, Erick Erickson  >
> > wrote:
> >
> > > I bet you did not push the configuration to Zookeeper before creating
> > > the collection.
> > > If you look in your admin UI, the Cloud link and the "tree" version,
> > > you'll find
> > > a "configsets" directory that'll show you what you _have_ put in ZK,
> and
> > > I'll
> > > bet you find nothing like  a config set (containing schema.xml etc)
> named
> > > what
> > > you specified for someExistingCollection. It's not the _collection_
> > > that should be
> > > existing, it should be the configset.
> > >
> > > It's often a bit confusing because if the configName is not specified,
> > > the default
> > > is to look for a config set of the same name as the collection being
> > > created.
> > >
> > > Best,
> > > Erick
> > >
> > > On Sat, Mar 14, 2015 at 10:26 AM, Mike Thomsen  >
> > > wrote:
> > > > We're running SolrCloud 4.5.0. It's just a standard version of
> > SolrCloud
> > > > deployed in Tomcat, not something like the Cloudera distribution (I
> > note
> > > > that because I can't seem to find solrctl and other things referenced
> > in
> > > > the Cloudera tutorials).
> > > >
> > > > I'm trying to create a new Solr collection like this:
> > > >
> > > >
> > >
> >
> /admin/collections?action=CREATE&name=newCollection&numShards=1&collection.configName=someExistingCollection
> > > >
> > > > Then I found this error message in the logs:
> > > >
> > > > org.apache.solr.common.cloud.ZooKeeperException: Specified config
> does
> > > not
> > > > exist in ZooKeeper:newCollection2
> > > > at
> > > >
> > org.apache.solr.cloud.ZkController.readConfigName(ZkController.java:742)
> > > > at
> > > > org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:244)
> > > > at
> > > org.apache.solr.core.CoreContainer.create(CoreContainer.java:557)
> > > > at
> > > >
> > >
> >
> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:465)
> > > > at
> > > >
> > >
> >
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:146)
> > > > at
> > > >
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > > > at
> > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:655)
> > > > at
> > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:246)
> > > > at
> > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
> > > > at
> > > >
> >

Re: Using the collections API to create a new collection

2015-03-14 Thread Mike Thomsen
I added this to my solr.xml and restarted, but it didn't do anything even
though the path is valid and /opt/configsets contains a folder called base
with a conf folder and valid schema and solrconfig...


<str name="configSetBaseDir">${configSetBaseDir:/opt/configsets}</str>


Any ideas? Is there a way to force an update into zookeeper? Or should I
just purge the zookeeper data?
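
The closest thing I've found so far is re-running the zkcli script that
ships under example/cloud-scripts, which I understand overwrites an
existing config of the same name (sketch only, paths illustrative):

  ./zkcli.sh -zkhost localhost:2181 -cmd upconfig \
      -confdir /opt/configsets/base/conf -confname base

There also appears to be a "clear" command for deleting znodes outright
(e.g. -cmd clear /configs/base), though I haven't tried it.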

On Sat, Mar 14, 2015 at 3:02 PM, Mike Thomsen 
wrote:

> I looked in the tree view and I have only a node called "configs." Nothing
> called "configsets." That's a serious problem, right? So if I'm reading
> this correctly, I should be able to create a configset based on an existing
> collection and load it into zookeeper once I find the right location to put
> it on our system. Sound right?
>
> https://cwiki.apache.org/confluence/display/solr/Config+Sets
>
> Thanks,
>
> Mike
>
> On Sat, Mar 14, 2015 at 2:27 PM, Erick Erickson 
> wrote:
>
>> I bet you did not push the configuration to Zookeeper before creating
>> the collection.
>> If you look in your admin UI, the Cloud link and the "tree" version,
>> you'll find
>> a "configsets" directory that'll show you what you _have_ put in ZK, and
>> I'll
>> bet you find nothing like  a config set (containing schema.xml etc) named
>> what
>> you specified for someExistingCollection. It's not the _collection_
>> that should be
>> existing, it should be the configset.
>>
>> It's often a bit confusing because if the configName is not specified,
>> the default
>> is to look for a config set of the same name as the collection being
>> created.
>>
>> Best,
>> Erick
>>
>> On Sat, Mar 14, 2015 at 10:26 AM, Mike Thomsen 
>> wrote:
>> > We're running SolrCloud 4.5.0. It's just a standard version of SolrCloud
>> > deployed in Tomcat, not something like the Cloudera distribution (I note
>> > that because I can't seem to find solrctl and other things referenced in
>> > the Cloudera tutorials).
>> >
>> > I'm trying to create a new Solr collection like this:
>> >
>> >
>> /admin/collections?action=CREATE&name=newCollection&numShards=1&collection.configName=someExistingCollection
>> >
>> > Then I found this error message in the logs:
>> >
>> > org.apache.solr.common.cloud.ZooKeeperException: Specified config does
>> not
>> > exist in ZooKeeper:newCollection2
>> > at
>> > org.apache.solr.cloud.ZkController.readConfigName(ZkController.java:742)
>> > at
>> > org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:244)
>> > at
>> org.apache.solr.core.CoreContainer.create(CoreContainer.java:557)
>> > at
>> >
>> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:465)
>> > at
>> >
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:146)
>> > at
>> >
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:655)
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:246)
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
>> > at
>> >
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
>> > at
>> >
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
>> > at
>> >
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
>> > at
>> >
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
>> > at
>> >
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
>> > at
>> >
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
>> > at
>> >
>> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
>> > at
>> >
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
>> > at
>> >
>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdap

Re: Using the collections API to create a new collection

2015-03-14 Thread Mike Thomsen
I looked in the tree view and I have only a node called "configs." Nothing
called "configsets." That's a serious problem, right? So if I'm reading
this correctly, I should be able to create a configset based on an existing
collection and load it into zookeeper once I find the right location to put
it on our system. Sound right?

https://cwiki.apache.org/confluence/display/solr/Config+Sets
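
Concretely, what I have in mind is pulling an existing collection's
config down and pushing it back under a new name with zkcli, something
like this (untested sketch):

  ./zkcli.sh -zkhost zk1:2181 -cmd downconfig \
      -confdir /tmp/myconf -confname someExistingCollection
  ./zkcli.sh -zkhost zk1:2181 -cmd upconfig \
      -confdir /tmp/myconf -confname newCollection

Does that match what the wiki page is describing?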

Thanks,

Mike

On Sat, Mar 14, 2015 at 2:27 PM, Erick Erickson 
wrote:

> I bet you did not push the configuration to Zookeeper before creating
> the collection.
> If you look in your admin UI, the Cloud link and the "tree" version,
> you'll find
> a "configsets" directory that'll show you what you _have_ put in ZK, and
> I'll
> bet you find nothing like  a config set (containing schema.xml etc) named
> what
> you specified for someExistingCollection. It's not the _collection_
> that should be
> existing, it should be the configset.
>
> It's often a bit confusing because if the configName is not specified,
> the default
> is to look for a config set of the same name as the collection being
> created.
>
> Best,
> Erick
>
> On Sat, Mar 14, 2015 at 10:26 AM, Mike Thomsen 
> wrote:
> > We're running SolrCloud 4.5.0. It's just a standard version of SolrCloud
> > deployed in Tomcat, not something like the Cloudera distribution (I note
> > that because I can't seem to find solrctl and other things referenced in
> > the Cloudera tutorials).
> >
> > I'm trying to create a new Solr collection like this:
> >
> >
> /admin/collections?action=CREATE&name=newCollection&numShards=1&collection.configName=someExistingCollection
> >
> > Then I found this error message in the logs:
> >
> > org.apache.solr.common.cloud.ZooKeeperException: Specified config does
> not
> > exist in ZooKeeper:newCollection2
> > at
> > org.apache.solr.cloud.ZkController.readConfigName(ZkController.java:742)
> > at
> > org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:244)
> > at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:557)
> > at
> >
> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:465)
> > at
> >
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:146)
> > at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:655)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:246)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
> > at
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> > at
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> > at
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
> > at
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
> > at
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
> > at
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
> > at
> > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
> > at
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
> > at
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
> > at
> >
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1074)
> > at
> >
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
> > at
> >
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> > Source)
> > at
> >
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> > at java.lang.Thread.run(Unknown Source)
> >
> > Mar 14, 2015 1:21:07 PM org.apache.solr.common.SolrException log
>

Using the collections API to create a new collection

2015-03-14 Thread Mike Thomsen
We're running SolrCloud 4.5.0. It's just a standard version of SolrCloud
deployed in Tomcat, not something like the Cloudera distribution (I note
that because I can't seem to find solrctl and other things referenced in
the Cloudera tutorials).

I'm trying to create a new Solr collection like this:

/admin/collections?action=CREATE&name=newCollection&numShards=1&collection.configName=someExistingCollection

Then I found this error message in the logs:

org.apache.solr.common.cloud.ZooKeeperException: Specified config does not
exist in ZooKeeper:newCollection2
at
org.apache.solr.cloud.ZkController.readConfigName(ZkController.java:742)
at
org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:244)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:557)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:465)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:146)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:655)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:246)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1074)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Unknown Source)

Mar 14, 2015 1:21:07 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore
'newCollection2_shard1_replica1': Unable to create core:
newCollection2_shard1_replica1
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:495)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:146)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:655)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:246)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1074)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at
org.apache.tomcat.util.threads.TaskT

Re: Solr cannot find solr.xml even though it's there

2014-12-20 Thread Mike Thomsen
It's supposed to be a simple two-shard SolrCloud configuration with two
copies of Solr running in different Tomcat servers on the same box. I can
read the solr.xml just fine as that user (vagrant), and I checked the
permissions and there's nothing obviously wrong there.
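
Specifically, on top of the ls checks below I did a direct read as the
service user, roughly:

  sudo -u vagrant cat /opt/solr/solr-shard1/solr.xml

and it printed the file without complaint.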

On Sat, Dec 20, 2014 at 3:40 PM, Shawn Heisey  wrote:

> On 12/20/2014 12:27 PM, Mike Thomsen wrote:
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: org.apache.solr.common.SolrException: solr.xml does not
> > exist in /opt/solr/solr-shard1
> > at org.apache.solr.core.ConfigSolr.fromFile(ConfigSolr.java:60)
> > ... 20 more
> >
> > I went so far as to set the permissions on solr-shard1 to global
> > read/write/execute (777) and yet it still cannot load the file. It
> doesn't
> > say there's a parse error or anything constructive. Any ideas as to what
> is
> > going on?
>
> If you log on to the system as the user that is attempting to start the
> container ... can you read that solr.xml file?
>
> These commands would check the permissions at each point in that path so
> you can see if the user in question has the appropriate rights at each
> level.
>
> ls -ald /.
> ls -ald /opt/.
> ls -ald /opt/solr/.
> ls -ald /opt/solr/solr-shard1/.
> ls -ald /opt/solr/solr-shard1/solr.xml
>
> It does seem odd that your solr home would be at a directory called
> solr-shard1 ... it makes more sense to me that it would be /opt/solr ...
> and that the solr.xml would live there.  Only you know what your
> directory structure is, though.
>
> Thanks,
> Shawn
>
>


Solr cannot find solr.xml even though it's there

2014-12-20 Thread Mike Thomsen
I'm getting the following stacktrace with Solr 4.5.0

SEVERE: null:org.apache.solr.common.SolrException: Could not load SOLR
configuration
at org.apache.solr.core.ConfigSolr.fromFile(ConfigSolr.java:71)
at org.apache.solr.core.ConfigSolr.fromSolrHome(ConfigSolr.java:98)
at 
org.apache.solr.servlet.SolrDispatchFilter.loadConfigSolr(SolrDispatchFilter.java:144)
at 
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:175)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:127)
at 
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:279)
at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:260)
at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:105)
at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4830)
at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5510)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
at 
org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1081)
at 
org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1877)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: solr.xml does not
exist in /opt/solr/solr-shard1
at org.apache.solr.core.ConfigSolr.fromFile(ConfigSolr.java:60)
... 20 more

I went so far as to set the permissions on solr-shard1 to global
read/write/execute (777) and yet it still cannot load the file. It doesn't
say there's a parse error or anything constructive. Any ideas as to what is
going on?
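
In case it matters, the webapp is pointed at that directory with a
Tomcat context fragment along these lines (reconstructed from memory,
so treat it as a sketch):

  <Context docBase="/opt/solr/solr.war" crossContext="true">
    <Environment name="solr/home" type="java.lang.String"
                 value="/opt/solr/solr-shard1" override="true"/>
  </Context>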


Need some help with solr not restarting

2014-08-11 Thread Mike Thomsen
I'm very new to SolrCloud. When I tried restarting our Tomcat server
running SolrCloud, I started getting this in our logs:

SEVERE:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/configs/configuration1/default-collection/data/index/_3ts3_Lucene41_0.doc
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at 
org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:407)
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:404)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:314)
at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1325)
at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1327)
at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1327)
at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1327)
at 
org.apache.solr.cloud.ZkController.uploadConfigDir(ZkController.java:1099)
at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:199)
at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:74)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:206)
at 
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:177)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:127)
at 
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:281)
at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:262)
at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:107)
at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4797)
at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5473)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:634)
at 
org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1074)
at 
org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1858)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Aug 11, 2014 2:21:08 PM org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start Solr. Check solr/home property and the logs
Aug 11, 2014 2:21:08 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.cloud.ZooKeeperException:
at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:224)
at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:74)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:206)
at 
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:177)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:127)
at 
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:281)
at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:262)
at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:107)
at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4797)
at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5473)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:634)
at 
org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1074)
at 
org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1858)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at ja

Re: Is Solr right for our project?

2010-09-28 Thread Mike Thomsen
Interesting. So what you are saying, though, is that at the moment it
is NOT there?

On Mon, Sep 27, 2010 at 9:06 PM, Jan Høydahl / Cominvent
 wrote:
> Solr will match this in version 3.1 which is the next major release.
> Read this page: http://wiki.apache.org/solr/SolrCloud for feature descriptions
> Coming to a trunk near you - see 
> https://issues.apache.org/jira/browse/SOLR-1873
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 27. sep. 2010, at 17.44, Mike Thomsen wrote:
>
>> (I apologize in advance if I missed something in your documentation,
>> but I've read through the Wiki on the subject of distributed searches
>> and didn't find anything conclusive)
>>
>> We are currently evaluating Solr and Autonomy. Solr is attractive due
>> to its open source background, following and price. Autonomy is
>> expensive, but we know for a fact that it can handle our distributed
>> search requirements perfectly.
>>
>> What we need to know is if Solr has capabilities that match or roughly
>> approximate Autonomy's Distributed Search Handler. What it does is
>> act as a front-end for all of Autonomy's IDOL search servers (which
>> correspond in this scenario to Solr shards). It is configured to know
>> what is on each shard, which servers hold each shard and intelligently
>> farms out queries based on that configuration. There is no need to
>> specify which IDOL servers to hit while querying; the DiSH just knows
>> where to go. Additionally, I believe in cases where an index piece is
>> mirrored, it also monitors server health and falls back intelligently
>> on other backup instances of a shard/index piece based on that.
>>
>> I'd appreciate it if someone can give me a frank explanation of where
>> Solr stands in this area.
>>
>> Thanks,
>>
>> Mike
>
>


Is Solr right for our project?

2010-09-27 Thread Mike Thomsen
(I apologize in advance if I missed something in your documentation,
but I've read through the Wiki on the subject of distributed searches
and didn't find anything conclusive)

We are currently evaluating Solr and Autonomy. Solr is attractive due
to its open source background, following and price. Autonomy is
expensive, but we know for a fact that it can handle our distributed
search requirements perfectly.

What we need to know is if Solr has capabilities that match or roughly
approximate Autonomy's Distributed Search Handler. What it does it
acts as a front-end for all of Autonomy's IDOL search servers (which
correspond in this scenario to Solr shards). It is configured to know
what is on each shard, which servers hold each shard and intelligently
farms out queries based on that configuration. There is no need to
specify which IDOL servers to hit while querying; the DiSH just knows
where to go. Additionally, I believe in cases where an index piece is
mirrored, it also monitors server health and falls back intelligently
on other backup instances of a shard/index piece based on that.

I'd appreciate it if someone can give me a frank explanation of where
Solr stands in this area.

Thanks,

Mike


Newbie question about search behavior

2010-08-16 Thread Mike Thomsen
Is it possible to set up Lucene to treat a keyword search such as

title:News

implicitly like

title:News*

so that any title that begins with News will be returned without the
user having to throw in a wildcard?
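
The only approach I've come across so far is indexing prefixes with an
edge n-gram analyzer instead of relying on wildcards, along these lines
(schema.xml sketch, untested):

  <fieldType name="text_prefix" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Is that the recommended route, or is there a query-side option?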

Also, are there any common filters and such that are generally
considered a good practice to throw into the schema for an
English-language website?
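
For context, the chain I was planning to start from looks like this
(adapted loosely from the example schema, so only a sketch):

  <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>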

Thanks,

Mike