solr wiki

2018-11-29 Thread Gary Sieling
Can I be added to the Solr wiki contributors list?

Username: garysieling

Thanks
Gary


Re: [SolrCloud] shard hash ranges changed after restoring backup

2016-06-16 Thread Gary Yao
Hi Erick,

I should add that our Solr cluster is in production and new documents
are constantly indexed. The new cluster has been up for three weeks now.
The problem was discovered only now because in our use case Atomic
Updates and RealTime Gets are mostly performed on new documents. With
almost absolute certainty there are already documents in the index that
were distributed to the shards according to the new hash ranges. If we
just changed the hash ranges in ZooKeeper, the index would still be in
an inconsistent state.

Is there any way to recover from this without having to re-index all
documents?

Best,
Gary

2016-06-15 19:23 GMT+02:00 Erick Erickson :
> Simplest, though a bit risky is to manually edit the znode and
> correct the znode entry. There are various tools out there, including
> one that ships with Zookeeper (see the ZK documentation).
>
> Or you can use the zkcli scripts (the Zookeeper ones) to get the znode
> down to your local machine, edit it there and then push it back up to ZK.
>
> I'd do all this with my Solr nodes shut down, then ensure that my ZK
> ensemble was consistent after the update, etc.
>
> Best,
> Erick
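
For reference, a minimal sketch of the znode round-trip Erick describes, using the
plain ZooKeeper Java client; the connect string and backup file name are
placeholders, and (as he says) the Solr nodes should be shut down first:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class FixClusterState {
    public static void main(String[] args) throws Exception {
        // Placeholder ensemble address.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000, null);
        Stat stat = new Stat();
        byte[] data = zk.getData("/clusterstate.json", false, stat);
        // Keep a local backup before changing anything.
        Files.write(Paths.get("clusterstate.json.bak"), data);
        String json = new String(data, StandardCharsets.UTF_8);
        // ... correct the per-shard "range" values here, by hand or with a JSON library ...
        byte[] fixed = json.getBytes(StandardCharsets.UTF_8);
        // Passing the version read above makes the write fail if anything changed in between.
        zk.setData("/clusterstate.json", fixed, stat.getVersion());
        zk.close();
    }
}
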
>
> On Wed, Jun 15, 2016 at 8:36 AM, Gary Yao  wrote:
>> Hi all,
>>
>> My team at work maintains a SolrCloud 5.3.2 cluster with multiple
>> collections configured with sharding and replication.
>>
>> We recently backed up our Solr indexes using the built-in backup
>> functionality. After the cluster was restored from the backup, we
>> noticed that atomic updates of documents are failing occasionally with
>> the error message 'missing required field [...]'. The exceptions are
>> thrown on a host on which the document to be updated is not stored. From
>> this we are deducing that there is a problem with finding the right host
>> by the hash of the uniqueKey. Indeed, our investigations so far showed
>> that for at least one collection in the new cluster, the shards have
>> different hash ranges assigned now. We checked the hash ranges by
>> querying /admin/collections?action=CLUSTERSTATUS. Find below the shard
>> hash ranges of one collection that we debugged.
>>
>>   Old cluster:
>> shard1_0 8000 - aaa9
>> shard1_1  - d554
>> shard2_0 d555 - fffe
>> shard2_1  - 2aa9
>> shard3_0 2aaa - 5554
>> shard3_1  - 7fff
>>
>>   New cluster:
>> shard1 8000 - aaa9
>> shard2  - d554
>> shard3 d555 - 
>> shard4 0 - 2aa9
>> shard5 2aaa - 5554
>> shard6  - 7fff
>>
>>   Note that the shard names differ because the old cluster's shards were
>>   split.
>>
>> As you can see, the ranges of shard3 and shard4 differ from the old
>> cluster. This change of hash ranges matches with the symptoms we are
>> currently experiencing.
>>
>> We found this JIRA ticket https://issues.apache.org/jira/browse/SOLR-5750
>> in which David Smiley comments:
>>
>>   shard hash ranges aren't restored; this error could be disasterous
>>
>> It seems that this is what happened to us. We would like to hear some
>> suggestions on how we could recover from this problem.
>>
>> Best,
>> Gary


[SolrCloud] shard hash ranges changed after restoring backup

2016-06-15 Thread Gary Yao
Hi all,

My team at work maintains a SolrCloud 5.3.2 cluster with multiple
collections configured with sharding and replication.

We recently backed up our Solr indexes using the built-in backup
functionality. After the cluster was restored from the backup, we
noticed that atomic updates of documents are failing occasionally with
the error message 'missing required field [...]'. The exceptions are
thrown on a host on which the document to be updated is not stored. From
this we are deducing that there is a problem with finding the right host
by the hash of the uniqueKey. Indeed, our investigations so far showed
that for at least one collection in the new cluster, the shards have
different hash ranges assigned now. We checked the hash ranges by
querying /admin/collections?action=CLUSTERSTATUS. Find below the shard
hash ranges of one collection that we debugged.

  Old cluster:
shard1_0 8000 - aaa9
shard1_1  - d554
shard2_0 d555 - fffe
shard2_1  - 2aa9
shard3_0 2aaa - 5554
shard3_1  - 7fff

  New cluster:
shard1 8000 - aaa9
shard2  - d554
shard3 d555 - 
shard4 0 - 2aa9
shard5 2aaa - 5554
shard6  - 7fff

  Note that the shard names differ because the old cluster's shards were
  split.

As you can see, the ranges of shard3 and shard4 differ from the old
cluster. This change of hash ranges matches with the symptoms we are
currently experiencing.

We found this JIRA ticket https://issues.apache.org/jira/browse/SOLR-5750
in which David Smiley comments:

  shard hash ranges aren't restored; this error could be disasterous

It seems that this is what happened to us. We would like to hear some
suggestions on how we could recover from this problem.

Best,
Gary
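
A small sketch of that CLUSTERSTATUS check; host, port and collection name are
placeholders. Saving the output from the old and new clusters makes the changed
ranges easy to diff:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class PrintClusterStatus {
    public static void main(String[] args) throws Exception {
        // Placeholder host and collection; the per-shard "range" values appear in the response.
        URL url = new URL("http://localhost:8983/solr/admin/collections"
                + "?action=CLUSTERSTATUS&collection=mycollection&wt=json");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}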


Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-27 Thread Gary Taylor

Alex,

I've created JIRA ticket: https://issues.apache.org/jira/browse/SOLR-7174

In response to your suggestions below:

1. No exceptions are reported, even with onError removed.
2. ProcessMonitor shows only the very first epub file is being read 
(repeatedly)

3. I can repeat this on Ubuntu (14.04) by following the same steps.
4. Ticket raised (https://issues.apache.org/jira/browse/SOLR-7174)

Additionally (and I've added this on the ticket), if I change the 
dataConfig to use FileDataSource and PlainTextEntityProcessor, and just 
list *.txt files, it works!





baseDir="c:/Users/gt/Documents/HackerMonthly/epub" 
fileName=".*txt">





processor="PlainTextEntityProcessor"
url="${files.fileAbsolutePath}" format="text" 
dataSource="bin">







So it's something related to BinFileDataSource and TikaEntityProcessor.

Thanks,
Gary.

On 26/02/2015 14:24, Gary Taylor wrote:

Alex,

That's great.  Thanks for the pointers.  I'll try and get more info on 
this and file a JIRA issue.


Kind regards,
Gary.

On 26/02/2015 14:16, Alexandre Rafalovitch wrote:

On 26 February 2015 at 08:32, Gary Taylor  wrote:

Alex,

Same results on recursive=true / recursive=false.

I also tried importing plain text files instead of epub (still using
TikaEntityProcessor though) and get exactly the same result - i.e. all files
fetched, but only one document indexed in Solr.

To me, this would indicate that there is a problem with the inner
DIH entity. As a next set of steps, I would probably:
1) remove both onError statements and see if there is an exception
that is being swallowed.
2) run the import under ProcessMonitor and see if the other files are
actually being read
https://technet.microsoft.com/en-us/library/bb896645.aspx
3) Assume a Windows bug and test this on Mac/Linux
4) File a JIRA with a replication case. If there is a full replication
setup, I'll test it on machines I have access to with a full debugger
step-through

For example, I wonder if BinFileDataSource is somehow not cleaning up
after the first file properly on Windows and fails to open the second
one.

Regards,
Alex.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/





--
Gary Taylor | www.inovem.com | www.kahootz.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE
kahootz.com is a trading name of INOVEM Ltd.



Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-26 Thread Gary Taylor

Alex,

That's great.  Thanks for the pointers.  I'll try and get more info on 
this and file a JIRA issue.


Kind regards,
Gary.

On 26/02/2015 14:16, Alexandre Rafalovitch wrote:

On 26 February 2015 at 08:32, Gary Taylor  wrote:

Alex,

Same results on recursive=true / recursive=false.

I also tried importing plain text files instead of epub (still using
TikaEntityProcessor though) and get exactly the same result - i.e. all files
fetched, but only one document indexed in Solr.

To me, this would indicate that there is a problem with the inner
DIH entity. As a next set of steps, I would probably:
1) remove both onError statements and see if there is an exception
that is being swallowed.
2) run the import under ProcessMonitor and see if the other files are
actually being read
https://technet.microsoft.com/en-us/library/bb896645.aspx
3) Assume a Windows bug and test this on Mac/Linux
4) File a JIRA with a replication case. If there is a full replication
setup, I'll test it on machines I have access to with a full debugger
step-through

For example, I wonder if BinFileDataSource is somehow not cleaning up
after the first file properly on Windows and fails to open the second
one.

Regards,
Alex.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/



--
Gary Taylor | www.inovem.com | www.kahootz.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE
kahootz.com is a trading name of INOVEM Ltd.



Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-26 Thread Gary Taylor

Alex,

Same results on recursive=true / recursive=false.

I also tried importing plain text files instead of epub (still using 
TikaEntityProcessor though) and get exactly the same result - i.e. all 
files fetched, but only one document indexed in Solr.


With verbose output, I get a row for each file in the directory, but 
only the first one has a non-empty documentImport entity.   All 
subsequent documentImport entities just have an empty document#2 entry.  eg:


 
  "verbose-output": [
"entity:files",
[
  null,
  "--- row #1-",
  "fileSize",
  2609004,
  "fileLastModified",
  "2015-02-25T11:37:25.217Z",
  "fileAbsolutePath",
  "c:\\Users\\gt\\Documents\\epub\\issue018.epub",
  "fileDir",
  "c:\\Users\\gt\\Documents\\epub",
  "file",
  "issue018.epub",
  null,
  "-",
  "entity:documentImport",
  [
"document#1",
[
  "query",
  "c:\\Users\\gt\\Documents\\epub\\issue018.epub",
  "time-taken",
  "0:0:0.0",
  null,
  "--- row #1-",
  "text",
  "< ... parsed epub text - snip ... >"
  "title",
  "Issue 18 title",
  "Author",
  "Author text",
  null,
  "-"
],
"document#2",
[]
  ],
  null,
  "--- row #2-",
  "fileSize",
  4428804,
  "fileLastModified",
  "2015-02-25T11:37:36.399Z",
  "fileAbsolutePath",
  "c:\\Users\\gt\\Documents\\epub\\issue019.epub",
  "fileDir",
  "c:\\Users\\gt\\Documents\\epub",
  "file",
  "issue019.epub",
  null,
  "-",
  "entity:documentImport",
  [
"document#2",
[]
  ],
  null,
  "--- row #3-",
  "fileSize",
  2580266,
  "fileLastModified",
  "2015-02-25T11:37:41.188Z",
  "fileAbsolutePath",
  "c:\\Users\\gt\\Documents\\epub\\issue020.epub",
  "fileDir",
  "c:\\Users\\gt\\Documents\\epub",
  "file",
  "issue020.epub",
  null,
  "-",
  "entity:documentImport",
  [
"document#2",
[]
  ],






Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-25 Thread Gary Taylor

Alex,

Thanks for the suggestions.  It always just indexes 1 doc, regardless of 
the first epub file it sees.  Debug / verbose don't show anything 
obvious to me.  I can include the output here if you think it would help.


I tried using the SimplePostTool first (java 
-Dtype=application/epub+zip 
-Durl=http://localhost:8983/solr/hn1/update/extract -jar post.jar 
\Users\gt\Documents\epub\*.epub) to index the docs and check the Tika 
parsing, and that works OK, so I don't think it's the epubs.


I was trying to use DIH so that I could more easily specify the schema 
fields and store content in the index in preparation for trying out the 
search highlighting.  I couldn't work out how to do that with post.jar.
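
For what it's worth, a rough SolrJ sketch of driving /update/extract directly,
which lets literal.* and fmap.* parameters set schema fields per file; the field
names and parameters here are illustrative assumptions, not taken from the setup
above:

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class IndexEpubs {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/hn1");
        File dir = new File("C:/Users/gt/Documents/epub");
        for (File f : dir.listFiles((d, name) -> name.endsWith(".epub"))) {
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
            req.addFile(f, "application/epub+zip");
            req.setParam("literal.id", f.getName());   // assumed uniqueKey field
            req.setParam("fmap.content", "text");      // map extracted body into a stored "text" field
            server.request(req);
        }
        server.commit();
        server.shutdown();
    }
}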


Thanks,
Gary

On 25/02/2015 17:09, Alexandre Rafalovitch wrote:

Try removing that first epub from the directory and rerunning. If you
now index 0 documents, then there is something unexpected about them
and DIH skips. If it indexes 1 document again but a different one,
then it is definitely something about the repeat logic.

Also, try running with debug and verbose modes and see if something
specific shows up.

Regards,
Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 25 February 2015 at 11:14, Gary Taylor  wrote:

I can't get the FileListEntityProcessor and TikaEntityProcessor to correctly
add a Solr document for each epub file in my local directory.

I've just downloaded Solr 5.0.0, on a Windows 7 PC.   I ran "solr start" and
then "solr create -c hn2" to create a new core.

I want to index a load of epub files that I've got in a directory. So I
created a data-import.xml (in solr\hn2\conf):


 
 
 
 
 
 

 
 
 
 
 
 
 
 


In my solrconfig.xml, I added a requestHandler entry to reference my
data-import.xml:

   
   
   data-import.xml
   
   

I renamed managed-schema to schema.xml, and ensured the following doc fields
were setup:

   
   
   
   

   
   

   
   

 

I copied all the jars from dist and contrib\* into server\solr\lib.

Stopping and restarting solr then creates a new managed-schema file and
renames schema.xml to schema.xml.back

All good so far.

Now I go to the web admin for dataimport
(http://localhost:8983/solr/#/hn2/dataimport//dataimport) and try and
execute a full import.

But, the results show "Requests: 0, Fetched: 58, Skipped: 0, Processed:1" -
ie. it only adds one document (the very first one) even though it's iterated
over 58!

No errors are reported in the logs.

I can search on the contents of that first epub document, so it's extracting
OK in Tika, but there's a problem somewhere in my config that's causing only
1 document to be indexed in Solr.

Thanks for any assistance / pointers.

Regards,
Gary

--
Gary Taylor | www.inovem.com | www.kahootz.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE
kahootz.com is a trading name of INOVEM Ltd.



--
Gary Taylor | www.inovem.com | www.kahootz.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE
kahootz.com is a trading name of INOVEM Ltd.



Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-25 Thread Gary Taylor
I can't get the FileListEntityProcessor and TikaEntityProcessor to 
correctly add a Solr document for each epub file in my local directory.


I've just downloaded Solr 5.0.0, on a Windows 7 PC.   I ran "solr start" 
and then "solr create -c hn2" to create a new core.


I want to index a load of epub files that I've got in a directory. So I 
created a data-import.xml (in solr\hn2\conf):










url="${files.fileAbsolutePath}" format="text" 
dataSource="bin" onError="skip">










In my solrconfig.xml, I added a requestHandler entry to reference my 
data-import.xml:


  class="org.apache.solr.handler.dataimport.DataImportHandler">

  
  data-import.xml
  
  

I renamed managed-schema to schema.xml, and ensured the following doc 
fields were setup:


  required="true" multiValued="false" />

  
  
  

  
  stored="true" />


  stored="true" multiValued="false"/>
  multiValued="true"/>




I copied all the jars from dist and contrib\* into server\solr\lib.

Stopping and restarting solr then creates a new managed-schema file and 
renames schema.xml to schema.xml.back


All good so far.

Now I go to the web admin for dataimport 
(http://localhost:8983/solr/#/hn2/dataimport//dataimport) and try and 
execute a full import.


But, the results show "Requests: 0, Fetched: 58, Skipped: 0, 
Processed:1" - ie. it only adds one document (the very first one) even 
though it's iterated over 58!


No errors are reported in the logs.

I can search on the contents of that first epub document, so it's 
extracting OK in Tika, but there's a problem somewhere in my config 
that's causing only 1 document to be indexed in Solr.
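
As an aside, the same full-import plus a status check can be scripted rather than
run from the admin UI; a rough sketch against the hn2 core, assuming the handler
is registered at /dataimport as above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class RunDataImport {
    static String get(String u) throws Exception {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(u).openStream(), "UTF-8"))) {
            StringBuilder sb = new StringBuilder();
            for (String line; (line = in.readLine()) != null; ) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8983/solr/hn2/dataimport";
        // Kick off the import, then ask DIH how many documents it processed.
        System.out.println(get(base + "?command=full-import&clean=true&commit=true&wt=json"));
        Thread.sleep(10000);   // crude wait; real code would poll until status says "idle"
        System.out.println(get(base + "?command=status&wt=json"));
    }
}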


Thanks for any assistance / pointers.

Regards,
Gary

--
Gary Taylor | www.inovem.com | www.kahootz.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE
kahootz.com is a trading name of INOVEM Ltd.



Re:

2013-07-23 Thread Gary Young
Can anyone remove this spammer please?


On Tue, Jul 23, 2013 at 4:47 AM,  wrote:

>
> Hi!   http://mackieprice.org/cbs.com.network.html
>
>


Re: Is it possible to searh Solr with a longer query string?

2013-06-26 Thread Gary Young
Oh this is good!


On Wed, Jun 26, 2013 at 12:05 PM, Shawn Heisey  wrote:

> On 6/25/2013 6:15 PM, Jack Krupansky wrote:
> > Are you using Tomcat?
> >
> > See:
> > http://wiki.apache.org/solr/SolrTomcat#Enabling_Longer_Query_Requests
> >
> > Enabling Longer Query Requests
> >
> > If you try to submit too long a GET query to Solr, then Tomcat will
> > reject your HTTP request on the grounds that the HTTP header is too
> > large; symptoms may include an HTTP 400 Bad Request error or (if you
> > execute the query in a web browser) a blank browser window.
> >
> > If you need to enable longer queries, you can set the maxHttpHeaderSize
> > attribute on the HTTP Connector element in your server.xml file. The
> > default value is 4K. (See
> > http://tomcat.apache.org/tomcat-5.5-doc/config/http.html)
>
> Even better would be to force SolrJ to use a POST request.  In newer
> versions (4.1 and later) Solr sets the servlet container's POST buffer
> size and defaults it to 2MB.  In older versions, you'd have to adjust
> this in your servlet container config, but the default should be
> considerably larger than the header buffer used for GET requests.
>
> I thought that SolrJ used POST by default, but after looking at the
> code, it seems that I was wrong.  Here's how to send a POST query:
>
> response = server.query(query, METHOD.POST);
>
> The import required for this is:
>
> import org.apache.solr.client.solrj.SolrRequest.METHOD;
>
> Gary, if you can avoid it, you should not be creating a new
> HttpSolrServer object every time you make a query.  It is completely
> thread-safe, so create a singleton and use it for all queries against
> the medline core.
>
> Thanks,
> Shawn
>
>
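
Putting Shawn's two suggestions together, a minimal sketch (the medline core URL
is a placeholder):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest.METHOD;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MedlineSearch {
    // HttpSolrServer is thread-safe, so one shared instance serves all queries.
    private static final HttpSolrServer SERVER =
            new HttpSolrServer("http://localhost:8983/solr/medline");

    public static QueryResponse search(String veryLongQuery) throws Exception {
        SolrQuery query = new SolrQuery(veryLongQuery);
        // POST keeps long query strings out of the HTTP request line/headers.
        return SERVER.query(query, METHOD.POST);
    }
}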


Re: doc cache issues... query-time way to bypass cache?

2013-03-23 Thread Gary Yngve
Sigh, user error.

I missed this in the 4.1 release notes:

Collections that do not specify numShards at collection creation time use
custom sharding and default to the "implicit" router. Document updates
received by a shard will be indexed to that shard, unless a "shard"
parameter or document field names a different shard.


On Fri, Mar 22, 2013 at 3:39 PM, Gary Yngve  wrote:

> I have a situation we just discovered in solr4.2 where there are
> previously cached results from a limited field list, and when querying for
> the whole field list, it responds differently depending on which shard gets
> the query (no extra replicas).  It either returns the document on the
> limited field list or the full field list.
>
> We're releasing tonight, so is there a query param to selectively bypass
> the cache, which I can use as a temp fix?
>
> Thanks,
> Gary
>


doc cache issues... query-time way to bypass cache?

2013-03-22 Thread Gary Yngve
I have a situation we just discovered in solr4.2 where there are previously
cached results from a limited field list, and when querying for the whole
field list, it responds differently depending on which shard gets the query
(no extra replicas).  It either returns the document on the limited field
list or the full field list.

We're releasing tonight, so is there a query param to selectively bypass
the cache, which I can use as a temp fix?

Thanks,
Gary


Re: overseer queue clogged

2013-03-22 Thread Gary Yngve
Thanks, Mark!

The core node names in solr.xml in Solr 4.2 are great!  Maybe in 4.3 it
can be supported via API?

Also I am glad you mentioned in another post the option to namespace
zookeeper by adding a path to the end of the comma-delimited zk hosts.  That
works out really well in our situation for having zk serve multiple amazon
environments that go up and down independently of each other -- no issues
w/ shared clusterstate.json or overseers.
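
A small sketch of that chroot-style setup; the hosts and path below are
placeholders. The suffix on zkHost scopes each environment to its own subtree,
which needs to be created once before Solr points at it:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class PrepareChroot {
    public static void main(String[] args) throws Exception {
        // What each environment would pass to Solr, e.g. -DzkHost=... (hosts/path are placeholders).
        String zkHost = "zk1:2181,zk2:2181,zk3:2181/prod-us-east";
        // The chroot node has to exist before Solr connects under it; create it once.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000, null);
        if (zk.exists("/prod-us-east", false) == null) {
            zk.create("/prod-us-east", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        zk.close();
        System.out.println("zkHost for this environment: " + zkHost);
    }
}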

Regarding our original problem, we were able to restart all our shards but
one, which wasn't getting past
Mar 20, 2013 5:12:54 PM org.apache.solr.common.cloud.ZkStateReader$2 process
INFO: A cluster state change has occurred - updating...
Mar 20, 2013 5:12:54 PM org.apache.zookeeper.ClientCnxn$EventThread
processEvent
SEVERE: Error while calling watcher
java.lang.NullPointerException
at
org.apache.solr.common.cloud.ZkStateReader$2.process(ZkStateReader.java:201)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)

We ended up upgrading to solr4.2 and rebuilding the whole index from our
datastore.

-Gary


On Sat, Mar 16, 2013 at 9:51 AM, Mark Miller  wrote:

> Yeah, I don't know that I've ever tried with 4.0, but I've done this with
> 4.1 and 4.2.
>
> - Mark
>
> On Mar 16, 2013, at 12:19 PM, Gary Yngve  wrote:
>
> > Cool, I'll need to try this.  I could have sworn that it didn't work that
> > way in 4.0, but maybe my test was bunk.
> >
> > -g
> >
> >
> > On Fri, Mar 15, 2013 at 9:41 PM, Mark Miller 
> wrote:
> >>
> >> You can do this - just modify your starting Solr example to have no
> cores
> >> in solr.xml. You won't be able to make use of the admin UI until you
> create
> >> at least one core, but the core and collection apis will both work fine.
>
>


Re: overseer queue clogged

2013-03-16 Thread Gary Yngve
Cool, I'll need to try this.  I could have sworn that it didn't work that
way in 4.0, but maybe my test was bunk.

-g


On Fri, Mar 15, 2013 at 9:41 PM, Mark Miller  wrote:
>
> You can do this - just modify your starting Solr example to have no cores
> in solr.xml. You won't be able to make use of the admin UI until you create
> at least one core, but the core and collection apis will both work fine.


Re: overseer queue clogged

2013-03-15 Thread Gary Yngve
I will upgrade to 4.2 this weekend and see what happens.  We are on ec2 and
have had a few issues with hostnames with both zk and solr. (but in this
case i haven't rebooted any instances either)

it's relatively painful to do the upgrade because we have a query/scorer
fork of lucene along with supplemental jars, and zk cannot distribute
binary jars via the config.

we are also multi-collection per zk... i wish it didn't require a core
always defined up front for the core admin?  i would love to have an
instance have no cores and then just create the core i need..

-g



On Fri, Mar 15, 2013 at 7:14 PM, Mark Miller  wrote:

>
> On Mar 15, 2013, at 10:04 PM, Gary Yngve  wrote:
>
> > i think those followers are red from trying to forward requests to the
> > overseer while it was being restarted.  i guess i'll see if they become
> > green over time.  or i guess i can restart them one at a time..
>
> Restarting the cluster clear things up. It shouldn't take too long for
> those nodes to recover though - they should have been up to date before.
> The couple exceptions you posted def indicate something is out of whack.
> It's something I'd like to get to the bottom of.
>
> - Mark
>
> >
> >
> > On Fri, Mar 15, 2013 at 6:53 PM, Gary Yngve 
> wrote:
> >
> >> it doesn't appear to be a shard1 vs shard11 issue... 60% of my followers
> >> are red now in the solr cloud graph.. trying to figure out what that
> >> means...
> >>
> >>
> >> On Fri, Mar 15, 2013 at 6:48 PM, Gary Yngve 
> wrote:
> >>
> >>> I restarted the overseer node and another took over, queues are empty
> now.
> >>>
> >>> the server with core production_things_shard1_2
> >>> is having these errors:
> >>>
> >>> shard update error RetryNode:
> >>>
> http://10.104.59.189:8883/solr/production_things_shard11_replica1/:org.apache.solr.client.solrj.SolrServerException
> :
> >>> Server refused connection at:
> >>> http://10.104.59.189:8883/solr/production_things_shard11_replica1
> >>>
> >>>  for shard11!!!
> >>>
> >>> I also got some strange errors on the restarted node.  Makes me wonder
> if
> >>> there is a string-matching bug for shard1 vs shard11?
> >>>
> >>> SEVERE: :org.apache.solr.common.SolrException: Error getting leader
> from
> >>> zk
> >>>  at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:771)
> >>>  at org.apache.solr.cloud.ZkController.register(ZkController.java:683)
> >>>  at org.apache.solr.cloud.ZkController.register(ZkController.java:634)
> >>>  at
> >>> org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:890)
> >>>  at
> >>> org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:874)
> >>>  at org.apache.solr.core.CoreContainer.register(CoreContainer.java:823)
> >>>  at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:633)
> >>>  at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
> >>>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>  at
> >>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >>>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>  at
> >>>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>>  at
> >>>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>>  at java.lang.Thread.run(Thread.java:722)
> >>> Caused by: org.apache.solr.common.SolrException: There is conflicting
> >>> information about the leader
> >>> of shard: shard1 our state says:
> >>> http://10.104.59.189:8883/solr/collection1/ but zookeeper says:
> >>> http://10.217.55.151:8883/solr/collection1/
> >>>  at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:756)
> >>>
> >>> INFO: Releasing
> >>>
> directory:/vol/ubuntu/talemetry_match_solr/solr_server/solr/production_things_shar
> >>> d11_replica1/data/index
> >>> Mar 15, 2013 5:52:34 PM org.apache.solr.common.SolrException log
> >>> SEVERE: org.apache.solr.common.SolrException: Error opening new
> searcher
> >>>  at org.apache.solr.core.SolrCore.o

Re: overseer queue clogged

2013-03-15 Thread Gary Yngve
i think those followers are red from trying to forward requests to the
overseer while it was being restarted.  i guess i'll see if they become
green over time.  or i guess i can restart them one at a time..


On Fri, Mar 15, 2013 at 6:53 PM, Gary Yngve  wrote:

> it doesn't appear to be a shard1 vs shard11 issue... 60% of my followers
> are red now in the solr cloud graph.. trying to figure out what that
> means...
>
>
> On Fri, Mar 15, 2013 at 6:48 PM, Gary Yngve  wrote:
>
>> I restarted the overseer node and another took over, queues are empty now.
>>
>> the server with core production_things_shard1_2
>> is having these errors:
>>
>> shard update error RetryNode:
>> http://10.104.59.189:8883/solr/production_things_shard11_replica1/:org.apache.solr.client.solrj.SolrServerException:
>> Server refused connection at:
>> http://10.104.59.189:8883/solr/production_things_shard11_replica1
>>
>>   for shard11!!!
>>
>> I also got some strange errors on the restarted node.  Makes me wonder if
>> there is a string-matching bug for shard1 vs shard11?
>>
>> SEVERE: :org.apache.solr.common.SolrException: Error getting leader from
>> zk
>>   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:771)
>>   at org.apache.solr.cloud.ZkController.register(ZkController.java:683)
>>   at org.apache.solr.cloud.ZkController.register(ZkController.java:634)
>>   at
>> org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:890)
>>   at
>> org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:874)
>>   at org.apache.solr.core.CoreContainer.register(CoreContainer.java:823)
>>   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:633)
>>   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>   at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>   at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>   at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>   at java.lang.Thread.run(Thread.java:722)
>> Caused by: org.apache.solr.common.SolrException: There is conflicting
>> information about the leader
>> of shard: shard1 our state says:
>> http://10.104.59.189:8883/solr/collection1/ but zookeeper says:
>> http://10.217.55.151:8883/solr/collection1/
>>   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:756)
>>
>> INFO: Releasing
>> directory:/vol/ubuntu/talemetry_match_solr/solr_server/solr/production_things_shar
>> d11_replica1/data/index
>> Mar 15, 2013 5:52:34 PM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher
>>   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1423)
>>   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535)
>>
>> SEVERE: org.apache.solr.common.SolrException: I was asked to wait on
>> state recovering for 10.76.31.
>> 67:8883_solr but I still do not see the requested state. I see state:
>> active live:true
>>   at
>> org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler
>> .java:948)
>>
>>
>>
>>
>> On Fri, Mar 15, 2013 at 5:05 PM, Mark Miller wrote:
>>
>>> Strange - we hardened that loop in 4.1 - so I'm not sure what happened
>>> here.
>>>
>>> Can you do a stack dump on the overseer and see if you see an Overseer
>>> thread running perhaps? Or just post the results?
>>>
>>> To recover, you should be able to just restart the Overseer node and
>>> have someone else take over - they should pick up processing the queue.
>>>
>>> Any logs you might be able to share could be useful too.
>>>
>>> - Mark
>>>
>>> On Mar 15, 2013, at 7:51 PM, Gary Yngve  wrote:
>>>
>>> > Also, looking at overseer_elect, everything looks fine.  node is valid
>>> and
>>> > live.
>>> >
>>> >
>>> > On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve 
>>> wrote:
>>> >
>>> >> Sorry, should have specified.  4.1
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>

Re: overseer queue clogged

2013-03-15 Thread Gary Yngve
it doesn't appear to be a shard1 vs shard11 issue... 60% of my followers
are red now in the solr cloud graph.. trying to figure out what that
means...


On Fri, Mar 15, 2013 at 6:48 PM, Gary Yngve  wrote:

> I restarted the overseer node and another took over, queues are empty now.
>
> the server with core production_things_shard1_2
> is having these errors:
>
> shard update error RetryNode:
> http://10.104.59.189:8883/solr/production_things_shard11_replica1/:org.apache.solr.client.solrj.SolrServerException:
> Server refused connection at:
> http://10.104.59.189:8883/solr/production_things_shard11_replica1
>
>   for shard11!!!
>
> I also got some strange errors on the restarted node.  Makes me wonder if
> there is a string-matching bug for shard1 vs shard11?
>
> SEVERE: :org.apache.solr.common.SolrException: Error getting leader from zk
>   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:771)
>   at org.apache.solr.cloud.ZkController.register(ZkController.java:683)
>   at org.apache.solr.cloud.ZkController.register(ZkController.java:634)
>   at
> org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:890)
>   at
> org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:874)
>   at org.apache.solr.core.CoreContainer.register(CoreContainer.java:823)
>   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:633)
>   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>   at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>   at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.solr.common.SolrException: There is conflicting
> information about the leader
> of shard: shard1 our state says:
> http://10.104.59.189:8883/solr/collection1/ but zookeeper says:
> http://10.217.55.151:8883/solr/collection1/
>   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:756)
>
> INFO: Releasing
> directory:/vol/ubuntu/talemetry_match_solr/solr_server/solr/production_things_shar
> d11_replica1/data/index
> Mar 15, 2013 5:52:34 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher
>   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1423)
>   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535)
>
> SEVERE: org.apache.solr.common.SolrException: I was asked to wait on state
> recovering for 10.76.31.
> 67:8883_solr but I still do not see the requested state. I see state:
> active live:true
>   at
> org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler
> .java:948)
>
>
>
>
> On Fri, Mar 15, 2013 at 5:05 PM, Mark Miller wrote:
>
>> Strange - we hardened that loop in 4.1 - so I'm not sure what happened
>> here.
>>
>> Can you do a stack dump on the overseer and see if you see an Overseer
>> thread running perhaps? Or just post the results?
>>
>> To recover, you should be able to just restart the Overseer node and have
>> someone else take over - they should pick up processing the queue.
>>
>> Any logs you might be able to share could be useful too.
>>
>> - Mark
>>
>> On Mar 15, 2013, at 7:51 PM, Gary Yngve  wrote:
>>
>> > Also, looking at overseer_elect, everything looks fine.  node is valid
>> and
>> > live.
>> >
>> >
>> > On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve 
>> wrote:
>> >
>> >> Sorry, should have specified.  4.1
>> >>
>> >>
>> >>
>> >>
>> >> On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller > >wrote:
>> >>
>> >>> What Solr version? 4.0, 4.1 4.2?
>> >>>
>> >>> - Mark
>> >>>
>> >>> On Mar 15, 2013, at 7:19 PM, Gary Yngve  wrote:
>> >>>
>> >>>> my solr cloud has been running fine for weeks, but about a week ago,
>> it
>> >>>> stopped dequeueing from the overseer queue, and now there are
>> thousands
>> >>> of
>> >>>> tasks on the queue, most which look like
>> >>>>
>> >>>> {
>> >>>> "operation":"state",
>> >>>> "numShards":null,
>> >>>> "shard":"shard3",
>> >>>> "roles":null,
>> >>>> "state":"recovering",
>> >>>> "core":"production_things_shard3_2",
>> >>>> "collection":"production_things",
>> >>>> "node_name":"10.31.41.59:8883_solr",
>> >>>> "base_url":"http://10.31.41.59:8883/solr"}
>> >>>>
>> >>>> i'm trying to create a new collection through collection API, and
>> >>>> obviously, nothing is happening...
>> >>>>
>> >>>> any suggestion on how to fix this?  drop the queue in zk?
>> >>>>
>> >>>> how could it have gotten into this state in the first place?
>> >>>>
>> >>>> thanks,
>> >>>> gary
>> >>>
>> >>>
>> >>
>>
>>
>


Re: overseer queue clogged

2013-03-15 Thread Gary Yngve
I restarted the overseer node and another took over, queues are empty now.

the server with core production_things_shard1_2
is having these errors:

shard update error RetryNode:
http://10.104.59.189:8883/solr/production_things_shard11_replica1/:org.apache.solr.client.solrj.SolrServerException:
Server refused connection at:
http://10.104.59.189:8883/solr/production_things_shard11_replica1

  for shard11!!!

I also got some strange errors on the restarted node.  Makes me wonder if
there is a string-matching bug for shard1 vs shard11?

SEVERE: :org.apache.solr.common.SolrException: Error getting leader from zk
  at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:771)
  at org.apache.solr.cloud.ZkController.register(ZkController.java:683)
  at org.apache.solr.cloud.ZkController.register(ZkController.java:634)
  at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:890)
  at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:874)
  at org.apache.solr.core.CoreContainer.register(CoreContainer.java:823)
  at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:633)
  at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: There is conflicting
information about the leader
of shard: shard1 our state says:
http://10.104.59.189:8883/solr/collection1/ but zookeeper says:
http://10.217.55.151:8883/solr/collection1/
  at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:756)

INFO: Releasing
directory:/vol/ubuntu/talemetry_match_solr/solr_server/solr/production_things_shar
d11_replica1/data/index
Mar 15, 2013 5:52:34 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher
  at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1423)
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535)

SEVERE: org.apache.solr.common.SolrException: I was asked to wait on state
recovering for 10.76.31.
67:8883_solr but I still do not see the requested state. I see state:
active live:true
  at
org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler
.java:948)




On Fri, Mar 15, 2013 at 5:05 PM, Mark Miller  wrote:

> Strange - we hardened that loop in 4.1 - so I'm not sure what happened
> here.
>
> Can you do a stack dump on the overseer and see if you see an Overseer
> thread running perhaps? Or just post the results?
>
> To recover, you should be able to just restart the Overseer node and have
> someone else take over - they should pick up processing the queue.
>
> Any logs you might be able to share could be useful too.
>
> - Mark
>
> On Mar 15, 2013, at 7:51 PM, Gary Yngve  wrote:
>
> > Also, looking at overseer_elect, everything looks fine.  node is valid
> and
> > live.
> >
> >
> > On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve 
> wrote:
> >
> >> Sorry, should have specified.  4.1
> >>
> >>
> >>
> >>
> >> On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller  >wrote:
> >>
> >>> What Solr version? 4.0, 4.1 4.2?
> >>>
> >>> - Mark
> >>>
> >>> On Mar 15, 2013, at 7:19 PM, Gary Yngve  wrote:
> >>>
> >>>> my solr cloud has been running fine for weeks, but about a week ago,
> it
> >>>> stopped dequeueing from the overseer queue, and now there are
> thousands
> >>> of
> >>>> tasks on the queue, most which look like
> >>>>
> >>>> {
> >>>> "operation":"state",
> >>>> "numShards":null,
> >>>> "shard":"shard3",
> >>>> "roles":null,
> >>>> "state":"recovering",
> >>>> "core":"production_things_shard3_2",
> >>>> "collection":"production_things",
> >>>> "node_name":"10.31.41.59:8883_solr",
> >>>> "base_url":"http://10.31.41.59:8883/solr"}
> >>>>
> >>>> i'm trying to create a new collection through collection API, and
> >>>> obviously, nothing is happening...
> >>>>
> >>>> any suggestion on how to fix this?  drop the queue in zk?
> >>>>
> >>>> how could it have gotten into this state in the first place?
> >>>>
> >>>> thanks,
> >>>> gary
> >>>
> >>>
> >>
>
>
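
One quick way to confirm the queue really is draining after such a restart is to
count the children of the overseer queue znode; a small sketch with a placeholder
connect string:

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class OverseerQueueLength {
    public static void main(String[] args) throws Exception {
        // Placeholder ensemble address.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000, null);
        // Each child of /overseer/queue is one pending state-change task.
        List<String> pending = zk.getChildren("/overseer/queue", false);
        System.out.println("overseer queue length: " + pending.size());
        zk.close();
    }
}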


Re: overseer queue clogged

2013-03-15 Thread Gary Yngve
Also, looking at overseer_elect, everything looks fine.  node is valid and
live.


On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve  wrote:

> Sorry, should have specified.  4.1
>
>
>
>
> On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller wrote:
>
>> What Solr version? 4.0, 4.1 4.2?
>>
>> - Mark
>>
>> On Mar 15, 2013, at 7:19 PM, Gary Yngve  wrote:
>>
>> > my solr cloud has been running fine for weeks, but about a week ago, it
>> > stopped dequeueing from the overseer queue, and now there are thousands
>> of
>> > tasks on the queue, most which look like
>> >
>> > {
>> >  "operation":"state",
>> >  "numShards":null,
>> >  "shard":"shard3",
>> >  "roles":null,
>> >  "state":"recovering",
>> >  "core":"production_things_shard3_2",
>> >  "collection":"production_things",
>> >  "node_name":"10.31.41.59:8883_solr",
>> >  "base_url":"http://10.31.41.59:8883/solr"}
>> >
>> > i'm trying to create a new collection through collection API, and
>> > obviously, nothing is happening...
>> >
>> > any suggestion on how to fix this?  drop the queue in zk?
>> >
>> > how could it have gotten into this state in the first place?
>> >
>> > thanks,
>> > gary
>>
>>
>


Re: overseer queue clogged

2013-03-15 Thread Gary Yngve
Sorry, should have specified.  4.1




On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller  wrote:

> What Solr version? 4.0, 4.1 4.2?
>
> - Mark
>
> On Mar 15, 2013, at 7:19 PM, Gary Yngve  wrote:
>
> > my solr cloud has been running fine for weeks, but about a week ago, it
> > stopped dequeueing from the overseer queue, and now there are thousands
> of
> > tasks on the queue, most which look like
> >
> > {
> >  "operation":"state",
> >  "numShards":null,
> >  "shard":"shard3",
> >  "roles":null,
> >  "state":"recovering",
> >  "core":"production_things_shard3_2",
> >  "collection":"production_things",
> >  "node_name":"10.31.41.59:8883_solr",
> >  "base_url":"http://10.31.41.59:8883/solr"}
> >
> > i'm trying to create a new collection through collection API, and
> > obviously, nothing is happening...
> >
> > any suggestion on how to fix this?  drop the queue in zk?
> >
> > how could it have gotten into this state in the first place?
> >
> > thanks,
> > gary
>
>


Re: How to use shardId

2013-02-20 Thread Gary Yngve
the param in solr.xml should be shard, not shardId.  i tripped over this
too.

-g



On Mon, Jan 14, 2013 at 7:01 AM, starbuck wrote:

> Hi all,
>
> I am trying to realize a solr cloud cluster with 2 collections and 4 shards
> each with 2 replicates hosted by 4 solr instances. If shardNum parm is set
> to 4 and all solr instances are started after each other it seems to work
> fine.
>
> What I wanted to do now is removing shardNum from JAVA_OPTS and defining
> each core with a "shardId". Here is my current solr.xml of the first and
> second (in the second there is another instanceDir, the rest is the same)
> solr instance:
>
>
>
> Here is solr.xml of the third and fourth solr instance:
>
>
>
> But it seems that solr doesn't accept the shardId or omits it. What I
> really
> get is 2 collections each with 2 shards and 8 replicates (each solr
> instance
> 2)
> Either the functionality is not really clear to me or there has to be a
> config failure.
>
> It would very helpful if anyone could give me a hint.
>
> Thanks.
> starbuck
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-use-shardId-tp4033186.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


solr4.1 createNodeSet requires ip addresses?

2013-02-15 Thread Gary Yngve
Hi all,

I've been unable to get the collections create API to work with
createNodeSet containing hostnames, both localhost and external hostnames.
 I've only been able to get it working when using explicit IP addresses.

It looks like zk stores the IP addresses in the clusterstate.json and
live_nodes.  Is it possible that Solr Cloud is not doing any hostname
resolving but just looking for an explicit match with createNodeSet?  This
is kind of annoying, in that I am working with EC2 instances and consider
it pretty lame to need to use elastic IPs for internal use.  I'm hacking
around it now (looking up the eth0 inet addr on each machine), but I'm not
happy about it.

Has anyone else found a better solution?

The reason I want to specify explicit nodes for collections is so I can
have just one zk ensemble managing collections across different
environments that will go up and down independently of each other.

Thanks,
Gary
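
For context, a bare-bones sketch of a create call with createNodeSet spelled out
in the node-name form that appears under live_nodes; all hosts, ports and numbers
here are made up:

import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class CreateCollection {
    public static void main(String[] args) throws Exception {
        // Node names as they appear under live_nodes; everything here is made up.
        String nodeSet = URLEncoder.encode("10.0.0.11:8983_solr,10.0.0.12:8983_solr", "UTF-8");
        URL url = new URL("http://10.0.0.11:8983/solr/admin/collections"
                + "?action=CREATE&name=staging_things&numShards=2"
                + "&replicationFactor=1&createNodeSet=" + nodeSet);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}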


Re: incorrect solr update behavior

2013-01-14 Thread Gary Yngve
Of course, as soon as I post this, I discover this:

https://issues.apache.org/jira/browse/SOLR-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537900#comment-13538174

i'll give this patch a spin in the morning.

(this is not an example of how to use antecedents :))

-g


On Mon, Jan 14, 2013 at 6:27 PM, Gary Yngve  wrote:

> Posting this
>
>  update="set">blah update="add">qux update="add">quuxfoo
>
> to an existing doc with foo and bar tags
> results in tags_ss containing
> 
> {add=qux}
> {add=quux}
> 
>
> whereas posting this
>
>  update="set">blah update="add">quxfoo
>
> results in the expected behavior:
> 
> foo
> bar
> qux
> 
>
> Any ideas?
>
> Thanks,
> Gary
>
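
For comparison, the same style of atomic update expressed through SolrJ, where the
modifier is the key of a map value; the core URL and id are placeholders, and
tags_ss is the field from the thread:

import java.util.Collections;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicAdd {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL and id; tags_ss is the multivalued field from the thread.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        // The map key is the modifier: "set", "add", "inc", ...
        doc.addField("tags_ss", Collections.singletonMap("add", "qux"));
        server.add(doc);
        server.commit();
        server.shutdown();
    }
}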


RE: dih groovy script question

2012-09-21 Thread Moore, Gary
Looks like some sort of foul-up with Groovy versions and Solr 3.6.1 as  I had 
to roll back to Groovy 1.7.10 to get this to work.   Started with Groovy 2 and 
then 1.8 before 1.7.10.   What's odd is that I implemented the same calls made 
in ScriptTransformer.java in a test program and they worked fine with all 
Groovy versions.  Can't imagine what the root cause might be -- Groovy 
implements jsr223 differently in later versions?  I suppose to find out I could 
compile Solr with my jdk but  time to march on. ;)
Gary

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, September 15, 2012 9:01 AM
To: solr-user@lucene.apache.org
Subject: Re: dih groovy script question

Stab in the dark... This looks like you're somehow getting the wrong Groovy 
jars. Can you print out the Groovy version as a test? Perhaps you have one 
groovy version in your command-line and copied a different version into the 
libraries Solr knows about?

Because this looks like a pure Groovy error

Best
Erick

On Thu, Sep 13, 2012 at 9:03 PM, Moore, Gary  wrote:
> I'm a bit stumped as to why I can't get a groovy script to run from the DIH.  
>  I'm sure it's something braindead I'm missing.   The script looks like this 
> in data-config.xml:
>
> <![CDATA[
> import java.security.MessageDigest
> import java.util.HashMap
def createHashId(HashMap<String,Object> row, 
> org.apache.solr.handler.dataimport.ContextImpl context )  {
>   // do groovy stuff
> return row } ]]> 
>
> When I run the import, I get the following error:
>
>
> Caused by: java.lang.NoSuchMethodException: No signature of method: 
> org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.createHashId() is 
> applicable for argument types: (java.util.HashMap, 
> org.apache.solr.handler.dataimport.ContextImpl) values: [[Format:Reports, 
> Credits:, EnteredBy:Corey Holland, ...], ...]
> at 
> org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeImpl(GroovyScriptEngineImpl.java:364)
> at 
> org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeFunction(GroovyScriptEngineImpl.java:160)
> ... 13 more
>
> The script runs fine from the shell so I don't believe there are any groovy 
> errors.  Thanks in advance for any tips.
> Gary
>
>
>
>
> This electronic message contains information generated by the USDA solely for 
> the intended recipients. Any unauthorized interception of this message or the 
> use or disclosure of the information it contains may violate the law and 
> subject the violator to civil or criminal penalties. If you believe you have 
> received this message in error, please notify the sender and delete the email 
> immediately.




dih groovy script question

2012-09-13 Thread Moore, Gary
I'm a bit stumped as to why I can't get a groovy script to run from the DIH.   
I'm sure it's something braindead I'm missing.   The script looks like this in 
data-config.xml:

<![CDATA[
import java.security.MessageDigest
import java.util.HashMap
def createHashId(HashMap<String,Object> row, 
org.apache.solr.handler.dataimport.ContextImpl context )  {
  // do groovy stuff
return row
}
]]>


When I run the import, I get the following error:


Caused by: java.lang.NoSuchMethodException: No signature of method: 
org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.createHashId() is applicable 
for argument types: (java.util.HashMap, 
org.apache.solr.handler.dataimport.ContextImpl) values: [[Format:Reports, 
Credits:, EnteredBy:Corey Holland, ...], ...]
at 
org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeImpl(GroovyScriptEngineImpl.java:364)
at 
org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeFunction(GroovyScriptEngineImpl.java:160)
... 13 more

The script runs fine from the shell so I don't believe there are any groovy 
errors.  Thanks in advance for any tips.
Gary




This electronic message contains information generated by the USDA solely for 
the intended recipients. Any unauthorized interception of this message or the 
use or disclosure of the information it contains may violate the law and 
subject the violator to civil or criminal penalties. If you believe you have 
received this message in error, please notify the sender and delete the email 
immediately.


Payloads slowing down add/delete doc

2012-03-02 Thread Gary Yang
Hi, there

In order to keep a DocID vs UID map, we added payload to a solr core. The 
search on UID is very fast but we get a problem with adding/deleting docs.  
Every time we commit an adding/deleting action, solr/lucene will take up to 30 
seconds to complete.  Without payload, the same action can be done in 
milliseconds.

We do need real time commit.

Here is the payload definition:





  


   


Any suggestions?

Any help is appreciated.

Best Regards

G. Y.


DIH doesn't handle bound namespaces?

2011-10-31 Thread Moore, Gary
I'm trying to import some MODS XML using DIH.  The XML uses bound namespacing:

http://www.w3.org/2001/XMLSchema-instance";
  xmlns:mods="http://www.loc.gov/mods/v3";
  xmlns:xlink="http://www.w3.org/1999/xlink";
  xmlns="http://www.loc.gov/mods/v3";
  xsi:schemaLocation="http://www.loc.gov/mods/v3 
http://www.loc.gov/mods/v3/mods-3-4.xsd";
  version="3.4">
   
  Malus domestica: Arnold
   


However, XPathEntityProcessor doesn't seem to handle xpaths of the type 
xpath="//mods:titleInfo/mods:title".

If I remove the namespaces from the source XML:

http://www.w3.org/2001/XMLSchema-instance";
  xmlns:mods="http://www.loc.gov/mods/v3";
  xmlns:xlink="http://www.w3.org/1999/xlink";
  xmlns="http://www.loc.gov/mods/v3";
  xsi:schemaLocation="http://www.loc.gov/mods/v3 
http://www.loc.gov/mods/v3/mods-3-4.xsd";
  version="3.4">
   
  Malus domestica: Arnold
   


then xpath="//titleInfo/title" works just fine.  Can anyone confirm that this 
is the case and, if so, recommend a solution?
Thanks
Gary


Gary Moore
Technical Lead
LCA Digital Commons Project
NAL/ARS/USDA



Re: query for point in time

2011-09-15 Thread gary tam
Thanks for the reply.  We had the search within the database initially, but
it proved to be too slow.  With Solr we have much better performance.

One more question: how could I find the most current job for each employee?

My data looks like


John Smith   department A   web site bug fix    2010-01-01   2010-01-03
                            unit testing        2010-01-04   2010-01-06
                            QA support          2010-01-07   2010-01-12
                            implementation      2010-01-13   2010-01-22

Jane Doe     department A   QA support          2010-01-01   2010-05-01
                            implementation      2010-05-02   2010-09-28

Joe Doe      department A   PHP development     2011-01-01   2011-08-31
                            Java Development    2011-09-01   2011-09-15

I would like to return this as my search result

John Smith   department A   implementation      2010-01-13   2010-01-22
Jane Doe     department A   implementation      2010-05-02   2010-09-28
Joe Doe      department A   Java Development    2011-09-01   2011-09-15


Thanks in advance
Gary



On Thu, Sep 15, 2011 at 3:33 PM, Jonathan Rochkind  wrote:

> You didn't tell us what your schema looks like, what fields with what types
> are involved.
>
> But similar to how you'd do it in your database, you need to find
> 'documents' that have a start date before your date in question, and an end
> date after your date in question, to find the ones whose range includes your
> date in question.
>
> Something like this:
>
> q=start_date:[* TO '2010-01-05'] AND end_date:['2010-01-05' TO *]
>
> Of course, you need to add on your restriction to just documents about
> 'John Smith', through another AND clause or an 'fq'.
>
> But in general, if you've got a db with this info already, and this is all
> you need, why not just use the db?  Multi-hieararchy data like this is going
> to give you trouble in Solr eventually, you've got to arrange the solr
> indexes/schema to answer your questions, and eventually you're going to have
> two questions which require mutually incompatible schema to answer.
>
> An rdbms is a great general purpose question answering tool for structured
> data.  lucene/Solr is a great indexing tool for text matching.
>
>
> On 9/15/2011 2:55 PM, gary tam wrote:
>
>> Hi
>>
>> I have a scenario that I am not sure how to write the query for.
>>
>> Here is the scenario - have an employee record with multi value for
>> project,
>> started date, end date.
>>
>> looks something like
>>
>>
>> John Smith   web site bug fix    2010-01-01   2010-01-03
>>              unit testing        2010-01-04   2010-01-06
>>              QA support          2010-01-07   2010-01-12
>>              implementation      2010-01-13   2010-01-22
>>
>> I want to find what project John Smith was working on 2010-01-05
>>
>> Is this possible or I have to back to my database ?
>>
>>
>> Thanks
>>
>>
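
A sketch of Jonathan's range query in SolrJ form; the core name and the field
names (employee, start_date, end_date) are assumptions about the schema, and the
dates use Solr's full ISO format:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PointInTimeQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder core; field names are assumed.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/projects");
        // Projects that started on or before the date AND ended on or after it.
        SolrQuery q = new SolrQuery(
                "start_date:[* TO 2010-01-05T23:59:59Z] AND end_date:[2010-01-05T00:00:00Z TO *]");
        q.addFilterQuery("employee:\"John Smith\"");
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults());
        server.shutdown();
    }
}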


query for point in time

2011-09-15 Thread gary tam
Hi

I have a scenario that I am not sure how to write the query for.

Here is the scenario - have an employee record with multi value for project,
started date, end date.

looks something like


John Smith   web site bug fix    2010-01-01   2010-01-03
             unit testing        2010-01-04   2010-01-06
             QA support          2010-01-07   2010-01-12
             implementation      2010-01-13   2010-01-22

I want to find what project John Smith was working on 2010-01-05

Is this possible or I have to back to my database ?


Thanks


RE: how to run solr in apache server?

2011-09-07 Thread Moore, Gary
Solr only runs in a container.  To make it appear as if Solr is "running" on 
httpd,  Google 'httpd tomcat' for instructions on how to front tomcat with 
httpd mod_jk or mod_proxy.  Our system admins prefer mod_proxy.  Not sure why 
you'd need to front Solr with httpd since it's usually an application backend, 
e.g. a PHP application running on port 80 connects to Solr on port 8983.
Gary

-Original Message-
From: nagarjuna [mailto:nagarjuna.avul...@gmail.com] 
Sent: Wednesday, September 07, 2011 7:41 AM
To: solr-user@lucene.apache.org
Subject: how to run solr in apache server?

Hi everybody...
 can anybody tell me how to run solr on Apache server(not apache
tomcat)


Thnax in advance

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-run-solr-in-apache-server-tp3316377p3316377.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: commas in synonyms.txt are not escaping

2011-08-29 Thread Moore, Gary
Hah, I knew it was something simple. :)  Thanks.
Gary

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, August 28, 2011 12:50 PM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Turns out this isn't a bug - I was just tripped up by the analysis
changes to the example server.

Gary, you are probably just hitting the same thing.
The "text" fieldType is no longer used by any fields by default - for
example the "text" field uses the "text_general" fieldType.
This fieldType uses the standard tokenizer, which discards stuff like
commas (hence the synonym will never match).

-Yonik
http://www.lucidimagination.com


RE: commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary
Alexi,
Yes but no difference.  This is apparently an issue introduced in 3.*.  Thanks 
for your help.
-Gary

-Original Message-
From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] 
Sent: Friday, August 26, 2011 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Gary, isn't your wordDelimiter removing your commas in the query time? have
u tried it in the analyzer?

2011/8/26 Moore, Gary 

> Here you go -- I'm just hacking the text field at the moment.  Thanks,
> Gary
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
>             tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
>             expand="true"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="true"
>             />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>
> -Original Message-
> From: Alexei Martchenko [mailto:ale...@superdownloads.com.br]
> Sent: Friday, August 26, 2011 10:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: commas in synonyms.txt are not escaping
>
> Gary, please post the entire field declaration so I can try to reproduce
> here
>
>
>


-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


RE: commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary
Thanks, Yonik.
Gary

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Friday, August 26, 2011 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

On Fri, Aug 26, 2011 at 11:16 AM, Yonik Seeley
 wrote:
> On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary  wrote:
>>
>> I have a number of chemical names containing commas which I'm mapping in 
>> index_synonyms.txt thusly:
>>
>> 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 
>> 3,CCRIS 8562
>>
>> According to the sample synonyms.txt, the comma above should be escaped, i.e. 
>> a\,a=>b\,b.    The problem is that according to analysis.jsp the commas are 
>> not being escaped.  If I paste in 2,4-D-butotyl, then no mappings.  If I 
>> paste in 2\,4-D-butotyl, the mappings are done.
>
>
> I can confirm that this works in 1.4, but no longer works in 3x or
> trunk.  Can you open an issue?

Actually, I think I've tracked it to LUCENE-3233 where the parsing
rules were moved from Solr to Lucene (and changed the functionality in
the process).
I'll reopen that since I don't think it's been in a released version yet.

-Yonik
http://www.lucidimagination.com


RE: commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary
Here you go -- I'm just hacking the text field at the moment.  Thanks,
Gary

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
            expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

-Original Message-
From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] 
Sent: Friday, August 26, 2011 10:30 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Gary, please post the entire field declaration so I can try to reproduce
here




commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary

I have a number of chemical names containing commas which I'm mapping in 
index_synonyms.txt thusly:

2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 
3,CCRIS 8562

According to the sample synonyms.txt, the comma above should be escaped, i.e. 
a\,a=>b\,b.    The problem is that according to analysis.jsp the commas are not 
being escaped.  If I paste in 2,4-D-butotyl, then no mappings.  If I paste in 
2\,4-D-butotyl, the mappings are done.  This is verified by there being no 
mappings in the index.  I assume there would be if 2\,4-D-butotyl actually 
appeared in a document.

The filter I'm declaring in the index analyzer looks like this:

<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
        tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true" expand="true"/>

Doesn't seem to matter which tokenizer I use.  This must be something simple 
that I'm not doing but am a bit stumped at the moment and would appreciate any 
tips.
Thanks
Gary




Re: tika integration exception and other related queries

2011-06-09 Thread Gary Taylor

Naveen,

Not sure our requirement matches yours, but one of the things we index 
is a "comment" item that can have one or more files attached to it.  To 
index the whole thing as a single Solr document we create a zipfile 
containing a file with the comment details in it and any additional 
attached files.  This is submitted to Solr as a TEXT field in an XML 
doc, along with other meta-data fields from the comment.  In our schema 
the TEXT field is indexed but not stored, so when we search and get a 
match back it doesn't contain all of the contents from the attached 
files etc., only the stored fields in our schema.   Admittedly, the user 
can therefore get back a "comment" match with no indication as to WHERE 
the match occurred (ie. was it in the meta-data or the contents of the 
attached files), but at the moment we're only interested in getting 
appropriate matches, not explaining where the match is.
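
Our own flow goes via an XML update doc, but for reference, posting such a zip
straight to the extract handler would look something like the sketch below --
the file names and literal field are placeholders, not our real ones:

  # bundle the comment body plus its attachments, then post the zip to Solr Cell
  zip comment-123.zip comment-123.txt report.pdf notes.doc
  curl "http://localhost:8983/solr/update/extract?literal.id=comment-123&fmap.content=text&commit=true" \
       -F "file=@comment-123.zip"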


Hope that helps.

Kind regards,
Gary.



On 09/06/2011 03:00, Naveen Gupta wrote:

Hi Gary

It started working .. though I did not test for zip files, but for rar
files it is working fine ..

The only thing I wanted to do is to index the metadata (text mapped to
content) and not store the data.  Also, in the search results I want to filter
the content ... and it started working fine.  I don't want to show the extracted
content to the end user, since the way it extracts the information is not
very helpful to the user .. although we can apply a few of the analyzers and
filters to remove the unnecessary tags, the information would still not be
of much help .. looking for your opinion ... what did you do in order to filter
out the content, or are you showing the extracted content to the end user?

Even in the case where we are showing the text part to the end user, how can I
limit the number of characters while querying the search results ... is there
any feature where we can achieve this ... the concept of a snippet kind of
thing ...

Thanks
Naveen

On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor  wrote:


Naveen,

For indexing Zip files with Tika, take a look at the following thread :


http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html

I got it to work with the 3.1 source and a couple of patches.

Hope this helps.

Regards,
Gary.



On 08/06/2011 04:12, Naveen Gupta wrote:


Hi Can somebody answer this ...

3. can somebody tell me an idea how to do indexing for a zip file ?

1. while sending docx, we are getting following error.





Re: tika integration exception and other related queries

2011-06-08 Thread Gary Taylor

Naveen,

For indexing Zip files with Tika, take a look at the following thread :

http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html

I got it to work with the 3.1 source and a couple of patches.

Hope this helps.

Regards,
Gary.


On 08/06/2011 04:12, Naveen Gupta wrote:

Hi Can somebody answer this ...

3. can somebody tell me an idea how to do indexing for a zip file ?

1. while sending docx, we are getting following error.




Re: Extracting contents of zipped files with Tika and Solr 1.4.1 (now Solr 3.1)

2011-05-23 Thread Gary Taylor

Jayendra,

I cleared out my local repository, and replayed all of my steps from 
Friday and it now it works.  The only difference (or the only one that's 
obvious to me) was that I applied the patch before doing a full 
compile/test/dist.  But I assumed that given I was seeing my new log 
entries (from ExtractingDocumentLoader.java) I was running the correct 
code anyway.


However, I'm very pleased that it's working now - I get the full 
contents of the zipped files indexed and not just the file names.


Thank you again for your assistance, and the patch!

Kind regards,
Gary.


On 21/05/2011 03:12, Jayendra Patil wrote:

Hi Gary,

I tried the patch on the the 3.1 source code (@
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/)
as well and it worked fine.
@Patch - https://issues.apache.org/jira/browse/SOLR-2416, which deals
with the Solr Cell module.

You may want to verify the contents from the results by enabling the
stored attribute on the text field.

e.g. URL curl 
"http://localhost:8983/solr/update/extract?stream.file=C:/Test.zip&literal.id=777045&literal.title=Test&commit=true";

Let me know if it works. I would be happy to share the generated
artifact you can test on.

Regards,
Jayendra
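
(For reference, "enabling the stored attribute" just means the schema field the
extracted content is mapped to is declared with stored="true", along these
lines -- the type name here is only an assumption about that schema:

  <field name="text" type="text" indexed="true" stored="true" multiValued="true"/>

so the extracted text comes back in query results and can be eyeballed.)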




Re: Extracting contents of zipped files with Tika and Solr 1.4.1 (now Solr 3.1)

2011-05-20 Thread Gary Taylor
Hello again.  Unfortunately, I'm still getting nowhere with this.  I 
have checked-out the 3.1 source and applied Jayendra's patches (see 
below) and it still appears that the contents of the files in the 
zipfile are not being indexed, only the filenames of those contained files.


I'm using a simple CURL invocation to test this:

curl 
"http://localhost:8983/solr/core0/update/extract?literal.docid=74&fmap.content=text&literal.type=5"; 
-F "commit=true" -F "file=@solr1.zip"


solr1.zip contains two simple txt files (doc1.txt and doc2.txt).  I'm 
expecting the contents of those txt files to be extracted from the zip 
and indexed, but this isn't happening - or at least, I don't get the 
desired result when I do a query afterwards.  I do get a match if I 
search for either "doc1.txt" or "doc2.txt", but not if I search for a 
word that appears in their contents.


If I index one of the txt files (instead of the zipfile), I can query 
the content OK, so I'm assuming my query is sensible and matches the 
field specified on the CURL string (ie. "text").  I'm also happy that 
the Solr Cell content extraction is working because I can successfully 
index PDF, Word, etc. files.


In a fit of desperation I have added log.info statements into the files 
referenced by Jayendra's patches (SOLR-2416 and SOLR-2332) and I see 
those in the log when I submit the zipfile with CURL, so I know I'm 
running those patched files in the build.


If anyone can shed any light on what's happening here, I'd be very grateful.

Thanks and kind regards,
Gary.


On 11/04/2011 11:12, Gary Taylor wrote:

Jayendra,

Thanks for the info - been keeping an eye on this list in case this 
topic cropped up again.  It's currently a background task for me, so 
I'll try and take a look at the patches and re-test soon.


Joey - glad you brought this issue up again.  I haven't progressed any 
further with it.  I've not yet moved to Solr 3.1 but it's on my to-do 
list, as is testing out the patches referenced by Jayendra.  I'll post 
my findings on this thread - if you manage to test the patches before 
me, let me know how you get on.


Thanks and kind regards,
Gary.


On 11/04/2011 05:02, Jayendra Patil wrote:

The migration of Tika to the latest 0.8 version seems to have
reintroduced the issue.

I was able to get this working again with the following patches. (Solr
Cell and Data Import handler)

https://issues.apache.org/jira/browse/SOLR-2416
https://issues.apache.org/jira/browse/SOLR-2332

You can try these.

Regards,
Jayendra

On Sun, Apr 10, 2011 at 10:35 PM, Joey 
Hanzel  wrote:

Hi Gary,

I have been experiencing the same problem... Unable to extract 
content from
archive file formats.  I just tried again with a clean install of 
Solr 3.1.0
(using Tika 0.8) and continue to experience the same results.  Did 
you have

any success with this problem with Solr 1.4.1 or 3.1.0 ?

I'm using this curl command to send data to Solr.
curl "
http://localhost:8080/solr/update/extract?literal.id=doc1&fmap.content=attr_content&commit=true"; 


-H "application/octet-stream" -F  "myfile=@data.zip"

No problem extracting single rich text documents, but archive files 
only

result in the file names within the archive being indexed. Am I missing
something else in my configuration? Solr doesn't seem to be 
unpacking the
archive files. Based on the email chain associated with your first 
message,
some people have been able to get this functionality to work as 
desired.










--
Gary Taylor
INOVEM

Tel +44 (0)1488 648 480
Fax +44 (0)7092 115 933
gary.tay...@inovem.com
www.inovem.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE



Seattle Solr/Lucene User Group?

2011-04-13 Thread Gary Yngve
Hi all,

Does anyone know if there is a Solr/Lucene user group /
birds-of-feather that meets in Seattle?

If not, I'd like to start one up.  I'd love to learn and share tricks
pertaining to NRT, performance, distributed solr, etc.

Also, I am planning on attending the Lucene Revolution!

Let's connect!

-Gary

http://www.linkedin.com/in/garyyngve


Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-04-11 Thread Gary Taylor

Jayendra,

Thanks for the info - been keeping an eye on this list in case this 
topic cropped up again.  It's currently a background task for me, so 
I'll try and take a look at the patches and re-test soon.


Joey - glad you brought this issue up again.  I haven't progressed any 
further with it.  I've not yet moved to Solr 3.1 but it's on my to-do 
list, as is testing out the patches referenced by Jayendra.  I'll post 
my findings on this thread - if you manage to test the patches before 
me, let me know how you get on.


Thanks and kind regards,
Gary.


On 11/04/2011 05:02, Jayendra Patil wrote:

The migration of Tika to the latest 0.8 version seems to have
reintroduced the issue.

I was able to get this working again with the following patches. (Solr
Cell and Data Import handler)

https://issues.apache.org/jira/browse/SOLR-2416
https://issues.apache.org/jira/browse/SOLR-2332

You can try these.

Regards,
Jayendra

On Sun, Apr 10, 2011 at 10:35 PM, Joey Hanzel  wrote:

Hi Gary,

I have been experiencing the same problem... Unable to extract content from
archive file formats.  I just tried again with a clean install of Solr 3.1.0
(using Tika 0.8) and continue to experience the same results.  Did you have
any success with this problem with Solr 1.4.1 or 3.1.0 ?

I'm using this curl command to send data to Solr.
curl "
http://localhost:8080/solr/update/extract?literal.id=doc1&fmap.content=attr_content&commit=true";
-H "application/octet-stream" -F  "myfile=@data.zip"

No problem extracting single rich text documents, but archive files only
result in the file names within the archive being indexed. Am I missing
something else in my configuration? Solr doesn't seem to be unpacking the
archive files. Based on the email chain associated with your first message,
some people have been able to get this functionality to work as desired.






--
Gary Taylor
INOVEM

Tel +44 (0)1488 648 480
Fax +44 (0)7092 115 933
gary.tay...@inovem.com
www.inovem.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE



Re: adding a document using curl

2011-03-03 Thread Gary Taylor

As an example, I run this in the same directory as the msword1.doc file:

curl 
"http://localhost:8983/solr/core0/update/extract?literal.docid=74&literal.type=5"; 
-F "file=@msword1.doc"


The "type" literal is just part of my schema.

Gary.


On 03/03/2011 11:45, Ken Foskey wrote:

On Thu, 2011-03-03 at 12:36 +0100, Markus Jelsma wrote:

Here's a complete example
http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_parameters_as_part_of_the_URL

I should have been clearer.   A rich text document,  XML I can make work
and a script is in the example docs folder

http://wiki.apache.org/solr/ExtractingRequestHandler

I also read the solr 1.4 book and tried samples in there,   could not
make them work.

Ta






Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-31 Thread Gary Taylor
Can anyone shed any light on this, and whether it could be a config 
issue?  I'm now using the latest SVN trunk, which includes the Tika 0.8 
jars.


When I send a ZIP file (containing two txt files, doc1.txt and doc2.txt) 
to the ExtractingRequestHandler, I get the following log entry 
(formatted for ease of reading) :


SolrInputDocument[
{
ignored_meta=ignored_meta(1.0)={
[stream_source_info, file, stream_content_type, 
application/octet-stream, stream_size, 260, stream_name, solr1.zip, 
Content-Type, application/zip]

},
ignored_=ignored_(1.0)={
[package-entry, package-entry]
},
ignored_stream_source_info=ignored_stream_source_info(1.0)={file},

ignored_stream_content_type=ignored_stream_content_type(1.0)={application/octet-stream}, 


ignored_stream_size=ignored_stream_size(1.0)={260},
ignored_stream_name=ignored_stream_name(1.0)={solr1.zip},
ignored_content_type=ignored_content_type(1.0)={application/zip},
docid=docid(1.0)={74},
type=type(1.0)={5},
text=text(1.0)={  doc2.txtdoc1.txt}
}
]

So, the data coming back from Tika when parsing a ZIP file does not 
include the file contents, only the names of the files contained 
therein.  I've tried forcing stream.type=application/zip in the CURL 
string, but that makes no difference.  If I specify an invalid 
stream.type then I get an exception response, so I know it's being used.


When I send one of those txt files individually to the 
ExtractingRequestHandler, I get:


SolrInputDocument[
{
ignored_meta=ignored_meta(1.0)={
[stream_source_info, file, stream_content_type, text/plain, 
stream_size, 30, Content-Encoding, ISO-8859-1, stream_name, doc1.txt]

},
ignored_stream_source_info=ignored_stream_source_info(1.0)={file},

ignored_stream_content_type=ignored_stream_content_type(1.0)={text/plain},

ignored_stream_size=ignored_stream_size(1.0)={30},
ignored_content_encoding=ignored_content_encoding(1.0)={ISO-8859-1},
ignored_stream_name=ignored_stream_name(1.0)={doc1.txt},
docid=docid(1.0)={74},
type=type(1.0)={5},
text=text(1.0)={The quick brown fox  }
}
]

and we see the file contents in the "text" field.

I'm using the following requestHandler definition in solrconfig.xml:

<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>


Is there any further debug or diagnostic I can get out of Tika to help 
me work out why it's only returning the file names and not the file 
contents when parsing a ZIP file?


Thanks and kind regards,
Gary.



On 25/01/2011 16:48, Jayendra Patil wrote:

Hi Gary,

The latest Solr Trunk was able to extract and index the contents of the zip
file using the ExtractingRequestHandler.
The snapshot of Trunk we worked upon had the Tika 0.8 snapshot jars and
worked pretty well.

Tested again with sample url and works fine -
curl "
http://localhost:8080/solr/core0/update/extract?stream.file=C:/temp/extract/777045.zip&literal.id=777045&literal.title=Test&commit=true
"

You would probably need to drill down to the Tika Jars and
the apache-solr-cell-4.0-dev.jar used for Rich documents indexing.

Regards,
Jayendra





Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor

OK, got past the schema.xml problem, but now I'm back to square one.

I can index the contents of binary files (Word, PDF etc...), as well as 
text files, but it won't index the content of files inside a zip.


As an example, I have two txt files - doc1.txt and doc2.txt.  If I index 
either of them individually using:


curl 
"http://localhost:8983/solr/core0/update/extract?literal.docid=74&fmap.content=text&literal.type=5"; 
-F "file=@doc1.txt"


and commit, Solr will index the contents and searches will match.

If I zip those two files up into solr1.zip, and index that using:

curl 
"http://localhost:8983/solr/core0/update/extract?literal.docid=74&fmap.content=text&literal.type=5"; 
-F "file=@solr1.zip"


and commit, the file names are indexed, but not their contents.

I have checked that Tika can correctly process the zip file when used 
standalone with the tika-app jar - it outputs both the filenames and 
contents.  Should I be able to index the contents of files stored in a 
zip by using extract ?


Thanks and kind regards,
Gary.


On 25/01/2011 15:32, Gary Taylor wrote:

Thanks Erlend.

Not used SVN before, but have managed to download and build latest 
trunk code.


Now I'm getting an error when trying to access the admin page (via 
Jetty) because I specify HTMLStripStandardTokenizerFactory in my 
schema.xml, but this appears to be no-longer supplied as part of the 
build so I get an exception cos it can't find that class.  I've 
checked the CHANGES.txt and found the following in the change list to 
1.4.0 (!?) :


66. SOLR-1343: Added HTMLStripCharFilter and marked HTMLStripReader, 
HTMLStripWhitespaceTokenizerFactory and
HTMLStripStandardTokenizerFactory deprecated. To strip HTML tags, 
HTMLStripCharFilter can be used with an arbitrary Tokenizer. (koji)


Unfortunately, I can't seem to get that to work correctly.  Does 
anyone have an example fieldType stanza (for schema.xml) for stripping 
out HTML ?


Thanks and kind regards,
Gary.



On 25/01/2011 14:17, Erlend Garåsen wrote:

On 25.01.11 11.30, Erlend Garåsen wrote:


Tika version 0.8 is not included in the latest release/trunk from SVN.


Ouch, I wrote "not" instead of "now". Sorry, I replied in a hurry.

And to clarify, by "content" I mean the main content of a Word file. 
Title and other kinds of metadata are successfully extracted by the 
old 0.4 version of Tika, but you need a newer Tika version (0.8) in 
order to fetch the main content as well. So try the newest Solr 
version from trunk.


Erlend








Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor

Thanks Erlend.

Not used SVN before, but have managed to download and build latest trunk 
code.


Now I'm getting an error when trying to access the admin page (via 
Jetty) because I specify HTMLStripStandardTokenizerFactory in my 
schema.xml, but this appears to be no-longer supplied as part of the 
build so I get an exception cos it can't find that class.  I've checked 
the CHANGES.txt and found the following in the change list to 1.4.0 (!?) :


66. SOLR-1343: Added HTMLStripCharFilter and marked HTMLStripReader, 
HTMLStripWhitespaceTokenizerFactory and
HTMLStripStandardTokenizerFactory deprecated. To strip HTML tags, 
HTMLStripCharFilter can be used with an arbitrary Tokenizer. (koji)


Unfortunately, I can't seem to get that to work correctly.  Does anyone 
have an example fieldType stanza (for schema.xml) for stripping out HTML ?
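
The kind of stanza the CHANGES note points at would be something along these
lines (a sketch only, not verified here, and the type name is arbitrary):

  <fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- strip HTML/XML markup before the tokenizer sees the text -->
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>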


Thanks and kind regards,
Gary.



On 25/01/2011 14:17, Erlend Garåsen wrote:

On 25.01.11 11.30, Erlend Garåsen wrote:


Tika version 0.8 is not included in the latest release/trunk from SVN.


Ouch, I wrote "not" instead of "now". Sorry, I replied in a hurry.

And to clarify, by "content" I mean the main content of a Word file. 
Title and other kinds of metadata are successfully extracted by the 
old 0.4 version of Tika, but you need a newer Tika version (0.8) in 
order to fetch the main content as well. So try the newest Solr 
version from trunk.


Erlend






Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor

Hi,

I posted a question in November last year about indexing content from 
multiple binary files into a single Solr document and Jayendra responded 
with a simple solution to zip them up and send that single file to Solr.


I understand that the Tika 0.4 JARs supplied with Solr 1.4.1 don't 
currently allow this to work and only the file names of the zipped files 
are indexed (and not their contents).


I've tried downloading and building the latest Tika (0.8) and replacing 
the tika-parsers and tika-core JARs in 
\contrib\extraction\lib but this still isn't indexing the 
file contents, and now doesn't even index the file names!


Is there a version of Tika that works with the Solr 1.4.1 released 
distribution which does index the contents of the zipped files?


Thanks and kind regards,
Gary



Re: example schema in branch_3x returns SEVERE errors

2010-11-27 Thread Gary Yngve
Sorry, false alarm.  Had a bad merge and had a stray library linking to an
older version of another library.  Works now.

-Gary


On Sat, Nov 27, 2010 at 4:17 PM, Gary Yngve  wrote:

> logs> grep SEVERE solr.err.log
> SEVERE: org.apache.solr.common.SolrException: Error loading class
> 'solr.KeywordMarkerFilterFactory'
> SEVERE: org.apache.solr.common.SolrException: Error loading class
> 'solr.KeywordMarkerFilterFactory'
> SEVERE: org.apache.solr.common.SolrException: Error loading class
> 'solr.KeywordMarkerFilterFactory'
> SEVERE: org.apache.solr.common.SolrException: Error loading class
> 'solr.EnglishMinimalStemFilterFactory'
> SEVERE: org.apache.solr.common.SolrException: Error loading class
> 'solr.PointType'
> SEVERE: org.apache.solr.common.SolrException: Error loading class
> 'solr.LatLonType'
> SEVERE: org.apache.solr.common.SolrException: Error loading class
> 'solr.GeoHashField'
> SEVERE: java.lang.RuntimeException: schema fieldtype
> text(org.apache.solr.schema.TextField) invalid
> arguments:{autoGeneratePhraseQueries=true}
> SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'location'
> specified on field store
>
> It looks like it's loading the correct files...
>
> 2010-11-27 13:01:28.005:INFO::Logging to STDERR via
> org.mortbay.log.StdErrLog
> 2010-11-27 13:01:28.137:INFO::jetty-6.1.22
> 2010-11-27 13:01:28.204:INFO::Extract
> file:/Users/gyngve/git/gems/solr_control/solr_server/webapps/apache-solr-3.1-SNAPSHOT.war
> to
> /Users/gyngve/git/gems/solr_control/solr_server/work/Jetty_0_0_0_0_8983_apache.solr.3.1.SNAPSHOT.war__apache.solr.3.1.SNAPSHOT__4jaonl/webapp
>
> And on inspection on the war and the solr-core jar inside, I can see the
> missing classes, so I am pretty confused.
>
> Has anyone else seen this before or have an idea on how to surmount it?
>
> I'm not quite ready to file a Jira issue on it yet, as I'm hoping it's user
> error.
>
> Thanks,
> Gary
>


example schema in branch_3x returns SEVERE errors

2010-11-27 Thread Gary Yngve
logs> grep SEVERE solr.err.log
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.KeywordMarkerFilterFactory'
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.KeywordMarkerFilterFactory'
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.KeywordMarkerFilterFactory'
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.EnglishMinimalStemFilterFactory'
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.PointType'
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.LatLonType'
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.GeoHashField'
SEVERE: java.lang.RuntimeException: schema fieldtype
text(org.apache.solr.schema.TextField) invalid
arguments:{autoGeneratePhraseQueries=true}
SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'location'
specified on field store

It looks like it's loading the correct files...

2010-11-27 13:01:28.005:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2010-11-27 13:01:28.137:INFO::jetty-6.1.22
2010-11-27 13:01:28.204:INFO::Extract
file:/Users/gyngve/git/gems/solr_control/solr_server/webapps/apache-solr-3.1-SNAPSHOT.war
to
/Users/gyngve/git/gems/solr_control/solr_server/work/Jetty_0_0_0_0_8983_apache.solr.3.1.SNAPSHOT.war__apache.solr.3.1.SNAPSHOT__4jaonl/webapp

And on inspection on the war and the solr-core jar inside, I can see the
missing classes, so I am pretty confused.

Has anyone else seen this before or have an idea on how to surmount it?

I'm not quite ready to file a Jira issue on it yet, as I'm hoping it's user
error.

Thanks,
Gary


Re: Extracting and indexing content from multiple binary files into a single Solr document

2010-11-17 Thread Gary Taylor
Jayendra,

Brilliant! A very simple solution. Thank you for your help.

Kind regards,
Gary


On 17 Nov 2010 22:09, Jayendra Patil <jayendra.patil@gmail.com> wrote:

The way we implemented the same scenario is zipping all the attachments into
a single zip file which can be passed to the ExtractingRequestHandler for
indexing and included as a part of a single Solr document.

Regards,
Jayendra

On Wed, Nov 17, 2010 at 6:27 AM, Gary Taylor <g...@inovem.com> wrote:

> Hi,
>
> We're trying to use Solr to replace a custom Lucene server.  One
> requirement we have is to be able to index the content of multiple binary
> files into a single Solr document.  For example, a uniquely named object in
> our app can have multiple attached-files (eg. Word, PDF etc.), and we want
> to index (but not store) the contents of those files in the single Solr doc
> for that named object.
>
> At the moment, we're issuing HTTP requests direct from ColdFusion and using
> the /update/extract servlet, but can only specify a single file on each
> request.
>
> Is the best way to achieve this to extend ExtractingRequestHandler to allow
> multiple binary files and thus specify our own RequestHandler, or would
> using the SolrJ interface directly be a better bet, or am I missing
> something fundamental?
>
> Thanks and regards,
> Gary.



Extracting and indexing content from multiple binary files into a single Solr document

2010-11-17 Thread Gary Taylor

Hi,

We're trying to use Solr to replace a custom Lucene server.  One 
requirement we have is to be able to index the content of multiple 
binary files into a single Solr document.  For example, a uniquely named 
object in our app can have multiple attached-files (eg. Word, PDF etc.), 
and we want to index (but not store) the contents of those files in the 
single Solr doc for that named object.


At the moment, we're issuing HTTP requests direct from ColdFusion and 
using the /update/extract servlet, but can only specify a single file on 
each request.


Is the best way to achieve this to extend ExtractingRequestHandler to 
allow multiple binary files and thus specify our own RequestHandler, or 
would using the SolrJ interface directly be a better bet, or am I 
missing something fundamental?


Thanks and regards,
Gary.


Re: synonyms not working with copyfield

2010-05-13 Thread Gary
Hi Surajit
I'm not sure if this is any help, but I had a similar problem, but with stop 
words: they were not working with dismax queries. Well, to cut a long story short, it 
seems that all of the querying fields need to be configured with stopwords.

Maybe this has a similar effect with the synonyms configuration, thus your 
copyField destination should be defined as a type that is configured with the 
SynonymFilterFactory, just like 
"person_name".

You can find some guidance here:

http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/

Gary





Re: Strange NPE with SOLR-236 (Field collapsing)

2010-05-12 Thread Gary
Hi Eric
 
I caught the NPE in the NonAdjacentDocumentCollapser class and now it does
return the data field collapsed. 

However, I cannot promise how accurate or correct this fix is because I have not
got a lot of time to study all the code.

It would be best if some of the experts could give us a clue 

I made the change in
solr/src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java,
inner class FloatValueFieldComparator. 




Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-17 Thread gary


http://www.webtide.com/choose/jetty.jsp

>> > - Original Message -
>> > From: "Steve Radhouani" 
>> > To: solr-user@lucene.apache.org
>> > Sent: Tuesday, 16 February, 2010 12:38:04 PM
>> > Subject: Tomcat vs Jetty: A Comparative Analysis?
>> >
>> > Hi there,
>> >
>> > Is there any analysis out there that may help to choose between Tomcat
>> and
>> > Jetty to deploy Solr? I wonder wether there's a significant difference
>> > between them in terms of performance.
>> >
>> > Any advice would be much appreciated,
>> > -Steve
>> >
>>


Tomcat6 env-entry

2007-12-04 Thread Gary Harris
It works excellently in Tomcat 6. The toughest thing I had to deal with is 
discovering that the environment variable in web.xml for solr/home is 
essential. If you skip that step, it won't come up.


   
   <env-entry>
      <env-entry-name>solr/home</env-entry-name>
      <env-entry-type>java.lang.String</env-entry-type>
      <env-entry-value>F:\Tomcat-6.0.14\webapps\solr</env-entry-value>
   </env-entry>

- Original Message - 
From: "Charlie Jackson" <[EMAIL PROTECTED]>

To: 
Sent: Monday, December 03, 2007 11:35 AM
Subject: RE: Tomcat6?


$CATALINA_HOME/conf/Catalina/localhost doesn't exist by default, but you can 
create it and it will work exactly the same way it did in Tomcat 5. It's not 
created by default because it's not needed by the manager webapp anymore.
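
For example (paths below are placeholders):

  mkdir -p $CATALINA_HOME/conf/Catalina/localhost

Then drop a context fragment named solr.xml into that directory, e.g.:

  <Context docBase="/path/to/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String" value="/path/to/solr/home" override="true"/>
  </Context>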



-Original Message-
From: Matthew Runo [mailto:[EMAIL PROTECTED]
Sent: Monday, December 03, 2007 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Tomcat6?

In context.xml, I added..



I think that's all I did to get it working in Tocmat 6.

--Matthew Runo

On Dec 3, 2007, at 7:58 AM, Jörg Kiegeland wrote:


The Solr wiki does not describe how to install Solr on
Tomcat 6, and I have not managed it myself :(
The chapter "Configuring Solr Home with JNDI" mentions
the directory $CATALINA_HOME/conf/Catalina/localhost, which does not
exist with Tomcat 6.

Alternatively I tried the folder $CATALINA_HOME/work/Catalina/
localhost, but with no success.. (I can query the top level page,
but the "Solr Admin" link then does not work).

Can anybody help?

--
Dipl.-Inf. Jörg Kiegeland
ikv++ technologies ag
Bernburger Strasse 24-25, D-10963 Berlin
e-mail: [EMAIL PROTECTED], web: http://www.ikv.de
phone: +49 30 34 80 77 18, fax: +49 30 34 80 78 0
=
Handelsregister HRB 81096; Amtsgericht Berlin-Charlottenburg
board of  directors: Dr. Olaf Kath (CEO); Dr. Marc Born (CTO)
supervising board: Prof. Dr. Bernd Mahr (chairman)
_




--
No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.5.503 / Virus Database: 269.16.12/1162 - Release Date: 11/30/2007 
9:26 PM





TEI indexing

2007-05-21 Thread Gary Browne
Once again, thanks for your help getting Solr up and running.

 

I'm wondering if anyone has any hints on how to prepare TEI documents
for indexing - I was about to write some XSLT but didn't want to
reinvent the wheel (unless it's punctured)?

 

Regards

Gary

 

 

Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946 

 



RE: Null pointer exception

2007-05-14 Thread Gary Browne
Hi Chris

The /var/www/html/solr/data/ directory did exist. I tried opening up
permissions completely for testing but no luck (the tomcat user had
write permissions).

I decided to trash the whole installation and start again. I downloaded
last night's build and untarred it. Put the .war into
$TOMCAT_HOME/webapps. Copied the example/solr directory as
/var/www/html/solr. No JNDI file this time, just updated solrconfig to
read /var/www/html/solr as my data.dir.

I can access the admin page but when I try an index action from the
commandline, or a search from the admin page, I get something like:

"The requested resource (/solr/select/) is not available"

I have other apps running under tomcat okay, seems like it can't find
the lib .jars or can't access the classes within them?

Stuck...

Cheers
Gary



Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946 

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 15 May 2007 9:51 AM
To: solr-user@lucene.apache.org
Subject: RE: Null pointer exception

: I am running v1.1.0. If I do a search (from the admin page), it throws
: the following exception:
:
: java.lang.RuntimeException: java.io.IOException:
: /var/www/html/solr/data/index not a directory

does /var/www/html/solr/data/ exist? ... if so does the effective userID
for tomcat have permission to write to it?  if not does the effective
userID for tomcat have permission to write to /var/www/html/solr/ ?
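
something like this would show/fix it (the tomcat user and group names vary by
distro, so treat these as placeholders):

  ls -ld /var/www/html/solr/data
  # give the servlet container ownership of the index directory if it can't write there
  chown -R tomcat:tomcat /var/www/html/solr/data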



-Hoss



RE: Null pointer exception

2007-05-14 Thread Gary Browne
Thanks a lot for your reply Chris

I am running v1.1.0. If I do a search (from the admin page), it throws
the following exception:

java.lang.RuntimeException: java.io.IOException:
/var/www/html/solr/data/index not a directory

There are no exceptions on starting Tomcat, only one warning regarding
JMS client lib not found (related to Cocoon). I have named a file
solr.xml in my $TOMCAT_HOME/conf/Catalina/localhost directory containing
the following:





I am using the example configs (unmodified).

Thanks again
Gary


Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946 
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 15 May 2007 7:27 AM
To: solr-user@lucene.apache.org
Subject: Re: Null pointer exception

: I have tried indexing from the exampledocs which is just sitting in my
: user home directory but now I get a null pointer exception after
: running:

just to clarify: are you using solr 1.1 or a nightly build? did you check
the log file to ensure that there are no exceptions when you start tomcat?
are you using the example solrconfig.xml and schema.xml?  have you tried
doing a search first without indexing any docs to see if that executes and
(correctly) returns 0 docs?

If i had to guess, i'd speculate that you aren't correctly using a
system
prop or JNDI to point Solr at your solr home dir, so it's not finding
the
configs; either that, or you've modified the configs and there is a
syntax error -- either way there should be an exception when the server
starts up, well before you update any docs.


-Hoss



Null pointer exception

2007-05-13 Thread Gary Browne
Hi All

 

Thanks very much for your help with indexing setup.

 

I should elucidate my directory/file setup just to check that I have
everything in the right place.

 

I have running under $TOMCAT_HOME/webapps the solr directory containing
admin, WEB-INF and META-INF directories.

 

Under my web root I have the solr directory containing the bin, conf and
data directories.

 

I have tried indexing from the exampledocs which is just sitting in my
user home directory but now I get a null pointer exception after
running:

 

./post.sh solr.xml

 

Can anyone offer advice on this please? (I've attached the trace for
reference)

 

Thanks again

Gary

 

 

 

Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946 

 

May 14, 2007 1:17:34 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.NullPointerException
at org.apache.solr.core.SolrCore.update(SolrCore.java:716)
at 
org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
at 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
at 
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
at java.lang.Thread.run(Thread.java:595)

May 14, 2007 1:17:34 PM org.apache.solr.core.SolrException log
SEVERE: Exception during commit/optimize:java.lang.NullPointerException
at org.apache.solr.core.SolrCore.update(SolrCore.java:763)
at 
org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)


Still having indexing problems

2007-05-11 Thread Gary Browne
Hello

 

 

I have tried indexing the example files using the Jetty method, rather
than Tomcat, which still didn't work. I would prefer to use my Tomcat
URL.

 

After starting jetty, I issued

 

java -jar post.jar http://localhost:8983/solr/update solr.xml
monitor.xml

 

as in the examples on the tutorial, but post.jar cannot be found...

 

Where is it? Is there a path variable I need to set up somewhere?

 

 

Any help greatly appreciated.

 

 

Regards,

 

Gary

 

Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946 

 



New user - indexing problems

2007-05-10 Thread Gary Browne
Hi

 

I'll probably be posting a bunch of stupid questions in the near future,
so bear with me. I'm finding the documentation a little confusing. For
starters, I've got Solr up and running under Tomcat on port 8080, and I
can pull up the admin page, no problems. I'm running on RHEL AS 4, with
curl installed.

 

I'm not sure how to get indexing started - I tried the following:

 

./post.sh http://localhost:8080/solr/update solr.xml monitor.xml (from
exampledocs directory)

 

 and received this error message::

 

The specified HTTP method is not allowed for the requested resource
(HTTP method GET is not supported by this URL).

 

Any help with this would be much appreciated.

 

Regards

Gary

 

 

Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946