Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-26 Thread Shawn Heisey

On 3/26/2012 10:25 PM, Shawn Heisey wrote:
The problem is that I currently have no way (that I know of so far) to 
detect that a problem happened.  As far as my code is concerned, 
everything worked, so it updates my position tracking and those 
documents will never be inserted.  I have not yet delved into the 
response object to see whether it can tell me anything.  My code 
currently assumes that if no exception was thrown, it was successful.  
This works with CHSS.  I will write some test code that tries out 
various error situations and see what the response contains.


I've written some test code.  When doing an add with SUSS against a 
server that's down, no exception is thrown.  It does throw one for query 
and deleteByQuery.  When doing the add test with CHSS, an exception is 
thrown.  I guess I'll just have to use CHSS until this gets fixed, 
assuming it ever does.  Would it be at all helpful to file an issue in 
JIRA, or has one already been filed?  With a quick search, I could not 
find one.


Thanks,
Shawn



Re: Using the ids parameter

2012-03-26 Thread Dmitry Kan
Hi,

Actually, we ran into the same issue using the ids parameter in the Solr
front-end with a shards architecture (the exception is thrown in the Solr
front-end). Were you able to solve it with the key:value syntax or some other way?

BTW, there was a related issue:
https://issues.apache.org/jira/browse/SOLR-1477
but it's marked as Won't Fix. Does anyone know why, or whether this is
planned to be resolved?

Dmitry

On Tue, Mar 20, 2012 at 11:53 PM, Jamie Johnson  wrote:

> We're running into an issue where we are trying to use the ids=
> parameter to return a set of documents given their id.  This seems to
> work intermittently when running in SolrCloud.  The first question I
> have is: is this something that we should be using, or should we instead
> be doing a query with key:?  The stack trace that I am getting right now
> is included below; any thoughts would be appreciated.
>
> Mar 20, 2012 5:36:38 PM org.apache.solr.core.SolrCore execute
> INFO: [slice1_shard1] webapp=/solr path=/select
>
> params={hl.fragsize=1&ids=4f14cc9b-f669-4d6f-85ae-b22fad143492,urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696,urn:uuid:352473eb-af56-4f6f-94d5-c0096dcb08d4}
> status=500 QTime=32
> Mar 20, 2012 5:36:38 PM org.apache.solr.common.SolrException log
> SEVERE: null:java.lang.NullPointerException
>  at
> org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardDoc.java:232)
>  at
> org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:159)
>  at
> org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101)
>  at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:231)
>  at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:140)
>  at
> org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:156)
>  at
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:839)
>  at
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630)
>  at
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:609)
>  at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:332)
>  at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539)
>  at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
>  at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
>  at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>  at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>  at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>  at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>  at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>  at
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>  at
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>  at
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>  at org.mortbay.jetty.Server.handle(Server.java:326)
>  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>  at
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>  at
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>  at
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>
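
For reference, the key:value workaround discussed above can be as simple as building an OR query over the uniqueKey field instead of using the internal ids parameter. A hedged sketch (it assumes the uniqueKey field is named `id`; the helper class is illustrative, and the quoting is naive, just enough to cope with the ':' inside urn:uuid values):

```java
import java.util.Arrays;
import java.util.List;

public class IdsQueryBuilder {
    // Build a Lucene-syntax query like: id:("a" OR "b" OR "c")
    // Quoting each value avoids parse problems with ':' in urn:uuid ids.
    static String byIds(String field, List<String> ids) {
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) sb.append(" OR ");
            sb.append('"').append(ids.get(i).replace("\"", "\\\"")).append('"');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(byIds("id", Arrays.asList(
                "4f14cc9b-f669-4d6f-85ae-b22fad143492",
                "urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696")));
        // prints: id:("4f14cc9b-f669-4d6f-85ae-b22fad143492" OR "urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696")
    }
}
```

The resulting string goes into a normal q parameter, which avoids the distributed-merge code path that produces the NPE in the trace above.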


Re: Re: Indexing Source Code

2012-03-26 Thread Arbeit . bastian
I can't find better examples at the moment... I hope they are sufficient to  
describe what I need.


''' Code
25 01 RETURN-CODES.
26 05 RTC00 PIC X(2) VALUE '00'.
27 05 RTC01 PIC X(2) VALUE '01'.
28 05 RTC04 PIC X(2) VALUE '04'.
29 05 RTC08 PIC X(2) VALUE '08'.
''' /Code

This is a variable (field/array) in Cobol. I use it in some
(sub)programs. Now I need to reengineer it because I need more return
codes, or I want to get rid of the return codes because I have another
solution. I want to find all appearances of this variable.

- List of variables used by a program.


''' Code
005915 CALL 'Unit01221' RETURN-CODE
''' /Code

Unit00221 is the subprogram I call.
The main program is Unit00200; the last two digits being greater than 00
indicates a subprogram.

- I want to get a list of the subprograms the main program uses.
- I give the name of a subprogram and get a list of where it is used (it can  
be in more than one main program).


- When possible, I give the name of the main program and get a (nested) list  
of the subprograms, sub-subprograms... it uses.
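
Extracting these call relationships before indexing can be done with a small scanner over the source. A hedged sketch in Java (the CALL regex and the idea of a "calls" field are illustrative, not a full COBOL parser; dynamic CALLs via a variable would need more work):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CobolCallScanner {
    // Matches statements like: 005915 CALL 'Unit01221' RETURN-CODE
    // Only literal CALL targets are handled here.
    private static final Pattern CALL = Pattern.compile("\\bCALL\\s+'([A-Za-z0-9-]+)'");

    static List<String> calledSubprograms(String source) {
        List<String> out = new ArrayList<String>();
        Matcher m = CALL.matcher(source);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "005915 CALL 'Unit01221' RETURN-CODE\n"
                   + "005920 CALL 'Unit00300'.\n";
        // Each hit could become a value in a multi-valued "calls" field
        // on the Solr document for the containing program.
        System.out.println(calledSubprograms(src)); // [Unit01221, Unit00300]
    }
}
```

With a "calls" field per program document, "which main programs use subprogram X" becomes a simple `calls:X` query, and the nested subprogram tree can be built by walking those results recursively on the client side.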


Thanks,
Bastian

Marcelo Carvalho Fernandes wrote:

Hi Bastian,

Can you please tell us what kind of search you imagine doing with some (use
case) examples?

Marcelo

On Monday, March 26, 2012, Bastian H arbeit.bast...@googlemail.com> wrote:

> Hi,
>
> I'd like to index my source code - most of it is Cobol, Assembler and
> Java - with Solr.
>
> I don't know where to start... I think I need to parse it to get XML for
> Solr. Do I need Tika? Is there any parser I could use?
>
> I want to index functions, variables and function calls as well as
> commentaries.
>
> Can somebody show me a starting point?
>
> Thanks
> Bastian
>

--

Marcelo Carvalho Fernandes
+55 21 8272-7970
+55 21 2205-2786


how to store file path in Solr when using TikaEntityProcessor

2012-03-26 Thread ZHANG Liang F
Hi,

I am using DIH to index a local file system, but the file path, size and
lastmodified fields were not stored. In the schema.xml I defined:

 
   
   
   
   
   
   
 


And also defined tika-data-config.xml:


















The Solr version is 3.5. Any ideas?

Thanks in advance.
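
One thing worth checking (sketched below, since the original config was stripped by the archive): with a FileListEntityProcessor outer entity wrapping a TikaEntityProcessor sub-entity, the implicit file attributes usually have to be mapped onto Solr fields explicitly, because the inner Tika entity is what produces the document. The field and column names here are assumptions, not taken from the original config, and whether the file* columns are visible from the inner entity should be verified against your DIH version:

```xml
<!-- Sketch only: the outer entity lists files, the inner Tika entity
     parses them. fileAbsolutePath, fileSize and fileLastModified are
     implicit columns of FileListEntityProcessor. -->
<entity name="files" processor="FileListEntityProcessor"
        baseDir="/data/docs" fileName=".*" recursive="true"
        rootEntity="false">
  <entity name="doc" processor="TikaEntityProcessor"
          url="${files.fileAbsolutePath}" format="text">
    <field column="fileAbsolutePath" name="path"/>
    <field column="fileSize" name="size"/>
    <field column="fileLastModified" name="lastmodified"/>
    <field column="text" name="content"/>
  </entity>
</entity>
```

The matching schema.xml fields (path, size, lastmodified, content) would of course need `stored="true"` for the values to come back in results.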


Re: Practical Optimization

2012-03-26 Thread William Bell
What type of logging were you using?

Did you try Logback? We got a pretty large performance increase when using it.

On Fri, Mar 23, 2012 at 2:57 PM, dw5ight  wrote:
> Hey All-
>
> we run a car search engine (http://carsabi.com) with Solr and did some
> benchmarking recently after we switched from a hosted service to
> self-hosting. In brief, we went from 800ms complex range queries on a 1.5M
> document corpus to 43ms. The major shifts were switching from EC2 Large to
> EC2 CC8XL which got us down to 282ms (2.82x speed gain due to 2.75x CPU
> speed increase we think), and then down to 43ms when we sharded to 8 cores.
> We tried sharding to 12 and 16 but saw negligible gains after this point.
>
> Anyway, hope this might be useful to someone - we wrote up exact stats and a
> step-by-step sharding procedure on our tech blog if anyone's interested:
> http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/
>
> best
> Dwight
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Practical-Optimization-tp3852776p3852776.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-26 Thread Shawn Heisey

On 3/26/2012 6:43 PM, Mark Miller wrote:

It doesn't get thrown because that logic needs to continue - you don't 
necessarily want one bad document to stop all the following documents from 
being added. So the exception is sent to that method with the idea that you can 
override and do what you would like. I've written sample code around stopping 
and throwing an exception, but I guess it's not totally trivial. Other ideas for 
reporting errors have been thrown around in the past, but no work on it has 
gotten any traction.


- Mark Miller
lucidimagination.com

On Mar 26, 2012, at 7:33 PM, Shawn Heisey wrote:


I've been building a new version of my app that keeps our Solr indexes up to 
date.  I had hoped to use StreamingUpdateSolrServer instead of 
CommonsHttpSolrServer for performance reasons, but I have run into a 
showstopper problem that has made me revert to CHSS.

I have been relying on exception handling to detect when there is any kind of 
problem with any request sent to Solr.  Looking at the code for SUSS, it seems 
that any exceptions thrown by lower level code are simply logged, then 
forgotten as if they had never happened.


The problem is that I currently have no way (that I know of so far) to 
detect that a problem happened.  As far as my code is concerned, 
everything worked, so it updates my position tracking and those 
documents will never be inserted.  I have not yet delved into the 
response object to see whether it can tell me anything.  My code 
currently assumes that if no exception was thrown, it was successful.  
This works with CHSS.  I will write some test code that tries out 
various error situations and see what the response contains.


Thanks,
Shawn



Re: Solr cores issue

2012-03-26 Thread Sujatha Arun
Yes, I must have mis-copied, and yes, I do have a conf folder per core
with schema etc.

Because of this issue, we have decided to have multiple webapps with about
50 cores per webapp, instead of one single webapp with all 200 cores. Would
this make better sense?

what would be your suggestion?

Regards
Sujatha

On Tue, Mar 27, 2012 at 12:07 AM, Erick Erickson wrote:

> Shouldn't be. What do your log files say? You have to treat each
> core as a separate index. In other words, you need to have a core#/conf
> with the schema matching your core#/data/index directory etc.
>
> I suspect you've simply mis-copied something.
>
> Best
> Erick
>
> On Mon, Mar 26, 2012 at 8:27 AM, Sujatha Arun  wrote:
> > I was migrating from webapps to cores, and I was copying a bunch of
> > indexes from webapps to the respective cores. When I restarted, I had
> > this issue where the whole webapp with the cores would not start up and
> > was getting an index corrupted message.
> >
> > In this scenario, or in a scenario where there is an issue with the
> > schema/config file for one core, will the whole webapp with the cores
> > fail to restart?
> >
> > Regards
> > Sujatha
> >
> > On Mon, Mar 26, 2012 at 4:43 PM, Erick Erickson  >wrote:
> >
> >> Index corruption is very rare, can you provide more details how you
> >> got into that state?
> >>
> >> Best
> >> Erick
> >>
> >> On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun 
> wrote:
> >> > Hello,
> >> >
> >> > Suppose I have several cores in a single webapp, and I have an issue
> >> > with the index being corrupted in one core, or the schema/solrconfig
> >> > of one core is not well formed; then the entire webapp refuses to load
> >> > on server restart?
> >> >
> >> > Why does this happen?
> >> >
> >> > Regards
> >> > Sujatha
> >>
>


Why my highlights are wrong(one character offset)?

2012-03-26 Thread neosky
All of my highlights have a one-character mistake in the offset; some
fragments from my response are below. Thanks!




0
259


on
sequence

true
10
2.2
*,score
true
0
sequence:NGNFN







TSQSELSNGNFNRRPKIELSNFDGNHPKTWIRKC




GENTRERNGNFNSLTRERSFAELENHPPKVRRNGSEG




EGRYPCNNGNFNLTTGRCVCEKNYVHLIYEDRI




YAEENYINGNFNEEPY




KEVADDCNGNFNQPTGVRI






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-my-highlights-are-wrong-one-character-offset-tp3860286p3860286.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Cloud multiple shards and replicas in one instance freezes

2012-03-26 Thread Jamie Johnson
For whatever reason, I'm having difficulty reproducing the issue. I'll
continue to try to reproduce it.

On Sunday, March 25, 2012, Mark Miller  wrote:
> Yeah, sorry - that's what I meant.
>
> Sent from my iPad
>
> On Mar 24, 2012, at 2:18 PM, Jamie Johnson  wrote:
>
>> There is no stack trace, I can fire things back up and try to get a
>> thread dump if that's useful.
>>
>> On Sat, Mar 24, 2012 at 4:07 AM, Mark Miller 
wrote:
>>> Can you get a stack trace dump?
>>>
>>> Sent from my iPhone
>>>
>>> On Mar 23, 2012, at 10:38 PM, Jamie Johnson  wrote:
>>>
 I run a test setup on my Mac where I set up 4 cores, 2 of which are
 replicas in a single Solr JVM instance.  I recently attempted to move
 this same setup to Ubuntu 10.04.4 LTS but the system seems to just
 lock up.  I am running a local test which essentially adds 100 docs
 and says commit after 10s, after doing this once the solr instance
 just becomes non responsive, what can I look at to try to diagnose
 why?  I've increased the number of open file descriptors for the user
 running solr to 200,000.  Any pointers of where to look would be
 great.
>


Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-26 Thread Mark Miller
It doesn't get thrown because that logic needs to continue - you don't 
necessarily want one bad document to stop all the following documents from 
being added. So the exception is sent to that method with the idea that you can 
override and do what you would like. I've written sample code around stopping 
and throwing an exception, but I guess it's not totally trivial. Other ideas for 
reporting errors have been thrown around in the past, but no work on it has 
gotten any traction.


- Mark Miller
lucidimagination.com
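
The override-and-record idea Mark describes can be sketched without any SolrJ dependency. The class below only illustrates the pattern — a background worker whose error hook stashes the exception for the caller to poll — not the actual SolrJ API; the real method to override in StreamingUpdateSolrServer (handleError, if memory serves) and its exact signature should be checked against your SolrJ version:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class ErrorRecordingWorker {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    // Last error seen by a background task; null while all is well.
    private final AtomicReference<Throwable> lastError =
            new AtomicReference<Throwable>();

    // Stand-in for SUSS's error hook: record instead of just logging.
    protected void handleError(Throwable ex) {
        lastError.set(ex);
    }

    public void submit(final Runnable task) {
        pool.execute(new Runnable() {
            public void run() {
                try {
                    task.run();
                } catch (Throwable t) {
                    handleError(t); // mirrors SUSS swallowing the exception
                }
            }
        });
    }

    /** After a batch, callers check this instead of relying on throws. */
    public Throwable pollError() {
        return lastError.getAndSet(null);
    }

    public void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Applied to a SUSS subclass, the same AtomicReference trick would let indexing code treat a non-null polled error as a failed batch and avoid advancing its position tracking, which is exactly the failure mode Shawn describes.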

On Mar 26, 2012, at 7:33 PM, Shawn Heisey wrote:

> I've been building a new version of my app that keeps our Solr indexes up to 
> date.  I had hoped to use StreamingUpdateSolrServer instead of 
> CommonsHttpSolrServer for performance reasons, but I have run into a 
> showstopper problem that has made me revert to CHSS.
> 
> I have been relying on exception handling to detect when there is any kind of 
> problem with any request sent to Solr.  Looking at the code for SUSS, it 
> seems that any exceptions thrown by lower level code are simply logged, then 
> forgotten as if they had never happened.
> 
> So far I have not been able to decipher how things actually work, so I can't 
> tell if it would be possible to propagate the exception back up into my code.
> 
> Questions for the experts: Would such propagation be possible without 
> compromising performance?  Is this a bug?  Can I somehow detect the failure 
> and throw an exception of my own?
> 
> For reference, here is the exception that gets logged, but not actually 
> thrown:
> 
> java.net.ConnectException: Connection refused
>at java.net.PlainSocketImpl.socketConnect(Native Method)
>at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
>at java.net.Socket.connect(Socket.java:579)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:601)
>at 
> org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:140)
>at 
> org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125)
>at 
> org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
>at 
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
>at 
> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
>at 
> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
>at 
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>at 
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>at 
> org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:154)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>at java.lang.Thread.run(Thread.java:722)
> 
> Thanks,
> Shawn
> 

StreamingUpdateSolrServer - exceptions not propagated

2012-03-26 Thread Shawn Heisey
I've been building a new version of my app that keeps our Solr indexes 
up to date.  I had hoped to use StreamingUpdateSolrServer instead of 
CommonsHttpSolrServer for performance reasons, but I have run into a 
showstopper problem that has made me revert to CHSS.


I have been relying on exception handling to detect when there is any 
kind of problem with any request sent to Solr.  Looking at the code for 
SUSS, it seems that any exceptions thrown by lower level code are simply 
logged, then forgotten as if they had never happened.


So far I have not been able to decipher how things actually work, so I 
can't tell if it would be possible to propagate the exception back up 
into my code.


Questions for the experts: Would such propagation be possible without 
compromising performance?  Is this a bug?  Can I somehow detect the 
failure and throw an exception of my own?


For reference, here is the exception that gets logged, but not actually 
thrown:


java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)

at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:140)
at 
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125)
at 
org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:154)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

at java.lang.Thread.run(Thread.java:722)

Thanks,
Shawn



Re: Slave index size growing fast

2012-03-26 Thread Alexandre Rocco
Erick,

I haven't changed the maxCommitsToKeep yet.
We stopped the slave that had issues and removed the data dir as you
suggested, and after starting it, everything started working as normal.
I guess that at some point someone commited on the slave or even copied the
master files over and made this mess. Will check on the internal docs to
prevent this from happening again.

Thanks for explaining the whole concept, will be useful to understand the
whole process.

Best,
Alexandre

On Fri, Mar 23, 2012 at 4:05 PM, Erick Erickson wrote:

> Alexandre:
>
> Have you changed anything like maxCommitsToKeep on your slave?
> And do you have more than one slave? If you do, have you considered
> just blowing away the entire .../data directory on the slave and letting
> it re-start from scratch? I'd take the slave out of service for the
> duration of this operation, or do it when you are OK with some number of
> requests going to an empty index
>
> Because having an index.<timestamp> directory indicates that sometime
> someone forced the slave to get out of sync, possibly as you say by
> doing a commit. Or sending docs to it to be indexed or some such. Starting
> the slave over should fix that if it's the root of your problem.
>
> Note a curious thing about the index version. When you start indexing, the
> index version is a timestamp. However, from that point on when the index
> changes, the version number is just incremented (not made the current
> time). This is to avoid problems with masters and slaves having different
> times. But a consequence of that is if your slave somehow gets an index
> that's newer, the replication process does the best it can to not delete
> indexes that are out of sync with the master and saves them away. This
> might be what you're seeing.
>
> I'm grasping at straws a bit here, but this seems possible.
>
> Best
> Erick
>
> On Fri, Mar 23, 2012 at 1:16 PM, Alexandre Rocco 
> wrote:
> > Tomás,
> >
> > The 300+GB size is only inside the index.20110926152410 dir. Inside there
> > are a lot of files.
> > I am almost convinced that something is messed up, like someone committed on
> > this slave machine.
> >
> > Thanks
> >
> > 2012/3/23 Tomás Fernández Löbbe 
> >
> >> Alexandre, additionally to what Erick said, you may want to check in the
> >> slave if what's 300+GB is the "data" directory or the
> >> "index.<timestamp>"
> >> directory.
> >>
> >> On Fri, Mar 23, 2012 at 12:25 PM, Erick Erickson <
> erickerick...@gmail.com
> >> >wrote:
> >>
> >> > not really, unless perhaps you're issuing commits or optimizes
> >> > on the _slave_ (which you should NOT do).
> >> >
> >> > Replication happens based on the version of the index on the master.
> >> > True, it starts out as a timestamp, but then successive versions
> >> > just have that number incremented. The version number
> >> > in the index on the slave is compared against the one on the master,
> >> > but the actual time (on the slave or master) is irrelevant. This is
> >> > explicitly to avoid problems with time synching across
> >> > machines/timezones/whataver
> >> >
> >> > It would be instructive to look at the admin/info page to see what
> >> > the index version is on the master and slave.
> >> >
> >> > But, if you optimize or commit (I think) on the _slave_, you might
> >> > change the timestamp and mess things up (although I'm reaching
> >> > here, I don't know this for certain).
> >> >
> >> > What does the index look like on the slave as compared to the master?
> >> > Are there just a bunch of files on the slave? Or a bunch of
> directories?
> >> >
> >> > Instead of re-indexing on the master, you could try to bring down the
> >> > slave, blow away the entire index and start it back up. Since this is
> a
> >> > production system, I'd only try this if I had more than one slave.
> >> Although
> >> > you could bring up a new slave and attach it to the master and see
> >> > what happens there. You wouldn't affect production if you didn't point
> >> > incoming requests at it...
> >> >
> >> > Best
> >> > Erick
> >> >
> >> > On Fri, Mar 23, 2012 at 11:03 AM, Alexandre Rocco 
> >> > wrote:
> >> > > Erick,
> >> > >
> >> > > We're using Solr 3.3 on Linux (CentOS 5.6).
> >> > > The /data dir on master is actually 1.2G.
> >> > >
> >> > > I haven't tried to recreate the index yet. Since it's a production
> >> > > environment,
> >> > > I guess that I can stop replication and indexing and then recreate
> the
> >> > > master index to see if it makes any difference.
> >> > >
> >> > > Also just noticed another thread here named "Simple Slave
> Replication
> >> > > Question" that says it could be a problem if I'm seeing a
> >> > > /data/index directory with a timestamp on the slave node.
> >> > > Is this info relevant to this issue?
> >> > >
> >> > > Thanks,
> >> > > Alexandre
> >> > >
> >> > > On Fri, Mar 23, 2012 at 11:48 AM, Erick Erickson <
> >> > erickerick...@gmail.com>wrote:
> >> > >
> >> > >> What version of Solr and what operating system?
> >> > >>
> >> > >> But regardless, this shouldn't be happening. 

Re: Old Google Guava library needs updating (r05)

2012-03-26 Thread Stanislaw Osinski
I've filed an issue for myself as a reminder. Guava r05 is pretty old
indeed, time to upgrade.

S.

On Mon, Mar 26, 2012 at 23:12, Nicholas Ball wrote:

>
> Hey Staszek,
>
> Thanks for the reply. Yep using 4.x and that was exactly what I ended up
> doing, a quick replace :)
> Just thought I'd document it somewhere for a proper fix to be done in the
> 4.0 release.
>
> No issues arose for me but then again Erick mentions it's only used in
> Carrot2 contrib which I'm not using in my deployment.
>
> Thanks for the help!
> Nick
>
> On Mon, 26 Mar 2012 22:40:14 +0200, Stanislaw Osinski
>  wrote:
> > Hi Nick,
> >
> > Which version of Solr do you have in mind? The official 3.x line or 4.0?
> >
> > The quick and dirty fix to try would be to just replace Guava r05 with
> the
> > latest version, chances are it will work (we did that in the past though
> > the version number difference was smaller).
> >
> > The proper fix would be for us to make a point release of Carrot2 with
> > dependencies updated and update Carrot2 in Solr. And this brings us to
> the
> > question about the version of Solr you use. Upgrading Carrot2 in 4.0
> > shouldn't be an issue, but when it comes to 3.x I'd need to check.
> >
> > Staszek
> >
> > On Mon, Mar 26, 2012 at 13:10, Erick Erickson
> > wrote:
> >
> >> Hmmm, near as I can tell, guava is only used in the Carrot2 contrib, so
> >> maybe
> >> ask over at: http://project.carrot2.org/?
> >>
> >> Best
> >> Erick
> >>
> >> On Sat, Mar 24, 2012 at 3:31 PM, Nicholas Ball
> >>  wrote:
> >> >
> >> > Hey all,
> >> >
> >> > Working on a plugin, which uses the Curator library (ZooKeeper
> client).
> >> > Curator depends on the very latest Google Guava library which
> >> unfortunately
> >> > clashes with Solr's outdated r05 of Guava.
> >> > Think it's safe to say that Solr should be using the very latest
> Guava
> >> > library (11.0.1) too right?
> >> > Shall I open up a JIRA issue for someone to update it?
> >> >
> >> > Cheers,
> >> > Nick
> >>
>


Re: Old Google Guava library needs updating (r05)

2012-03-26 Thread Nicholas Ball

Hey Staszek,

Thanks for the reply. Yep using 4.x and that was exactly what I ended up
doing, a quick replace :)
Just thought I'd document it somewhere for a proper fix to be done in the
4.0 release.

No issues arose for me but then again Erick mentions it's only used in
Carrot2 contrib which I'm not using in my deployment.

Thanks for the help!
Nick

On Mon, 26 Mar 2012 22:40:14 +0200, Stanislaw Osinski
 wrote:
> Hi Nick,
> 
> Which version of Solr do you have in mind? The official 3.x line or 4.0?
> 
> The quick and dirty fix to try would be to just replace Guava r05 with
the
> latest version, chances are it will work (we did that in the past though
> the version number difference was smaller).
> 
> The proper fix would be for us to make a point release of Carrot2 with
> dependencies updated and update Carrot2 in Solr. And this brings us to
the
> question about the version of Solr you use. Upgrading Carrot2 in 4.0
> shouldn't be an issue, but when it comes to 3.x I'd need to check.
> 
> Staszek
> 
> On Mon, Mar 26, 2012 at 13:10, Erick Erickson
> wrote:
> 
>> Hmmm, near as I can tell, guava is only used in the Carrot2 contrib, so
>> maybe
>> ask over at: http://project.carrot2.org/?
>>
>> Best
>> Erick
>>
>> On Sat, Mar 24, 2012 at 3:31 PM, Nicholas Ball
>>  wrote:
>> >
>> > Hey all,
>> >
>> > Working on a plugin, which uses the Curator library (ZooKeeper
client).
>> > Curator depends on the very latest Google Guava library which
>> unfortunately
>> > clashes with Solr's outdated r05 of Guava.
>> > Think it's safe to say that Solr should be using the very latest
Guava
>> > library (11.0.1) too right?
>> > Shall I open up a JIRA issue for someone to update it?
>> >
>> > Cheers,
>> > Nick
>>


Re: Old Google Guava library needs updating (r05)

2012-03-26 Thread Stanislaw Osinski
Hi Nick,

Which version of Solr do you have in mind? The official 3.x line or 4.0?

The quick and dirty fix to try would be to just replace Guava r05 with the
latest version, chances are it will work (we did that in the past though
the version number difference was smaller).

The proper fix would be for us to make a point release of Carrot2 with
dependencies updated and update Carrot2 in Solr. And this brings us to the
question about the version of Solr you use. Upgrading Carrot2 in 4.0
shouldn't be an issue, but when it comes to 3.x I'd need to check.

Staszek

On Mon, Mar 26, 2012 at 13:10, Erick Erickson wrote:

> Hmmm, near as I can tell, guava is only used in the Carrot2 contrib, so
> maybe
> ask over at: http://project.carrot2.org/?
>
> Best
> Erick
>
> On Sat, Mar 24, 2012 at 3:31 PM, Nicholas Ball
>  wrote:
> >
> > Hey all,
> >
> > Working on a plugin, which uses the Curator library (ZooKeeper client).
> > Curator depends on the very latest Google Guava library which
> unfortunately
> > clashes with Solr's outdated r05 of Guava.
> > Think it's safe to say that Solr should be using the very latest Guava
> > library (11.0.1) too right?
> > Shall I open up a JIRA issue for someone to update it?
> >
> > Cheers,
> > Nick
>


Re: Index a set of file as one document in SOLR

2012-03-26 Thread Erick Erickson
Consider writing a SolrJ program that extracts the data from the
PDF file and combines it with the XML data. Here's an example
to get you started; it shows how to do the PDF extraction at least.
The other part of the code is a database connection, ignore that part.

You'll have to read in the XML, parse it, extract the relevant bits
and add them to the SolrInputDocument (see the example)

http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/

Best
Erick

On Mon, Mar 26, 2012 at 9:25 AM, Anupam Bhattacharya
 wrote:
> I have a set/group of documents of XML and PDF type.
>
> Each XML document contains the bibliographic information and has a
> reference to the supporting PDF document.
> How can I index these parent-child doc types in the Solr schema as one
> doc? The PDF should be full-text indexed for searching, and only the
> corresponding parent XML details should be shown if the PDF contains the
> searched keyword.
>
> How to design this kind of functionality in SOLR ?
>
> Appreciate any help on this.
>
> Regards
> Anupam


document inside document?

2012-03-26 Thread sam ”
Hey,

I am making an image search engine where people can tag images with various
items that are themselves tagged.
For example, http://example.com/abc.jpg is tagged with the following three
items:
- item1 that is tagged with: tall blond woman
- item2 that is tagged with: yellow purse
- item3 that is tagged with: gucci red dress

Querying for +yellow +purse  will return the example image. But, querying
for +gucci +purse will not because the image does not have an item tagged
with both gucci and purse.

In addition to "items", each image has various metadata such as alt text,
location, description, photo credit, etc., that should be available for
search.

How should I write my schema.xml ?
If imageUrl is primary key, do I implement my own fieldType for items, so
that I can write:

What would myItemType look like so that Solr would know the example image
should not match the query +gucci +purse?

If itemId is primary key, I can use result grouping (
http://wiki.apache.org/solr/FieldCollapsing). But, I need to repeat alt
text and other image metadata for each item.

Or, should I create different schema for item search and metadata search?

Thanks.
Sam.


Re: Solr cores issue

2012-03-26 Thread Erick Erickson
Shouldn't be. What do your log files say? You have to treat each
core as a separate index. In other words, you need to have a core#/conf
with the schema matching your core#/data/index directory etc.

I suspect you've simply mis-copied something.

Best
Erick

On Mon, Mar 26, 2012 at 8:27 AM, Sujatha Arun  wrote:
> I was migrating from webapps to cores, and I was copying a bunch of
> indexes from the webapps to their respective cores. When I restarted, the
> whole webapp with the cores would not start up, and I was getting an
> index-corrupted message.
>
> In this scenario, or in a scenario where there is an issue with the
> schema/config file for one core, will the whole webapp with all the cores
> fail to restart?
>
> Regards
> Sujatha
>
> On Mon, Mar 26, 2012 at 4:43 PM, Erick Erickson 
> wrote:
>
>> Index corruption is very rare, can you provide more details how you
>> got into that state?
>>
>> Best
>> Erick
>>
>> On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun  wrote:
>> > Hello,
>> >
>> > Suppose I have several cores in a single webapp, and the index is
>> > corrupted in one core, or the schema/solrconfig of one core is not
>> > well formed; the entire webapp then refuses to load on server restart?
>> >
>> > Why does this happen?
>> >
>> > Regards
>> > Sujatha
>>


Re: First steps with Solr

2012-03-26 Thread henri.gour...@laposte.net
Trying to play with JavaScript to clean up my URL!
The context is Velocity.



Suggestions?
Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/First-steps-with-Solr-tp3858406p3858959.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Client-side failover with SolrJ

2012-03-26 Thread Jamie Johnson
Did you try 
http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/LBHttpSolrServer.html?
 This might be what you're looking for.

On Mon, Mar 26, 2012 at 11:23 AM,   wrote:
> Hi,
>
> does SolrJ have any possibility of doing a failover from a master to a
> slave for searching?
>
> Thank you
>



Re: First steps with Solr

2012-03-26 Thread henri.gour...@laposte.net
Partially solved problem!

I am playing with the doc.vm file in the velocity folder.

I have replaced 
*#field('name')*
by
* http://127.0.0.1:2317/Chausey?#field('access') #field('name') *

where access is the value or the URL I want.

The problem is that something seems to insert spaces (%20) between
Chausey? and #field('access'), resulting in an invalid query. Everything
else seems OK.
Is there a way to control this space insertion, or to remove the spaces
client-side?

Thanks,

Henri


--
View this message in context: 
http://lucene.472066.n3.nabble.com/First-steps-with-Solr-tp3858406p3858582.html
Sent from the Solr - User mailing list archive at Nabble.com.


QueryHandler

2012-03-26 Thread Peyman Faratin
Hi

A newbie question. I am uncertain about the best way to design for my
requirement, which is the following.

I want to allow another client in solrj to query solr with a query that is 
handled with a custom handler

localhost:9090/solr/tokenSearch?tokens{!dismax 
qf=content}pear,apples,oyster,king kong&fl=score&rows=1000

i.e. a list of tokens (single words and phrases) is sent in one HTTP call.

What I would like to do is to search over each individual token and compose a 
single response back to the client

The current approach I have taken is to create a custom search handler as 
follows


 
   dismax 
  
  
   myHandler
   



   
myHandler (which extends SearchComponent) overrides prepare and process 
methods, extracting and iterating over each token in the input. The problem I 
am hitting in this design is that the prepare() method is passed a reference to 
the SolrIndexSearcher in the ResponseBuilder parameter (so for efficiency
reasons I don't want to open up another server connection for the search). I
can construct a Lucene query and search just fine, but what I would like to do
is instead use the e/dismax queries (rather than construct my own, to reduce
errors). The getDocList() method of SolrIndexSearcher, on the other hand,
requires a Lucene query object.

Is this an appropriate design for my requirement? And if so what is the best 
way to send a SolrQuery to the SolrIndexSearcher?

Thank you 

Peyman

Client-side failover with SolrJ

2012-03-26 Thread spring
Hi,

does SolrJ have any possibility of doing a failover from a master to a
slave for searching?

Thank you



First steps with Solr

2012-03-26 Thread henri.gour...@laposte.net
Hi, I have been exploring Solr through the example provided.

I have created my own set of documents, and can start to index and query
using the Solritas GUI.

Two questions:
1/ I would like to have the "name" field contain a URL to another server
on my machine.
When I put "  text " inside the name field, Solr complains at indexing
time.
What is the easy solution?

2/ Where are the various aspects of the GUI documented or parametrised? I
believe playing with an existing/running program is one nice way to
discover.

Thanks for any help.

Henri

--
View this message in context: 
http://lucene.472066.n3.nabble.com/First-steps-with-Solr-tp3858406p3858406.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: web gui specify shard and collection when adding a new core

2012-03-26 Thread Jamie Johnson
https://issues.apache.org/jira/browse/SOLR-3275 is the ticket I
created.  If it's not clear enough please let me know I can try to
elaborate.

On Mon, Mar 26, 2012 at 3:46 AM, Stefan Matheis
 wrote:
> Jamie: SOLR-3238 is the current admin-ticket. create a new one for the 
> cloud-related options and describe what (and where) you'd like to have :)
>
>
>
> On Sunday, March 25, 2012 at 4:25 AM, Jamie Johnson wrote:
>
>> Is there a plan to add the ability to specify the shard and
>> collection when adding a core through the enhanced web gui, is there a
>> JIRA for this? If not I'd be more than happy to add the request if
>> someone can point me to the active JIRA (both 3162 and 2667 are marked
>> Resolved).
>
>
>


Using DateMath in Facet Label

2012-03-26 Thread Charlie Maroto
Hi,



We have a requirement to facet on a field with a date value so that the
following buckets are shown:



a)  Last Week

b)  Last Month

c)   Last Year

d)  2012

e)  2011 or earlier



Of course, as 2013 rolls in, then the labels for the last two buckets
should change to “2013” and “2012 or earlier”.  Is there any way to have
Solr return the correct year based on the current date?  For example, I
thought of trying to do something like this for d) above:



…&facet.query={!key=[NOW-1YEAR/YEAR]}date_entered:[NOW-1YEAR/YEAR TO
NOW/YEAR]...


Thanks,
Carlos
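One way the five buckets could be expressed as static facet queries is sketched below (the date_entered field name is taken from the example above; note that the {!key=...} local param is a literal label, so the year-based labels for the last two buckets would need to be computed client-side at query time and substituted into the key):

```
...&facet=true
&facet.query={!key="Last Week"}date_entered:[NOW/DAY-7DAYS TO *]
&facet.query={!key="Last Month"}date_entered:[NOW/DAY-1MONTH TO *]
&facet.query={!key="Last Year"}date_entered:[NOW/DAY-1YEAR TO *]
&facet.query={!key="2012"}date_entered:[NOW/YEAR TO NOW/YEAR+1YEAR]
&facet.query={!key="2011 or earlier"}date_entered:[* TO NOW/YEAR]
```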


RE: Simple Slave Replication Question

2012-03-26 Thread Ben McCarthy
That's great information.

Thanks for all the help and guidance, its been invaluable.

Thanks
Ben

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 26 March 2012 12:21
To: solr-user@lucene.apache.org
Subject: Re: Simple Slave Replication Question

It's the optimize step. Optimize essentially forces all the segments to be 
copied into a single new segment, which means that your entire index will be 
replicated to the slaves.

In recent Solrs, there's usually no need to optimize, so unless and until you 
can demonstrate a noticeable change, I'd just leave the optimize step off. In 
fact, trunk renames it to forceMerge or something just because it's so common 
for people to think "of course I want to optimize my index!" and get the 
unintended consequences you're seeing, even though the optimize doesn't
actually do that much good in most cases.

Some people just do the optimize once a day (or week or whatever) during 
off-peak hours as a compromise.
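A minimal sketch of that compromise, assuming a cron-capable host and an illustrative master URL, is to drop the optimize() call from the ingest tool and schedule the optimize off-peak instead:

```
# crontab entry: optimize the master index once a day at 03:00
0 3 * * * curl -s 'http://master:8983/solr/update?optimize=true'
```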

Best
Erick


On Mon, Mar 26, 2012 at 5:02 AM, Ben McCarthy  
wrote:
> Hello,
>
> Had to leave the office so didn't get a chance to reply.  Nothing in the 
> logs.  Just ran one through from the ingest tool.
>
> Same results full copy of the index.
>
> Is it something to do with:
>
> server.commit();
> server.optimize();
>
> I call this at the end of the ingestion.
>
> Would optimize then work across the whole index?
>
> Thanks
> Ben
>
> -Original Message-
> From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
> Sent: 23 March 2012 15:10
> To: solr-user@lucene.apache.org
> Subject: Re: Simple Slave Replication Question
>
> Also, what happens if, instead of adding the 40K docs you add just one and 
> commit?
>
> 2012/3/23 Tomás Fernández Löbbe 
>
>> Have you changed the mergeFactor or are you using 10 as in the
>> example solrconfig?
>>
>> What do you see in the slave's log during replication? Do you see any
>> line like "Skipping download for..."?
>>
>>
>> On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy <
>> ben.mccar...@tradermedia.co.uk> wrote:
>>
>>> I just have an index directory.
>>>
>>> I push the documents through with a change to a field.  I'm using
>>> SolrJ to do this.  I'm using the guide from the wiki to set up the
>>> replication.  When the feed of updates to the master finishes I call
>>> a commit again using SolrJ.  I then have a poll period of 5 minutes
>>> from the slave.  When it kicks in I see a new version of the index
>>> and then it copies the full 5GB index.
>>>
>>> Thanks
>>> Ben
>>>
>>> -Original Message-
>>> From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
>>> Sent: 23 March 2012 14:29
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Simple Slave Replication Question
>>>
>>> Hi Ben, only new segments are replicated from master to slave. In a
>>> situation where all the segments are new, this will cause the index
>>> to be fully replicated, but this rarely happen with incremental
>>> updates. It can also happen if the slave Solr assumes it has an "invalid" 
>>> index.
>>> Are you committing or optimizing on the slaves? After replication,
>>> the index directory on the slaves is called "index" or "index."?
>>>
>>> Tomás
>>>
>>> On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy <
>>> ben.mccar...@tradermedia.co.uk> wrote:
>>>
>>> > So do you just simply address this with big NICs and network pipes?
>>> >
>>> > -Original Message-
>>> > From: Martin Koch [mailto:m...@issuu.com]
>>> > Sent: 23 March 2012 14:07
>>> > To: solr-user@lucene.apache.org
>>> > Subject: Re: Simple Slave Replication Question
>>> >
>>> > I guess this would depend on network bandwidth, but we move around
>>> > 150G/hour when hooking up a new slave to the master.
>>> >
>>> > /Martin
>>> >
>>> > On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy <
>>> > ben.mccar...@tradermedia.co.uk> wrote:
>>> >
>>> > > Hello,
>>> > >
>>> > > I'm looking at the replication from a master to a number of slaves.
>>> > > I have configured it and it appears to be working.  When
>>> > > updating 40K records on the master, is it standard to always copy
>>> > > over the full index, currently 5GB in size?  If this is standard,
>>> > > what do people do who have massive 200GB indexes - does it not
>>> > > take a while to bring the
>>> > slaves in line with the master?
>>> > >
>>> > > Thanks
>>> > > Ben
>>> > >
>>> > > 
>>> > >
>>> > >
>>> > > This e-mail is sent on behalf of Trader Media Group Limited,
>>> > > Registered
>>> > > Office: Auto Trader House, Cutbush Park Industrial Estate,
>>> > > Danehill, Lower Earley, Reading, Berkshire, RG6 4UT(Registered in 
>>> > > England No.
>>> > 4768833).
>>> > > This email and any files transmitted with it are confidential
>>> > > and may be legally privileged, and intended solely for the use
>>> > > of the individual or entity to whom they are addressed. If you
>>> > > have received this email in error please notify the sender. This
>>> > > email message h

RE: "ant test" and contribs

2012-03-26 Thread Steven A Rowe
Check out solr/contrib/analysis-extras/build.xml

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Monday, March 26, 2012 2:14 AM
To: solr-user@lucene.apache.org
Subject: Re: "ant test" and contribs

Ah! It is more complex. There is code and a library jar in 
lucene/contrib/module and a solr module that uses it in solr/contrib/module. I 
had to copy the library from lucene/contrib/module/jar to
solr/contrib/module/jar or else the solr contrib part would not compile.

There does not seem to be any contrib that does this. There are lucene/contrib 
parts that export jars. But there is no solr/contrib that needs one of those 
jars, is there?

On Sat, Mar 24, 2012 at 5:05 PM, Steven A Rowe  wrote:
> Hi Lance,
>
> Are you adding a new solr/contrib/project/?  If so, why not use the build.xml 
> file from a sibling project?  E.g. try starting from 
> solr/contrib/velocity/build.xml - it is very simple and enables all build 
> steps by importing solr/contrib/contrib-build.xml.
>
> solr/contrib/contrib-build.xml imports solr/common-build.xml; 
> solr/common-build.xml imports lucene/contrib/contrib-build.xml; and 
> lucene/contrib/contrib-build.xml imports lucene/common-build.xml.
>
> Simple!
>
> Steve
>
> -Original Message-
> From: Lance Norskog [mailto:goks...@gmail.com]
> Sent: Saturday, March 24, 2012 7:56 PM
> To: solr-user
> Subject: "ant test" and contribs
>
> What do I need to add so that a contrib/project/src/test/ directory can find 
> the classes in contrib/project/src/java? I've gotten the ant files to where 
> 'ant test-contrib' works. But 'ant test' fails: it cannot compile the test 
> classes after building the jars for contrib/project. Any hints?
>
> --
> Lance Norskog
> goks...@gmail.com



--
Lance Norskog
goks...@gmail.com
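Following Steve's pointer, a minimal contrib build file is mostly just an import of the shared contrib build. A sketch (the project name is illustrative - compare solr/contrib/velocity/build.xml):

```xml
<?xml version="1.0"?>
<!-- sketch of solr/contrib/myproject/build.xml; the import wires up
     the compile, jar, and test targets from the shared contrib build -->
<project name="solr-myproject" default="default">
  <description>My contrib module</description>
  <import file="../contrib-build.xml"/>
</project>
```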


Re: Solr cores issue

2012-03-26 Thread Sujatha Arun
I was migrating from webapps to cores, and I was copying a bunch of
indexes from the webapps to their respective cores. When I restarted, the
whole webapp with the cores would not start up, and I was getting an
index-corrupted message.

In this scenario, or in a scenario where there is an issue with the
schema/config file for one core, will the whole webapp with all the cores
fail to restart?

Regards
Sujatha

On Mon, Mar 26, 2012 at 4:43 PM, Erick Erickson wrote:

> Index corruption is very rare, can you provide more details how you
> got into that state?
>
> Best
> Erick
>
> On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun  wrote:
> > Hello,
> >
> > Suppose I have several cores in a single webapp, and the index is
> > corrupted in one core, or the schema/solrconfig of one core is not
> > well formed; the entire webapp then refuses to load on server restart?
> >
> > Why does this happen?
> >
> > Regards
> > Sujatha
>


Re: Querying field with parenthesis

2012-03-26 Thread Erick Erickson
Your problem is the KeywordTokenizerFactory and the query parser.
This often trips people up. When you use author:(stephen king), the
query parser breaks this up before it gets to the analysis chain
into two separate tokens. But by virtue of the
fact that you're using KeywordTokenizer, the actual field only
has a single token "stephen king". So neither of the
pieces match. When you put "stephen king" (with quotes)
in, the query parser does not try to break the tokens up and
the analysis chain gets a single token rather than two.

Your wildcards are matching because steph* and *king
both match the _single_ token "stephen king".

Two ways you can get lots of help with this kind of
thing are the admin/analysis page and attaching
&debugQuery=on to your URL and looking at the
parsed query results.

Using something like WhitespaceTokenizerFactory might
give you more expected results.
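As an illustration of that suggestion, a whitespace-tokenized field type might look like this in schema.xml (the lowercase filter is an extra assumption here, added so that "Stephen" matches "stephen"):

```xml
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split on whitespace instead of keeping the whole value as one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a type like this, author:(stephen king) is analyzed into two tokens, each of which can match the indexed field.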

Best
Erick

2012/3/26 Tim Terlegård :
> I have created my own field type. I have indexed "Stephen King" and
> get no hit when searching
> author:(stephen king)
>
> I get a hit when searching like this
> author:(stephen* AND *king)
>
> I also get a hit when searching like this
> author:"stephen king"
>
> So it seems like when querying with (...) it actually splits the
> words. This is the type of the author field
>
>    
>      
>        
>        
>      
>    
>
> I expected that author:(stephen king) would do the same thing as
> author:"stephen king". Why is this not the case?
>
> Thanks,
> Tim


Re: Simple Slave Replication Question

2012-03-26 Thread Erick Erickson
It's the optimize step. Optimize essentially forces all the segments to
be copied into a single new segment, which means that your entire index
will be replicated to the slaves.

In recent Solrs, there's usually no need to optimize, so unless and until you
can demonstrate a noticeable change, I'd just leave the optimize step off. In
fact, trunk renames it to forceMerge or something just because it's so common
for people to think "of course I want to optimize my index!" and get the
unintended consequences you're seeing, even though the optimize doesn't
actually do that much good in most cases.

Some people just do the optimize once a day (or week or whatever) during
off-peak hours as a compromise.

Best
Erick


On Mon, Mar 26, 2012 at 5:02 AM, Ben McCarthy
 wrote:
> Hello,
>
> Had to leave the office so didn't get a chance to reply.  Nothing in the 
> logs.  Just ran one through from the ingest tool.
>
> Same results full copy of the index.
>
> Is it something to do with:
>
> server.commit();
> server.optimize();
>
> I call this at the end of the ingestion.
>
> Would optimize then work across the whole index?
>
> Thanks
> Ben
>
> -Original Message-
> From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
> Sent: 23 March 2012 15:10
> To: solr-user@lucene.apache.org
> Subject: Re: Simple Slave Replication Question
>
> Also, what happens if, instead of adding the 40K docs you add just one and 
> commit?
>
> 2012/3/23 Tomás Fernández Löbbe 
>
>> Have you changed the mergeFactor or are you using 10 as in the example
>> solrconfig?
>>
>> What do you see in the slave's log during replication? Do you see any
>> line like "Skipping download for..."?
>>
>>
>> On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy <
>> ben.mccar...@tradermedia.co.uk> wrote:
>>
>>> I just have an index directory.
>>>
>>> I push the documents through with a change to a field.  I'm using
>>> SolrJ to do this.  I'm using the guide from the wiki to set up the
>>> replication.  When the feed of updates to the master finishes I call
>>> a commit again using SolrJ.  I then have a poll period of 5 minutes
>>> from the slave.  When it kicks in I see a new version of the index
>>> and then it copies the full 5GB index.
>>>
>>> Thanks
>>> Ben
>>>
>>> -Original Message-
>>> From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
>>> Sent: 23 March 2012 14:29
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Simple Slave Replication Question
>>>
>>> Hi Ben, only new segments are replicated from master to slave. In a
>>> situation where all the segments are new, this will cause the index
>>> to be fully replicated, but this rarely happen with incremental
>>> updates. It can also happen if the slave Solr assumes it has an "invalid" 
>>> index.
>>> Are you committing or optimizing on the slaves? After replication,
>>> the index directory on the slaves is called "index" or "index."?
>>>
>>> Tomás
>>>
>>> On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy <
>>> ben.mccar...@tradermedia.co.uk> wrote:
>>>
>>> > So do you just simply address this with big NICs and network pipes?
>>> >
>>> > -Original Message-
>>> > From: Martin Koch [mailto:m...@issuu.com]
>>> > Sent: 23 March 2012 14:07
>>> > To: solr-user@lucene.apache.org
>>> > Subject: Re: Simple Slave Replication Question
>>> >
>>> > I guess this would depend on network bandwidth, but we move around
>>> > 150G/hour when hooking up a new slave to the master.
>>> >
>>> > /Martin
>>> >
>>> > On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy <
>>> > ben.mccar...@tradermedia.co.uk> wrote:
>>> >
>>> > > Hello,
>>> > >
>>> > > I'm looking at the replication from a master to a number of slaves.
>>> > > I have configured it and it appears to be working.  When updating
>>> > > 40K records on the master, is it standard to always copy over the
>>> > > full index, currently 5GB in size?  If this is standard, what do
>>> > > people do who have massive 200GB indexes - does it not take a while
>>> > > to bring the
>>> > slaves in line with the master?
>>> > >
>>> > > Thanks
>>> > > Ben
>>> > >

Re: Solr cores issue

2012-03-26 Thread Erick Erickson
Index corruption is very rare, can you provide more details how you
got into that state?

Best
Erick

On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun  wrote:
> Hello,
>
> Suppose I have several cores in a single webapp, and the index is
> corrupted in one core, or the schema/solrconfig of one core is not well
> formed; the entire webapp then refuses to load on server restart?
>
> Why does this happen?
>
> Regards
> Sujatha


Re: Trouble Setting Up Development Environment

2012-03-26 Thread Erick Erickson
Depending upon what you actually need to do, you could consider just
attaching to the running Solr instance remotely. I know it's easy in
IntelliJ, and believe Eclipse makes this easy as well, but I haven't
used Eclipse in a while.

Best
Erick

On Sat, Mar 24, 2012 at 11:11 PM, Li Li  wrote:
> I forgot to write that I am running it in Tomcat 6, not Jetty.
> You can right-click the project -> Debug As -> Debug on Server ->
> Manually define a new Server -> Apache -> Tomcat 6,
> assuming you have configured a Tomcat.
>
> On Sun, Mar 25, 2012 at 4:17 AM, Karthick Duraisamy Soundararaj <
> karthick.soundara...@gmail.com> wrote:
>
>> I followed your instructions. I got 8 errors and a bunch of warnings, a
>> few of them related to the classpath. I also got the following exception
>> when I tried to run with Jetty (I have attached the full console output
>> with this email). I figured the solr directory with config files might be
>> missing and added it in WebContent.
>>
>> It would be a great help if someone could point me in the right direction.
>>
>> ls WebContent
>>                 admin  favicon.ico  index.jsp  solr  WEB-INF
>>
>>
>> *SEVERE: Error in solrconfig.xml:org.apache.solr.common.SolrException: No
>> system property or default value specified for solr.test.sys.prop1*
>>     at
>> org.apache.solr.common.util.DOMUtil.substituteProperty(DOMUtil.java:331)
>>     at
>> org.apache.solr.common.util.DOMUtil.substituteProperties(DOMUtil.java:290)
>>     at
>> org.apache.solr.common.util.DOMUtil.substituteProperties(DOMUtil.java:292)
>>     at org.apache.solr.core.Config.<init>(Config.java:165)
>>     at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:131)
>>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:435)
>>     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
>>     at
>> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:133)
>>     at
>> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
>>     at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
>>     at
>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>>     at
>> org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
>>     at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
>>     at
>> org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
>>     at
>> org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
>>     at
>> org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
>>     at
>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>>     at
>> org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
>>     at org.mortbay.jetty.Server.doStart(Server.java:224)
>>     at
>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>>     at runjettyrun.Bootstrap.main(Bootstrap.java:97)
>>
>>
>> *Here are the 8 errors I got* (Description / Resource / Path / Location / Type):
>>
>> 1. core cannot be resolved - dataimport.jsp -
>>    /solr3_5/ssrc/solr/contrib/dataimporthandler/src/webapp/admin - line 27 - JSP Problem
>> 2. End tag () not closed properly, expected >. - package.html -
>>    /solr3_5/ssrc/lucene/contrib/queryparser/src/java/org/apache/lucene/queryParser/core/config - line 64 - HTML Problem
>> 3. Fragment "_info.jsp" was not found at expected path
>>    /solr3_5/ssrc/solr/contrib/dataimporthandler/src/webapp/admin/_info.jsp - dataimport.jsp -
>>    /solr3_5/ssrc/solr/contrib/dataimporthandler/src/webapp/admin - line 21 - JSP Problem
>> 4. Fragment "_info.jsp" was not found at expected path
>>    /solr3_5/ssrc/solr/contrib/dataimporthandler/src/webapp/admin/_info.jsp - debug.jsp -
>>    /solr3_5/ssrc/solr/contrib/dataimporthandler/src/webapp/admin - line 19 - JSP Problem
>> 5. Named template dotdots is not available - tabutils.xsl -
>>    /solr3_5/ssrc/lucene/src/site/src/documentation/skins/common/xslt/html - line 41 - XSL Problem
>> 6. Named template dotdots is not available - tabutils.xsl -
>>    /solr3_5/ssrc/solr/site-src/src/documentation/skins/common/xslt/html - line 41 - XSL Problem
>> 7. Unhandled exception type Throwable - ping.jsp -
>>    /solr3_5/WebContent/admin - line 46 - JSP Problem
>> 8. Unhandled exception type Throwable - ping.jsp -
>>    /solr3_5/ssrc/solr/webapp/web/admin - line 46 - JSP Problem
>>
>>
>> *Here are the warnings I got*
>>
>>> Description    Resource    Path    Location    Type
>>> Classpath entry
>>> /solr3_5/
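Regarding the SEVERE error above: "No system property or default value specified for solr.test.sys.prop1" usually means the solrconfig.xml being loaded is one of Solr's test configurations, which references test-only properties. Either point the webapp at the example solrconfig.xml, or define the properties as JVM arguments; the values below are the ones Solr's own test harness uses, so treat them as an assumption:

```
-Dsolr.test.sys.prop1=propone -Dsolr.test.sys.prop2=proptwo
```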

Querying field with parenthesis

2012-03-26 Thread Tim Terlegård
I have created my own field type. I have indexed "Stephen King" and
get no hit when searching
author:(stephen king)

I get a hit when searching like this
author:(stephen* AND *king)

I also get a hit when searching like this
author:"stephen king"

So it seems like when querying with (...) it actually splits the
words. This is the type of the author field


  


  


I expected that author:(stephen king) would do the same thing as
author:"stephen king". Why is this not the case?

Thanks,
Tim


Re: Old Google Guava library needs updating (r05)

2012-03-26 Thread Erick Erickson
Hmmm, near as I can tell, guava is only used in the Carrot2 contrib, so maybe
ask over at: http://project.carrot2.org/?

Best
Erick

On Sat, Mar 24, 2012 at 3:31 PM, Nicholas Ball
 wrote:
>
> Hey all,
>
> Working on a plugin, which uses the Curator library (ZooKeeper client).
> Curator depends on the very latest Google Guava library which unfortunately
> clashes with Solr's outdated r05 of Guava.
> Think it's safe to say that Solr should be using the very latest Guava
> library (11.0.1) too right?
> Shall I open up a JIRA issue for someone to update it?
>
> Cheers,
> Nick


Re: Indexing Source Code

2012-03-26 Thread Marcelo Carvalho Fernandes
Hi Bastian,

Can you please tell us what kind of search you imagine doing with some (use
case) examples?

Marcelo

On Monday, March 26, 2012, Bastian H  wrote:
> Hi,
>
> I would like to index my source code - mostly Cobol, Assembler, and Java -
> with Solr.
>
> I don't know where to start... I think I need to parse it to get XML for
> Solr. Do I need Tika? Is there any parser I could use?
>
> I want to index functions, variables, and function calls, as well as
> comments.
>
> Can somebody point me to a starting point?
>
> Thanks
> Bastian
>

-- 

Marcelo Carvalho Fernandes
+55 21 8272-7970
+55 21 2205-2786


Re: Practical Optimization

2012-03-26 Thread Erick Erickson
Thanks, this is very helpful!

On Sat, Mar 24, 2012 at 4:50 AM, Martin Koch  wrote:
> Thanks for writing this up. These are good tips.
>
> /Martin
>
> On Fri, Mar 23, 2012 at 9:57 PM, dw5ight  wrote:
>
>> Hey All-
>>
>> we run a car search engine (http://carsabi.com) with Solr and did some
>> benchmarking recently after we switched from a hosted service to
>> self-hosting. In brief, we went from 800ms complex range queries on a 1.5M
>> document corpus to 43ms. The major shifts were switching from EC2 Large to
>> EC2 CC8XL which got us down to 282ms (2.82x speed gain due to 2.75x CPU
>> speed increase we think), and then down to 43ms when we sharded to 8 cores.
>> We tried sharding to 12 and 16 but saw negligible gains after this point.
>>
>> Anyway, hope this might be useful to someone - we wrote up exact stats
>> and a step-by-step sharding procedure on our tech blog
>> (http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/)
>> if anyone's interested.
>>
>> best
>> Dwight
>>
>>
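
[Editor's sketch] For readers wondering how the 8-core sharded setup above is queried: in pre-SolrCloud Solr, a distributed search is issued against one core with the shards parameter listing all participating cores. The host and core names below are hypothetical, not taken from the post:

```
# Illustrative request only -- hosts and cores are made up.
http://host1:8983/solr/core0/select
    ?q=price:[10000 TO 20000]
    &shards=host1:8983/solr/core0,host1:8983/solr/core1,host2:8983/solr/core2
```

The core receiving the request fans the query out to every entry in the shards list and merges the results, which is why per-shard response times drop as the corpus is split.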


Indexing Source Code

2012-03-26 Thread Bastian H
Hi,

I'd like to index my source code - mostly Cobol, Assembler and Java -
with Solr.

I don't know where to start... I think I need to parse it to get XML for
Solr. Do I need Tika? Is there any parser I could use?

I want to index functions, variables and function calls as well as
comments.

Can somebody point me to a starting point?

Thanks
Bastian
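
[Editor's sketch] Solr itself won't pull out functions and comments for you; a lightweight option is to extract them in your own code and index each as a separate Solr field. A rough, regex-based sketch for the Java case follows. The class name and patterns are illustrative assumptions, not a robust parser; Cobol and Assembler would each need their own extraction rules:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch: pull line comments and method names out of Java
 *  source so they can be indexed as separate Solr fields. A real
 *  setup would use a proper parser/compiler front end instead of
 *  regexes, but this shows the shape of the field extraction. */
public class CodeFieldExtractor {
    private static final Pattern LINE_COMMENT = Pattern.compile("//(.*)");
    private static final Pattern METHOD_DECL =
        Pattern.compile("\\b(?:public|private|protected)\\s+\\w+\\s+(\\w+)\\s*\\(");

    public static List<String> comments(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = LINE_COMMENT.matcher(src);
        while (m.find()) out.add(m.group(1).trim());
        return out;
    }

    public static List<String> methodNames(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = METHOD_DECL.matcher(src);
        while (m.find()) out.add(m.group(1));
        return out;
    }

    public static void main(String[] args) {
        String src = "public int add(int a, int b) { // sums two ints\n"
                   + "  return a + b;\n}";
        System.out.println(methodNames(src)); // [add]
        System.out.println(comments(src));    // [sums two ints]
    }
}
```

Each extracted list would then become a multivalued field (e.g. "function", "comment") on the Solr document for that source file, so queries can target them separately.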


RE: Simple Slave Replication Question

2012-03-26 Thread Ben McCarthy
Hello,

Had to leave the office so didn't get a chance to reply.  Nothing in the logs.  
Just ran one through from the ingest tool.

Same result: a full copy of the index.

Is it something to do with:

server.commit();
server.optimize();

I call this at the end of the ingestion.

Would optimize then work across the whole index?

Thanks
Ben
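
[Editor's sketch] The optimize() call is the likely culprit here: optimizing merges every segment into new ones, so the slave sees an entirely new index and has to copy all of it, which matches the "all the segments are new" case Tomás describes. A pseudocode sketch of the commit-only approach (SolrJ-style method names, illustrative only):

```
// Pseudocode sketch: commit to make the batch visible, but skip
// optimize() after every ingest.
server.add(docs);
server.commit();      // only the new segments need replicating
// server.optimize(); // rewrites ALL segments -> slaves re-fetch the
                      // whole 5GB index; run rarely, if at all
```

Dropping the per-ingest optimize usually means replication only transfers the handful of segments created since the last poll.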

-Original Message-
From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
Sent: 23 March 2012 15:10
To: solr-user@lucene.apache.org
Subject: Re: Simple Slave Replication Question

Also, what happens if, instead of adding the 40K docs, you add just one and
commit?

2012/3/23 Tomás Fernández Löbbe 

> Have you changed the mergeFactor or are you using 10 as in the example
> solrconfig?
>
> What do you see in the slave's log during replication? Do you see any
> line like "Skipping download for..."?
>
>
> On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy <
> ben.mccar...@tradermedia.co.uk> wrote:
>
>> I just have an index directory.
>>
>> I push the documents through with a change to a field.  I'm using
>> SolrJ to do this.  I'm using the guide from the wiki to set up the
>> replication.  When the feed of updates to the master finishes I call
>> a commit again using SolrJ.  I then have a poll period of 5 minutes
>> from the slave.  When it kicks in I see a new version of the index
>> and then it copies the full 5GB index.
>>
>> Thanks
>> Ben
>>
>> -Original Message-
>> From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
>> Sent: 23 March 2012 14:29
>> To: solr-user@lucene.apache.org
>> Subject: Re: Simple Slave Replication Question
>>
>> Hi Ben, only new segments are replicated from master to slave. In a
>> situation where all the segments are new, this will cause the index
>> to be fully replicated, but this rarely happen with incremental
>> updates. It can also happen if the slave Solr assumes it has an "invalid" 
>> index.
>> Are you committing or optimizing on the slaves? After replication,
>> the index directory on the slaves is called "index" or "index."?
>>
>> Tomás
>>
>> On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy <
>> ben.mccar...@tradermedia.co.uk> wrote:
>>
>> > So do you just simply address this with big NICs and network pipes?
>> >
>> > -Original Message-
>> > From: Martin Koch [mailto:m...@issuu.com]
>> > Sent: 23 March 2012 14:07
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: Simple Slave Replication Question
>> >
>> > I guess this would depend on network bandwidth, but we move around
>> > 150G/hour when hooking up a new slave to the master.
>> >
>> > /Martin
>> >
>> > On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy <
>> > ben.mccar...@tradermedia.co.uk> wrote:
>> >
>> > > Hello,
>> > >
>> > > I'm looking at the replication from a master to a number of slaves.
>> > > I have configured it and it appears to be working.  When updating
>> > > 40K records on the master, is it standard to always copy over the
>> > > full index, currently 5GB in size?  If this is standard, what do
>> > > people do who have massive 200GB indexes? Does it not take a while
>> > > to bring the slaves in line with the master?
>> > >
>> > > Thanks
>> > > Ben
>> > >
>> > > 
>> > >
>> > >
>> > > This e-mail is sent on behalf of Trader Media Group Limited,
>> > > Registered
>> > > Office: Auto Trader House, Cutbush Park Industrial Estate,
>> > > Danehill, Lower Earley, Reading, Berkshire, RG6 4UT (Registered in 
>> > > England No.
>> > 4768833).
>> > > This email and any files transmitted with it are confidential and
>> > > may be legally privileged, and intended solely for the use of the
>> > > individual or entity to whom they are addressed. If you have
>> > > received this email in error please notify the sender. This email
>> > > message has been swept for the presence of computer viruses.
>> > >
>> > >
>> >
>> >
>> >
>>

Re: web gui specify shard and collection when adding a new core

2012-03-26 Thread Stefan Matheis
Jamie: SOLR-3238 is the current admin ticket. Create a new one for the
cloud-related options and describe what (and where) you'd like to have it :)



On Sunday, March 25, 2012 at 4:25 AM, Jamie Johnson wrote:

> Is there a plan to add the ability to specify the shard and
> collection when adding a core through the enhanced web gui, is there a
> JIRA for this? If not I'd be more than happy to add the request if
> someone can point me to the active JIRA (both 3162 and 2667 are marked
> Resolved).