Re: Wildcard query vs facet.prefix for autocomplete?

2012-07-18 Thread santamaria2
Very interesting! Thanks for sharing, I'll ponder on it.



Can I get DIH skip fields that match empty text nodes

2012-07-18 Thread Alexandre Rafalovitch
Hello,

I have DIH reading an XML file and getting fields with empty values.
My definition is:


/text here is actual node name, not text() (e.g. )

Right now, I get the field (of type string) with empty value
indexed/stored/returned. Plus, all the copy fields get the empties as
well.

Can I get DIH to skip that field if I don't have any actual text in
it? I can see how to do it with custom transformer, but it seems that
this would be a common problem and I might just be missing a setting
or some XPath secret.

I actually tried [node()], [text()] and .../text/text() at the end,
but that seems to make the XPathEntityProcessor skip the field
altogether.
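
For what it's worth, a minimal sketch of the custom-transformer route
mentioned above (class and package names are made up, and it only handles
plain string values):

import java.util.Iterator;
import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Drops any row entry whose value is an empty (or whitespace-only) string,
// so the field is never sent to Solr and the copyFields stay empty as well.
public class SkipEmptyFieldsTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Iterator<Map.Entry<String, Object>> it = row.entrySet().iterator();
    while (it.hasNext()) {
      Object value = it.next().getValue();
      if (value instanceof String && ((String) value).trim().isEmpty()) {
        it.remove();
      }
    }
    return row;
  }
}

It would then be referenced from the entity as
transformer="com.example.SkipEmptyFieldsTransformer" (hypothetical package).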

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


Re: How To apply transformation in DIH for multivalued numeric field?

2012-07-18 Thread Pranav Prakash
I had tried splitBy for the numeric field, but that also did not work
for me. However, I got rid of group_concat and it was all good to go.

Thanks a lot!! I really had a difficult time understanding this behavior.


*Pranav Prakash*

"temet nosce"



On Thu, Jul 19, 2012 at 1:34 AM, Dyer, James wrote:

> Don't you want to specify "splitBy" for the integer field too?
>
> Actually though, you shouldn't need to use GROUP_CONCAT and
> RegexTransformer at all.  DIH is designed to handle "1>many" relations
> between parent and child entities by populating all the child fields as
> multi-valued automatically.  I guess your approach leads to a lot fewer
> rows getting sent from your db to Solr though.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Pranav Prakash [mailto:pra...@gmail.com]
> Sent: Wednesday, July 18, 2012 2:38 PM
> To: solr-user@lucene.apache.org
> Subject: How To apply transformation in DIH for multivalued numeric field?
>
> I have a multivalued integer field and a multivalued string field defined
> in my schema as
>
> <field name="community_tag_ids" type="integer"
> indexed="true"
> stored="true"
> multiValued="true"
> omitNorms="true" />
> <field name="community_tags" type="text"
> indexed="true"
> termVectors="true"
> stored="true"
> multiValued="true"
> omitNorms="true" />
>
>
> The DIH entity and field defn for the same goes as
>
>dataSource="app"
>   onError="skip"
>   transformer="RegexTransformer"
>   query="...">
>
>   transformer="RegexTransformer"
> query="SELECT
> group_concat(a.id SEPARATOR ',') AS community_tag_ids,
> group_concat(a.title SEPARATOR ',') AS community_tags
> FROM tags a JOIN tag_dets b ON a.id = b.tag_id
> WHERE b.doc_id = ${document.id}" >
> 
> 
>   
>
> 
>
> The value for field community_tags comes correctly as an array of strings.
> However the value of field community_tag_ids is not proper
>
> 
> [B@390c0a18
> 
>
> I tried chaining NumberFormatTransformer with formatStyle="number" but that
> throws DataImportHandlerException: Failed to apply NumberFormat on column.
> Could it be due to NULL values from database or because the value is not
> proper? How do we handle NULL in this case?
>
>
> *Pranav Prakash*
>
> "temet nosce"
>
>


Frustrating differences in fieldNorm between two different versions of solr indexing the same document

2012-07-18 Thread Aaron Daubman
Greetings,

I've been digging in to this for two days now and have come up short -
hopefully there is some simple answer I am just not seeing:

I have a solr 1.4.1 instance and a solr 3.6.0 instance, both configured as
identically as possible (given deprecations) and indexing the same document.

For most queries the results are very close (scores agreeing to roughly three
significant digits, almost identical positions in the results).

However, for certain documents the scores are very different (causing
these docs to be ranked 25 or more positions apart in the results).

In looking at debugQuery output, it seems like this is due to fieldNorm
values being lower for the 3.6.0 instance than the 1.4.1.

(note that for most docs, the fieldNorms are identical)

I have taken the field values for the example below and run them
through /admin/analysis.jsp on each solr instance. Even for the problematic
docs/fields, the results are almost identical. For the example below, the
t_tag values for the problematic doc:
1.4.1: 162 values
3.6.0: 164 values

note that 1/sqrt(162) = 0.07857 ~= fieldNorm for 1.4.1;
however, (1/0.0625)^2 = 256, which is nowhere near 164.
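
For reference, Lucene stores fieldNorm in a single byte with a 3-bit mantissa,
truncating toward zero, so only a few values near 0.078 are representable. A
minimal sketch of that quantization, assuming DefaultSimilarity's
1/sqrt(numTerms) with no index-time boost:

// Replicates the effect of the single-byte norm encoding (3-bit mantissa,
// truncated toward zero) without calling Lucene itself.
public class NormQuantization {
  static float quantize(float f) {
    int bits = Float.floatToIntBits(f);
    return Float.intBitsToFloat((bits >> 21) << 21); // keep top 3 mantissa bits
  }

  public static void main(String[] args) {
    System.out.println(quantize((float) (1.0 / Math.sqrt(162)))); // 0.078125
    System.out.println(quantize((float) (1.0 / Math.sqrt(164)))); // 0.0703125
  }
}

With this encoding, a stored 0.0625 corresponds to a raw norm anywhere in
[0.0625, 0.0703125), i.e. a length/boost product of roughly 203 to 256, so the
164 tokens reported by analysis.jsp alone would not produce it.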

Here is a particular example from 1.4.1:
1.6263733 = (MATCH) fieldWeight(t_tag:soul in 2066419), product of:
   3.8729835 = tf(termFreq(t_tag:soul)=15)
   5.3750753 = idf(docFreq=27619, maxDocs=2194294)
   0.078125 = fieldNorm(field=t_tag, doc=2066419)

And the same from 3.6.0:
1.3042576 = (MATCH) fieldWeight(t_tag:soul in 1977957), product of:
   3.8729835 = tf(termFreq(t_tag:soul)=15)
   5.388126 = idf(docFreq=27740, maxDocs=2232857)
   0.0625 = fieldNorm(field=t_tag, doc=1977957)


Here is the 1.4.1 config for the t_tag field and text type:

  
  
  
  
  
  
  
  
  



And 3.6.0 schema config for the t_tag field and text type:













I at first got distracted by this change between versions:
LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default. This
means that terms with a position increment gap of zero do not affect the
norms calculation by default.
However, this doesn't appear to be causing the issue as, according to
analysis.jsp there is no overlap for t_tag...

Can you point me to where these fieldNorm differences are coming from and
why they'd only be happening for a select few documents whose content
doesn't stand out?

Thank you,
 Aaron


Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Briggs Thompson
Yury,

Thank you so much! That was it. Man, I spent a good long while
troubleshooting this, and probably would have spent quite a bit more time. I
appreciate your help!!

-Briggs

On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats  wrote:

> On 7/18/2012 7:11 PM, Briggs Thompson wrote:
> > I have realized this is not specific to SolrJ but to my instance of
> Solr. Using curl to delete by query is not working either.
>
> Can be this: https://issues.apache.org/jira/browse/SOLR-3432
>


Re: Quick Confirmation on LocalSolrQueryRequest close

2012-07-18 Thread Karthick Duraisamy Soundararaj
I put my question wrong... Excuse me for spamming; it's been a tiring couple
of days and I am almost sleep-typing. Please read the snippet again.

This might be a dumb question. But I would like to confirm.
>
> Will the following snippet cause an index searcher leak and end up in an
> out-of-memory exception when new searchers are created?
>
> class myCustomHandler extends SearchHandler {
>  .
>   void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
>
>   LocalSolrQueryRequest newReq = new LocalSolrQueryRequest();
>   newReq = req.getCore();
>   .
>   //  newReq.close()Will removing this lead to OOME?
>   }
>
My conviction is yes. But just want to confirm..
>





On Wed, Jul 18, 2012 at 11:04 PM, Karthick Duraisamy Soundararaj <
karthick.soundara...@gmail.com> wrote:

> This might be a dumb question. But I would like to confirm.
>
> Will the following snippet cause an index searcher leak and end up in an
> out-of-memory exception when new searchers are created?
>
> class myCustomHandler extends SearchHandler {
>  .
>   void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
>
>   LocalSolrQueryRequest newReq = new LocalSolrQueryRequest();
>   newReq = req.getCore();
>   .
>   newReq.close()
>   }
>
> My conviction is yes. But just want to confirm..
>
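
For reference, a minimal sketch of the close-in-finally pattern the snippet is
asking about (constructor arguments and imports are illustrative and vary
slightly between Solr versions):

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryRequest;

// A locally created request should be closed in a finally block; close()
// releases the searcher reference the request lazily acquires via
// getSearcher(), so repeated requests do not pin old searchers in memory.
public class LocalRequestExample {
  void runLocalRequest(SolrQueryRequest req) {
    LocalSolrQueryRequest newReq =
        new LocalSolrQueryRequest(req.getCore(), new ModifiableSolrParams());
    try {
      // ... execute the sub-request, possibly using newReq.getSearcher() ...
    } finally {
      newReq.close();
    }
  }
}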


Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Yury Kats
On 7/18/2012 7:11 PM, Briggs Thompson wrote:
> I have realized this is not specific to SolrJ but to my instance of Solr. 
> Using curl to delete by query is not working either. 

Can be this: https://issues.apache.org/jira/browse/SOLR-3432


Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-18 Thread Mou
Increasing the polling interval does help. But the requirement is to get a
document indexed and searchable instantly (sounds like RTS); 30 sec is
acceptable. I need to look at Solr NRT and SolrCloud.

I created a new core to accept daily updates and replicate every 10 sec.
Two other cores with 234 Million documents are configured to replicate only
once a day.
I am feeding all three cores, but the two big cores are not replicating. While
searching I am running group.field on my unique id and taking the most
updated one. Right now it looks fine. Every day I am going to delete the
last day's records from the daily-update core.

I am planning to use rsync for replication; it will be Fusion-io to
Fusion-io, so hopefully it will be very fast. What do you think?

We use a Windows service (written in .NET C#) to feed the data using REST
calls. That is really fast; we can feed more than 15 million documents a day
to two cores easily. I am using a Solr autocommit of 5 sec.

I could not figure out how I was able to achieve those numbers in my test
environment; all the configuration was the same except that I had a lot less
memory in test! I am trying to find out what I am missing in the other
configuration. My SLES kernel version is different in production (a 3.0.*,
where test was 2.6.*), but I do not think that can cause a problem.

Thank you again,
Mou

On Wed, Jul 18, 2012 at 6:26 PM, Erick Erickson [via Lucene] <
ml-node+s472066n3995861...@n3.nabble.com> wrote:

> Replication will indeed be incremental. But if you commit too often (and
> committing too often is a common mistake) then the merging will
> eventually merge everything into new segments and the whole thing will
> be replicated.
>
> Additionally, optimizing (or forceMerge in 4.x) will make a single segment
> and force the entire index to replicate.
>
> You should emphatically _not_ have to have two cores. Solr is built to
> handle replication etc. I suspect you're committing too often or have some
> other misconfiguration, and you're creating a problem for yourself.
>
> Here's what I'd do:
> 1> increase the polling interval to, say, 10 minutes (or however long you
> can
> live with stale data) on the slave.
>
> 2> decrease the commits you're  doing. This could involve the autocommit
> options
> you might have set in solrconfig.xml. It could be your client (don't
> know how you're
> indexing, solrJ?) and the commitWithin parameter. Could be you're
> optimizing (if you
> are, stop it!).
>
> Note that ramBufferSizeMB has no influence on how often things are
> _committed_.
> When this limit is exceeded, the accumulated indexing data is written
> to the currently-open
> segment. Multiple flushes can go to the _same_ segment. The write-once
> nature of
> segments means that after a segment is closed (through a commit), it
> is not changed. But
> a segment that is not closed may be written to multiple times until it's
> closed.
>
> HTH
> Erick
>
> On Wed, Jul 18, 2012 at 1:25 PM, Mou <[hidden 
> email]>
> wrote:
>
> > Hi Eric,
> >
> > I totally agree. That's what I also figured ultimately. One thing I am
> not
> > clear.  The replication is supposed to be incremental ?  But looks like
> it
> > is trying to replicate the whole index. May be I am changing the index
> so
> > frequently, it is triggering auto merge and a full replication ? I am
> > thinking in right direction?
> >
> > I see that when I start the solr search instance before I start feeding
> the
> > solr Index, my searches are fine BUT it is using the old searcher so I
> am
> > not seeing the updates in the result.
> >
> > So now I am trying to change my architecture. I am going to have a core
> > dedicated to receive daily updates, which is going to be 5 million docs
> and
> > size is going to be little less than 5 G, which is small and replication
> > will be faster?
> >
> > I will search both the cores i.e. old data and the daily updates and do
> a
> > field collapsing on my unique id so that I do not return duplicate
> results
> > .I haven't tried grouping results ; so not sure about  the performance.
> Any
> > suggestion ?
> >
> > Eventually I have to use Solr trunk like you suggested.
> >
> > Thank you for your help,
> >
> > On Wed, Jul 18, 2012 at 10:28 AM, Erick Erickson [via Lucene] <
> > [hidden email] >
> wrote:
> >
> >> bq: This index is only used for searching and being replicated every 7
> sec
> >> from
> >> the master.
> >>
> >> This is a red-flag. 7 second replication times are likely forcing your
> >> app to spend
> >> all its time opening new searchers. Your cached filter queries are
> >> likely rarely being re-used
> >> because they're being thrown away every 7 seconds. This assumes you're
> >> changing your master index frequently.
> >>
> >> If you need near real time, consider Solr trunk and SolrCloud, but
> >> trying to simulate
> >> NRT with very short replication intervals is usually a bad idea.
> >>
> >> A

RE: Could I use Solr to index multiple applications?

2012-07-18 Thread Zhang, Lisheng
Yury and Shashi,

Thanks very much for the help! I am studying the options you pointed
out (Solr multiple cores and Elasticsearch).

Best regards, Lisheng

-Original Message-
From: Yury Kats [mailto:yuryk...@yahoo.com]
Sent: Tuesday, July 17, 2012 7:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Could I use Solr to index multiple applications?


On 7/17/2012 9:26 PM, Zhang, Lisheng wrote:
> Thanks very much for the quick help! Multicore sounds interesting.
> I roughly read the doc, so we need to put each core name into the
> Solr config XML; if we add another core and change the XML, do we
> need to restart Solr?

You can add/create cores on the fly, without restarting.
See http://wiki.apache.org/solr/CoreAdmin#CREATE
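
If you drive it from SolrJ, a minimal sketch (core name, instanceDir and URL
are made up; the server must point at the Solr root rather than at a core, and
the instanceDir with its conf/ must already exist):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

// Creates a new core on a running Solr instance without a restart.
public class CreateCoreOnTheFly {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    CoreAdminRequest.createCore("app2", "app2", server);
  }
}

(With older SolrJ releases the client class is CommonsHttpSolrServer instead
of HttpSolrServer.)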


Solr multiple cores activation

2012-07-18 Thread Praful Bagai
I am implementing a search engine with Nutch as the web crawler and Solr for
searching. Now, since Nutch no longer has a search user interface, I came to
know about Ajax-Solr as a search user interface.

I implemented Ajax-Solr with no hindrance, but during its search operation it
only searches the Reuters data. If I want to crawl the complete web (other
than the Reuters data) using Nutch and integrate it with Solr, then I have to
replace Solr's schema.xml file with Nutch's schema.xml file, which will not
match the Ajax-Solr configuration. By replacing the schema.xml files,
Ajax-Solr *won't* work!!!

So I found a solution to this (correct me if I am wrong), i.e. to activate
multiple cores, which means integrating Solr with Nutch in one core (i.e.
indexing) and using Ajax-Solr in the other.

I tried activating multiple cores, i.e. integrating Solr with Nutch in one
core and Ajax-Solr in the other, but with *NO luck*. I tried every single
thing, every permutation and combination, but failed to set them up.
I followed these links
1) http://wiki.apache.org/solr/CoreAdmin
2)
http://www.plaidpony.com/blog/post/2011/04/Multicore-SOLR-And-Tomcat-On-Windows-Server-2008-R2.aspx


But they didn't help either. Can you please tell me how to set them up?
I've been stuck on this for over 2 days now. Kindly help!!!

Are there any other search user interfaces?


Thanks
Regards

Praful Bagai


SOLR 4 ALPHA /terms /browse

2012-07-18 Thread Nick Koton
When I setup a 2 shard cluster using the example and run it through its
paces, I find two features that do not work as I expect.  Any suggestions on
adjusting my configuration or expectations would be appreciated.

/terms does not return any terms when issued as follows:
http://hostname:8983/solr/terms?terms.fl=name&terms=true&terms.limit=-1&isShard=true&terms.sort=index&terms.prefix=s
but does return reasonable results when distrib is turned off like so:
http://hostname:8983/solr/terms?terms.fl=name&terms=true&distrib=false&terms.limit=-1&isShard=true&terms.sort=index&terms.prefix=s

/browse returns this stack trace to the browser
HTTP ERROR 500

Problem accessing /solr/browse. Reason:

{msg=ZkSolrResourceLoader does not support getConfigDir() - likely, what
you are trying to do is not supported in ZooKeeper
mode,trace=org.apache.solr.common.cloud.ZooKeeperException:
ZkSolrResourceLoader does not support getConfigDir() - likely, what you are
trying to do is not supported in ZooKeeper mode
at
org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader
.java:99)
at
org.apache.solr.response.VelocityResponseWriter.getEngine(VelocityResponseWr
iter.java:117)
at
org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter
.java:40)
at
org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.write(SolrCore.
java:1990)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.
java:398)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
276)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler
.java:1337)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119
)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java
:233)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java
:1065)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:
192)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:
999)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117
)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHand
lerCollection.java:250)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.
java:149)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:1
11)
at org.eclipse.jetty.server.Server.handle(Server.java:351)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpCo
nnection.java:454)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpCo
nnection.java:47)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpC
onnection.java:890)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplet
e(AbstractHttpConnection.java:944)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnectio
n.java:66)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketCon
nector.java:254)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:
599)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:5
34)
at java.lang.Thread.run(Thread.java:662)
,code=500}

Best regards,
Nick Koton





Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Brendan Grainger
Hi Briggs,

I'm not sure about Solr 4.0, but do you need to commit?

> curl http://localhost:8983/solr/coupon/update?commit=true -H "Content-Type: 
> text/xml" --data-binary '<delete><query>*:*</query></delete>'


Brendan


www.kuripai.com

On Jul 18, 2012, at 7:11 PM, Briggs Thompson wrote:

> I have realized this is not specific to SolrJ but to my instance of Solr. 
> Using curl to delete by query is not working either. 
> 
> Running 
> curl http://localhost:8983/solr/coupon/update -H "Content-Type: text/xml" 
> --data-binary '<delete><query>*:*</query></delete>'
> 
> Yields this in the logs:
> INFO: [coupon] webapp=/solr path=/update 
> params={stream.body=<delete><query>*:*</query></delete>} {deleteByQuery=*:*} 
> 0 0
> 
> But the corpus of documents in the core do not change. 
> 
> My solrconfig is pretty barebones at this point, but I attached it in case 
> anyone sees something strange. Anyone have any idea why documents aren't 
> getting deleted?
> 
> Thanks in advance,
> Briggs Thompson
> 
> On Wed, Jul 18, 2012 at 12:54 PM, Briggs Thompson 
>  wrote:
> Hello All,
> 
> I am using 4.0 Alpha and running into an issue with indexing using 
> HttpSolrServer (SolrJ). 
> 
> Relevant java code:
> HttpSolrServer solrServer = new HttpSolrServer(MY_SERVER);
> solrServer.setRequestWriter(new BinaryRequestWriter());
> 
> Relevant Solrconfig.xml content:
>   
>class="solr.BinaryUpdateRequestHandler" />
> 
> Indexing documents works perfectly fine (using addBeans()), however, when 
> trying to do deletes I am seeing issues. I tried to do a 
> solrServer.deleteByQuery("*:*") followed by a commit and optimize, and 
> nothing is deleted. 
> 
> The response from delete request is a "success", and even in the solr logs I 
> see the following:
> INFO: [coupon] webapp=/solr path=/update/javabin 
> params={wt=javabin&version=2} {deleteByQuery=*:*} 0 1
> Jul 18, 2012 11:15:34 AM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start 
> commit{flags=0,version=0,optimize=true,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}
> 
> 
> I tried removing the binaryRequestWriter and have the request send out in 
> default format, and I get the following error. 
> SEVERE: org.apache.solr.common.SolrException: Unsupported ContentType: 
> application/octet-stream  Not in: [application/xml, text/csv, text/json, 
> application/csv, application/javabin, text/xml, application/json]
>   at 
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86)
>   at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>   at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
>   at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
>   at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
>   at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
>   at 
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
>   at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>   at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
>   at 
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
>   at 
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
>   at 
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>   at java.lang.Thread.run(Thread.java:636)
> 
> 
> I thought that an optimize does the same thing as expungeDeletes, but in the 
> log I see expungeDeletes=false. Is there a way to force that using SolrJ?
> 
> Thanks in advance,
> Briggs
> 
> 
> 



Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-18 Thread Erick Erickson
Replication will indeed be incremental. But if you commit too often (and
committing too often is a common mistake) then the merging will
eventually merge everything into new segments and the whole thing will
be replicated.

Additionally, optimizing (or forceMerge in 4.x) will make a single segment
and force the entire index to replicate.

You should emphatically _not_ have to have two cores. Solr is built to
handle replication etc. I suspect you're committing too often or have some
other misconfiguration, and you're creating a problem for yourself.

Here's what I'd do:
1> increase the polling interval to, say, 10 minutes (or however long you can
live with stale data) on the slave.

2> decrease the commits you're  doing. This could involve the autocommit options
you might have set in solrconfig.xml. It could be your client (don't
know how you're
indexing, solrJ?) and the commitWithin parameter. Could be you're
optimizing (if you
are, stop it!).

Note that ramBufferSizeMB has no influence on how often things are _committed_.
When this limit is exceeded, the accumulated indexing data is written
to the currently-open
segment. Multiple flushes can go to the _same_ segment. The write-once nature of
segments means that after a segment is closed (through a commit), it
is not changed. But
a segment that is not closed may be written to multiple times until it's closed.

HTH
Erick
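
As a concrete example of the commitWithin route, a minimal SolrJ sketch (the
URL and field names are made up):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

// Instead of committing from the client (or auto-committing every few
// seconds), let each update carry a commitWithin deadline and let Solr
// batch the commits itself.
public class CommitWithinExample {
  public static void main(String[] args) throws Exception {
    SolrServer master = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");

    UpdateRequest update = new UpdateRequest();
    update.add(doc);
    update.setCommitWithin(30000); // commit within 30 s, not on every add
    update.process(master);
  }
}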

On Wed, Jul 18, 2012 at 1:25 PM, Mou  wrote:
> Hi Eric,
>
> I totally agree. That's what I also figured ultimately. One thing I am not
> clear.  The replication is supposed to be incremental ?  But looks like it
> is trying to replicate the whole index. May be I am changing the index so
> frequently, it is triggering auto merge and a full replication ? I am
> thinking in right direction?
>
> I see that when I start the solr search instance before I start feeding the
> solr Index, my searches are fine BUT it is using the old searcher so I am
> not seeing the updates in the result.
>
> So now I am trying to change my architecture. I am going to have a core
> dedicated to receive daily updates, which is going to be 5 million docs and
> size is going to be little less than 5 G, which is small and replication
> will be faster?
>
> I will search both the cores i.e. old data and the daily updates and do a
> field collapsing on my unique id so that I do not return duplicate results
> .I haven't tried grouping results ; so not sure about  the performance. Any
> suggestion ?
>
> Eventually I have to use Solr trunk like you suggested.
>
> Thank you for your help,
>
> On Wed, Jul 18, 2012 at 10:28 AM, Erick Erickson [via Lucene] <
> ml-node+s472066n3995754...@n3.nabble.com> wrote:
>
>> bq: This index is only used for searching and being replicated every 7 sec
>> from
>> the master.
>>
>> This is a red-flag. 7 second replication times are likely forcing your
>> app to spend
>> all its time opening new searchers. Your cached filter queries are
>> likely rarely being re-used
>> because they're being thrown away every 7 seconds. This assumes you're
>> changing your master index frequently.
>>
>> If you need near real time, consider Solr trunk and SolrCloud, but
>> trying to simulate
>> NRT with very short replication intervals is usually a bad idea.
>>
>> A quick test would be to disable replication for a bit (or lengthen it
>> to, say, 10 minutes)
>>
>> Best
>> Erick
>>
>> On Tue, Jul 17, 2012 at 10:47 PM, Fuad Efendi <[hidden 
>> email]>
>> wrote:
>>
>> >
>> >> FWIW, when asked at what point one would want to split JVMs and shard,
>> >> on the same machine, Grant Ingersoll mentioned 16GB, and precisely for
>> >> GC cost reasons. You're way above that.
>> >
>> > - his index is 75G, and Grant mentioned RAM heap size; we can use
>> terabytes
>> > of index with 16Gb memory.
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>
>


Custom JUnit tests based on SolrTestCaseJ4 fails intermittently.

2012-07-18 Thread Koorosh Vakhshoori
Hi,
  I am trying out the Solr Alpha release against some custom code and JUnit
tests I have written. I am seeing my custom JUnit tests fail once in a while.
The tests are based on the Solr JUnit test code, extending SolrTestCaseJ4. My
guess is that the randomized testing is running into some issue here, but I am
not sure what the source of the problem is. I noticed the value of 'codec' is
null for the failed cases, but I am setting the luceneMatchVersion value in
solrconfig.xml as below:
  
   
<luceneMatchVersion>${tests.luceneMatchVersion:LUCENE_CURRENT}</luceneMatchVersion>
  I am including the test outputs for both scenarios here.
  
  Any help or pointer appreciated.
  
  Thanks,
  
  Koorosh
  

Here is the output of the JUnit test which fails when running it from Eclipse:
  
NOTE: test params are: codec=null, sim=null, locale=null, timezone=(null)
NOTE: Windows 7 6.1 amd64/Sun Microsystems Inc. 1.6.0_21
(64-bit)/cpus=4,threads=1,free=59414480,total=63242240
NOTE: All tests run in this JVM: [TestDocsHandler]
Jul 18, 2012 3:55:25 PM com.carrotsearch.randomizedtesting.RandomizedRunner
runSuite
SEVERE: Panic: RunListener hook shouldn't throw exceptions.
java.lang.NullPointerException
at
org.apache.lucene.util.RunListenerPrintReproduceInfo.reportAdditionalFailureInfo(RunListenerPrintReproduceInfo.java:159)
at
org.apache.lucene.util.RunListenerPrintReproduceInfo.testRunFinished(RunListenerPrintReproduceInfo.java:104)
at
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:634)
at
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)

Here is the output for the same test where it is successful:

24 T11 oas.SolrTestCaseJ4.initCore initCore
Creating dataDir:
C:\Users\xuser\AppData\Local\Temp\solrtest-TestDocsHandler-1342651924084
43 T11 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr
(NoInitialContextEx)
43 T11 oasc.SolrResourceLoader.locateSolrHome using system property
solr.solr.home: solr-gold/solr-extraction
45 T11 oasc.SolrResourceLoader. new SolrResourceLoader for deduced
Solr Home: 'solr-gold/solr-extraction\'
284 T11 oasc.SolrConfig. Using Lucene MatchVersion: LUCENE_40
429 T11 oasc.SolrConfig. Loaded SolrConfig: solrconfig-dow.xml
434 T11 oass.IndexSchema.readSchema Reading Solr Schema
443 T11 oass.IndexSchema.readSchema Schema name=SolvNet Common core
522 T11 oass.IndexSchema.readSchema default search field in schema is
indexed_content
524 T11 oass.IndexSchema.readSchema query parser default operator is AND
525 T11 oass.IndexSchema.readSchema unique key field: id
616 T11 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr
(NoInitialContextEx)
617 T11 oasc.SolrResourceLoader.locateSolrHome using system property
solr.solr.home: solr-gold/solr-extraction
617 T11 oasc.SolrResourceLoader. new SolrResourceLoader for directory:
'solr-gold/solr-extraction\'
618 T11 oasc.CoreContainer. New CoreContainer 994682772
642 T11 oasc.SolrCore. [collection1] Opening new SolrCore at
solr-gold/solr-extraction\,
dataDir=C:\Users\koo\AppData\Local\Temp\solrtest-TestDocsHandler-1342651924084\
642 T11 oasc.SolrCore. JMX monitoring not detected for core:
collection1
648 T11 oasc.SolrCore.getNewIndexDir WARNING New index directory detected:
old=null
new=C:\Users\koo\AppData\Local\Temp\solrtest-TestDocsHandler-1342651924084\index/
648 T11 oasc.SolrCore.initIndex WARNING [collection1] Solr index directory
'C:\Users\koo\AppData\Local\Temp\solrtest-TestDocsHandler-1342651924084\index'
doesn't exist. Creating new index...
742 T11 oasc.SolrDeletionPolicy.onCommit SolrDeletionPolicy.onCommit:
commits:num=1

commit{dir=MockDirWrapper(org.apache.lucene.store.RAMDirectory@44023756
lockFactory=org.apache.lucene.store.NativeFSLockFactory@21ed5459),segFN=segments_1,generation=1,filenames=[segments_1]
743 T11 oasc.SolrDeletionPolicy.updateCommits newest commit = 1
871 T11 oasc.RequestHandlers.initHandlersFromConfig created /update/javabin:
solr.BinaryUpdateRequestHandler
875 T11 oasc.RequestHandlers.initHandlersFromConfig created standard:
solr.StandardRequestHandler
878 T11 oasc.RequestHandlers.initHandlersFromConfig created /update:
solr.XmlUpdateRequestHandler
878 T11 oasc.RequestHandlers.initHandlersFromConfig created /admin/:
org.apache.solr.handler.admin.AdminHandlers
886 T11 oasc.RequestHandlers.initHandlersFromConfig created /update/extract:
com.synopsys.ies.solr.backend.handler.extraction.SolvNetExtractingRequestHandler
891 T11 oasc.RequestHandlers.initHandlersFromConfig WARNING Multiple
requestHandler registered to the same name: standard ignoring:
org.apache.solr.handler.StandardRequestHandler
892 T11 oasc.RequestHandlers.initHandlersFromConfig created standard:
solr.SearchHandler
892 T11 oasc.RequestHandlers.initHandlersFromConfig created employee:
solr.SearchHandler
892 T11 oasc.RequestHandlers.initHandlersFromConf

Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Briggs Thompson
I have realized this is not specific to SolrJ but to my instance of Solr.
Using curl to delete by query is not working either.

Running
curl http://localhost:8983/solr/coupon/update -H "Content-Type: text/xml"
--data-binary '<delete><query>*:*</query></delete>'

Yields this in the logs:
INFO: [coupon] webapp=/solr path=/update
params={stream.body=<delete><query>*:*</query></delete>}
{deleteByQuery=*:*} 0 0

But the corpus of documents in the core do not change.

My solrconfig is pretty barebones at this point, but I attached it in case
anyone sees something strange. Anyone have any idea why documents aren't
getting deleted?

Thanks in advance,
Briggs Thompson

On Wed, Jul 18, 2012 at 12:54 PM, Briggs Thompson <
w.briggs.thomp...@gmail.com> wrote:

> Hello All,
>
> I am using 4.0 Alpha and running into an issue with indexing using
> HttpSolrServer (SolrJ).
>
> Relevant java code:
> HttpSolrServer solrServer = new HttpSolrServer(MY_SERVER);
> solrServer.setRequestWriter(new BinaryRequestWriter());
>
> Relevant Solrconfig.xml content:
>
>   
>
>class="solr.BinaryUpdateRequestHandler" />
>
> Indexing documents works perfectly fine (using addBeans()), however, when
> trying to do deletes I am seeing issues. I tried to do
> a solrServer.deleteByQuery("*:*") followed by a commit and optimize, and
> nothing is deleted.
>
> The response from delete request is a "success", and even in the solr logs
> I see the following:
>
> INFO: [coupon] webapp=/solr path=/update/javabin
> params={wt=javabin&version=2} {deleteByQuery=*:*} 0 1
> Jul 18, 2012 11:15:34 AM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start
> commit{flags=0,version=0,optimize=true,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}
>
>
>
> I tried removing the binaryRequestWriter and have the request send out in
> default format, and I get the following error.
>
> SEVERE: org.apache.solr.common.SolrException: Unsupported ContentType:
> application/octet-stream  Not in: [application/xml, text/csv, text/json,
> application/csv, application/javabin, text/xml, application/json]
>
> at
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>  at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
>  at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
>  at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>  at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
>  at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
>  at
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>  at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
> at
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
>  at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
> at
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
>  at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>  at java.lang.Thread.run(Thread.java:636)
>
>
> I thought that an optimize does the same thing as expungeDeletes, but in
> the log I see expungeDeletes=false. Is there a way to force that using
> SolrJ?
>
> Thanks in advance,
> Briggs
>
>





  LUCENE_40
  
  

  ${solr.data.dir:}

  

  
 
		${autoCommitTime:0} 
false 

 
		${autoSoftCommitTime:0} 


		${solr.data.dir:}

  

  
  

  true

  
  
   

	solr-data-config.xml

   
  
  
	   
	   ${masterEnabled:false} 
		 commit
		 startup
		 optimize
		 schema.xml,stopwords.txt,synonyms.txt
	   
	   
		 ${slaveEnabled:false}
		 http://${masterServer:localhost}:${port:8983}/solr/${masterCoreName:}/replication
		 ${pollInterval:00:00:15}
	   
  
  
  

  
  
  
  
  
  

  

  solrpingquery


  all

  
  
  
  


   

  

   
  
solr
  





Re: edismax not working in a core

2012-07-18 Thread Richard Frovarp

On 07/18/2012 02:39 PM, Richard Frovarp wrote:

On 07/18/2012 11:20 AM, Erick Erickson wrote:

the ~2 is the mm parameter I'm pretty sure. So I'd guess your
configuration has
a mm parameter set on the core that isn't doing what you want..



I'm not setting the mm parameter or the q.op parameter. All three cores
have a defaultOperator of OR. So I don't know where that would be coming
from. However, if I specify a mm of 0, it appears to work just fine.
I've added it as a default parameter to the select handler.

Thanks for pointing me in the right direction.

Richard


Okay, that's wrong. Term boosting isn't working either, and what I did 
above just turns everything into an OR query.


I did figure out the problem, however. In the core that wasn't working, 
one of the query field names wasn't correct. No errors were ever thrown, 
it just made the query behave in a very odd way.


I finally figured it out after debugging each field independent of each 
other.


Re: java.lang.AssertionError: System properties invariant violated.

2012-07-18 Thread Roman Chyla
Thank you! I haven't really understood the LuceneTestCase.classRules
before this.

roman

On Wed, Jul 18, 2012 at 3:11 PM, Chris Hostetter
 wrote:
>
> : I am porting 3x unittests to the solr/lucene trunk. My unittests are
> : OK and pass, but in the end fail because the new rule checks for
> : modifier properties. I know what the problem is, I am creating new
> : system properties in the @beforeClass, but I think I need to do it
> : there, because the project loads C library before initializing tests.
>
> The purpose of the assertion is to verify that no code being tested is
> modifying system properties -- if you are setting the properties yourself
> in some @BeforeClass methods, just use System.clearProperty to unset them
> in corresponding @AfterClass methods
>
>
> -Hoss


RE: How To apply transformation in DIH for multivalued numeric field?

2012-07-18 Thread Dyer, James
Don't you want to specify "splitBy" for the integer field too?

Actually though, you shouldn't need to use GROUP_CONCAT and RegexTransformer at 
all.  DIH is designed to handle "1>many" relations between parent and child 
entities by populating all the child fields as multi-valued automatically.  I 
guess your approach leads to a lot fewer rows getting sent from your db to Solr 
though.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Pranav Prakash [mailto:pra...@gmail.com] 
Sent: Wednesday, July 18, 2012 2:38 PM
To: solr-user@lucene.apache.org
Subject: How To apply transformation in DIH for multivalued numeric field?

I have a multivalued integer field and a multivalued string field defined
in my schema as





The DIH entity and field defn for the same goes as



 


  



The value for field community_tags comes correctly as an array of strings.
However the value of field community_tag_ids is not proper


[B@390c0a18


I tried chaining NumberFormatTransformer with formatStyle="number" but that
throws DataImportHandlerException: Failed to apply NumberFormat on column.
Could it be due to NULL values from database or because the value is not
proper? How do we handle NULL in this case?


*Pranav Prakash*

"temet nosce"



Re: Searcher Refrence Counts

2012-07-18 Thread Mark Miller
I'd guess the getSearcher call you are making is incrementing the ref count and 
you are not decrementing it?
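
A minimal sketch of the acquire/release pairing (what happens between the two
calls is omitted):

import org.apache.solr.core.SolrCore;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

// Every core.getSearcher() call increments the reference count, so it must be
// paired with decref() or old searchers can never be closed.
public class SearcherRefCountExample {
  static void useSearcher(SolrCore core) {
    RefCounted<SolrIndexSearcher> holder = core.getSearcher();
    try {
      SolrIndexSearcher searcher = holder.get();
      // ... use the searcher ...
    } finally {
      holder.decref();
    }
  }
}

Inside a handler, req.getSearcher() is usually preferable, since the request
owns that reference and releases it when the request is closed.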

On Jul 18, 2012, at 12:17 PM, Karthick Duraisamy Soundararaj wrote:

> Hi All, 
>The SolrCore seems to have a reference counted searcher with it. I 
> had to write a customSearchHandler that extends SearchHandler, and I was 
> playing around with it. I did the following change to search handler
> 
> SearchHanlder.java
> --
> handleRequestBody(SolrQueryRequest req,SolrQueryResponse req)
>  {
>System.out.println("Reference count Before Search:" 
> +req.getCore().getSearcher.getRefcount)  //In eclipse ..
>..
> ...
> System.out.println("Reference count After Search :" 
> +req.getCore().getSearcher.getRefcount) // In eclipse
> }
> 
> 
> Now, I am surprised to see Reference count not getting decremented at all. 
> Following is the sample output I get
> 
>Reference count before search:1
>Reference count after search:2
>..
>Reference count before search:2
>Reference count after search:3
>  .
>Reference count before search:4
>Reference count after search:5
>...
>
>Reference count before search:3000
>Reference count after search:30001
> 
> 
> The reference count seems to be increasing. Wouldn't this cause a memory leak?
>
> 
> 
> 
> 
> 

- Mark Miller
lucidimagination.com













Re: DIH XML configs for multi environment

2012-07-18 Thread Pranav Prakash
That approach would work for core-dependent parameters. In my case, the
params are environment-dependent. I think a simpler approach would be to
pass the URL param as a JVM option and have these XMLs pick it up from there.

I haven't tried it yet.

*Pranav Prakash*

"temet nosce"



On Tue, Jul 17, 2012 at 5:09 PM, Markus Klose  wrote:

> Hi
>
> There is one more approach using the property mechanism.
>
> You could specify the datasource like this:
> 
>
>  And you can specifiy the properties in the solr.xml in your core
> configuration like this:
>
> 
> 
> 
> 
>
>
> Best regards from Augsburg
>
> Markus Klose
> SHI Elektronische Medien GmbH
>
>
> Address: Curt-Frenzel-Str. 12, 86167 Augsburg
>
> Tel.:   0821 7482633 26
> Tel.:   0821 7482633 0 (switchboard)
> Mobile: 0176 56516869
> Fax:   0821 7482633 29
>
> E-Mail: markus.kl...@shi-gmbh.com
> Internet: http://www.shi-gmbh.com
>
> Commercial register: Augsburg HRB 17382
> Managing director: Peter Spiske
> VAT ID: DE 182167335
>
>
>
>
>
> -Original Message-
> From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com]
> Sent: Wednesday, 11 July 2012 11:21
> To: solr-user@lucene.apache.org
> Subject: Re: DIH XML configs for multi environment
>
> http://wiki.eclipse.org/Jetty/Howto/Configure_JNDI_Datasource
> http://docs.codehaus.org/display/JETTY/DataSource+Examples
>
>
> On Wed, Jul 11, 2012 at 2:30 PM, Pranav Prakash  wrote:
>
> > That's cool. Is there something similar for Jetty as well? We use Jetty!
> >
> > *Pranav Prakash*
> >
> > "temet nosce"
> >
> >
> >
> > On Wed, Jul 11, 2012 at 1:49 PM, Rahul Warawdekar <
> > rahul.warawde...@gmail.com> wrote:
> >
> > > Hi Pranav,
> > >
> > > If you are using Tomcat to host Solr, you can define your data
> > > source in context.xml file under tomcat configuration.
> > > You have to refer to this datasource with the same name in all the 3
> > > environments from DIH data-config.xml.
> > > This context.xml file will vary across 3 environments having
> > > different credentials for dev, stag and prod.
> > >
> > > eg
> > > DIH data-config.xml will refer to the datasource as listed below
> > >  > > type="JdbcDataSource" readOnly="true" />
> > >
> > > context.xml file which is located under "//conf" folder
> > > will have the resource entry as follows
> > >> > type="" username="X" password="X"
> > > driverClassName=""
> > > url=""
> > > maxActive="8"
> > > />
> > >
> > > On Wed, Jul 11, 2012 at 1:31 PM, Pranav Prakash 
> > wrote:
> > >
> > > > The DIH XML config file has to be specified dataSource. In my
> > > > case, and possibly with many others, the logon credentials as well
> > > > as mysql
> > server
> > > > paths would differ based on environments (dev, stag, prod). I
> > > > don't
> > want
> > > to
> > > > end up coming with three different DIH config files, three
> > > > different handlers and so on.
> > > >
> > > > What is a good way to deal with this?
> > > >
> > > >
> > > > *Pranav Prakash*
> > > >
> > > > "temet nosce"
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks and Regards
> > > Rahul A. Warawdekar
> > >
> >
>
>
>
> --
> Thanks and Regards
> Rahul A. Warawdekar
>


Re: edismax not working in a core

2012-07-18 Thread Richard Frovarp

On 07/18/2012 11:20 AM, Erick Erickson wrote:

the ~2 is the mm parameter I'm pretty sure. So I'd guess your configuration has
a mm parameter set on the core that isn't doing what you want..



I'm not setting the mm parameter or the q.op parameter. All three cores 
have a defaultOperator of OR. So I don't know where that would be coming 
from. However, if I specify a mm of 0, it appears to work just fine. 
I've added it as a default parameter to the select handler.


Thanks for pointing me in the right direction.

Richard


How To apply transformation in DIH for multivalued numeric field?

2012-07-18 Thread Pranav Prakash
I have a multivalued integer field and a multivalued string field defined
in my schema as





The DIH entity and field defn for the same goes as



 


  



The value for field community_tags comes correctly as an array of strings.
However the value of field community_tag_ids is not proper


[B@390c0a18


I tried chaining NumberFormatTransformer with formatStyle="number" but that
throws DataImportHandlerException: Failed to apply NumberFormat on column.
Could it be due to NULL values from database or because the value is not
proper? How do we handle NULL in this case?


*Pranav Prakash*

"temet nosce"


Re: java.lang.AssertionError: System properties invariant violated.

2012-07-18 Thread Chris Hostetter

: I am porting 3x unittests to the solr/lucene trunk. My unittests are
: OK and pass, but in the end fail because the new rule checks for
: modifier properties. I know what the problem is, I am creating new
: system properties in the @beforeClass, but I think I need to do it
: there, because the project loads C library before initializing tests.

The purpose of the assertion is to verify that no code being tested is 
modifying system properties -- if you are setting the properties yourself 
in some @BeforeClass methods, just use System.clearProperty to unset them 
in corresponding @AfterClass methods


-Hoss
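
A minimal sketch of that pairing (the property name and value are made up):

import org.apache.solr.SolrTestCaseJ4;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class MyNativeLibTest extends SolrTestCaseJ4 {

  @BeforeClass
  public static void setUpProperties() throws Exception {
    // property the C library needs before the core is initialized
    System.setProperty("my.native.lib.path", "/opt/native");
    initCore("solrconfig.xml", "schema.xml");
  }

  @AfterClass
  public static void tearDownProperties() {
    // restore the JVM so the system-properties invariant check passes
    System.clearProperty("my.native.lib.path");
  }
}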


Solr grouping / facet query

2012-07-18 Thread s215903406
Could anyone suggest the options available to handle the following situation:

1. Say we have 1,000 authors

2. 65% of these authors have 10-100 titles they authored; the others have
not authored any titles but provide only their biography and writing
capability. 

3. We want to search for authors, group the results by author, and show the
4 most relevant titles authored for each (if any) next to the author name.

Since not all authors have titles authored, I can't group titles by author.
Also, adding their bio to each title places a lot of duplicate data in the
index. 

So the search results would look like this;

Author A
title0, title6, title8, title3

Author G
no titles found

Author E
title4, title9, title2

Any suggestions would be appreciated!

 





Solr faceting -- sort order

2012-07-18 Thread Christopher Gross
I have a "keyword" field type that I made:














When I do a query, the results that come through retain their original
case for this field, like:
doc 1
keyword: Blah Blah Blah
doc 2
keyword: Yadda Yadda Yadda

But when I pull back facets, i get:

blah blah blah (1)
yadda yadda yadda (1)

I was attempting to fix a sorting problem -- keyword "" would show
up after keyword "Zulu" due to the "index" sorting, so I thought that
I could lowercase it all to have it be in the same order.  But now it
is all in lower case, and I'd like it to retain the original style.
Is there a different sort that I should use, or is there a change that
I can make to my keyword type that would let the facet count list show
up alphabetically, but ignoring case?

Thanks!

-- Chris


Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-18 Thread Mou
Hi Eric,

I totally agree. That's what I also figured ultimately. One thing I am not
clear on: the replication is supposed to be incremental? But it looks like it
is trying to replicate the whole index. Maybe I am changing the index so
frequently that it is triggering an auto-merge and a full replication? Am I
thinking in the right direction?

I see that when I start the solr search instance before I start feeding the
solr Index, my searches are fine BUT it is using the old searcher so I am
not seeing the updates in the result.

So now I am trying to change my architecture. I am going to have a core
dedicated to receiving daily updates, which is going to be 5 million docs, and
its size is going to be a little less than 5 GB, which is small, so replication
will be faster?

I will search both cores, i.e. the old data and the daily updates, and do
field collapsing on my unique id so that I do not return duplicate results.
I haven't tried grouping results, so I am not sure about the performance. Any
suggestion?

Eventually I have to use Solr trunk like you suggested.

Thank you for your help,

On Wed, Jul 18, 2012 at 10:28 AM, Erick Erickson [via Lucene] <
ml-node+s472066n3995754...@n3.nabble.com> wrote:

> bq: This index is only used for searching and being replicated every 7 sec
> from
> the master.
>
> This is a red-flag. 7 second replication times are likely forcing your
> app to spend
> all its time opening new searchers. Your cached filter queries are
> likely rarely being re-used
> because they're being thrown away every 7 seconds. This assumes you're
> changing your master index frequently.
>
> If you need near real time, consider Solr trunk and SolrCloud, but
> trying to simulate
> NRT with very short replication intervals is usually a bad idea.
>
> A quick test would be to disable replication for a bit (or lengthen it
> to, say, 10 minutes)
>
> Best
> Erick
>
> On Tue, Jul 17, 2012 at 10:47 PM, Fuad Efendi <[hidden 
> email]>
> wrote:
>
> >
> >> FWIW, when asked at what point one would want to split JVMs and shard,
> >> on the same machine, Grant Ingersoll mentioned 16GB, and precisely for
> >> GC cost reasons. You're way above that.
> >
> > - his index is 75G, and Grant mentioned RAM heap size; we can use
> terabytes
> > of index with 16Gb memory.
> >
> >
> >
> >
> >
>
>
>



Re: Start solr master and solr slave with enable replication = false

2012-07-18 Thread Erick Erickson
See: 
http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node

I'll admit that I haven't tried this personally, but I think it'll work.

Although I'm pretty sure that if you just disable the master,
disabling the polling on the slave isn't necessary.

Best
Erick

On Wed, Jul 18, 2012 at 6:24 AM, Jamel ESSOUSSI
 wrote:
> Hi,
>
> Is it possible to start the Solr master and slave with the following
> configuration?
>
> - replication on master disabled when we start solr --> the replication
> feature must be available
> - polling on slave disabled --> the replication feature must be available
>
>
> -- Best Regards
> -- Jamel
>


Re: Wildcard query vs facet.prefix for autocomplete?

2012-07-18 Thread Erick Erickson
But I did run across an idea a while ago... Either with a custom
update processor
or on the client side, you permute the title so you index something like:
Shadows of the Damned
of the Damned&Shadows
the Damned&Shadows of
Damned&Shadows of the

Index these with KeywordTokenizer and LowercaseFilter.

Now, your responses from TermComponent (prefix) contain the entire
string and you can display them correctly by rearranging the string
at the client side based on the & (or whatever delimiter). Still an issue
with proper capitalization though since TermsComponent only
looks at the actual indexed data and it'll be lower-cased. You could
use String, but then you're counting on the user to capitalize properly, always
a dicey call.

And TermsComponent is very fast

FWIW
Erick
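
A minimal sketch of generating those permuted values on the client side (the
delimiter choice is arbitrary):

import java.util.ArrayList;
import java.util.List;

// One rotation per word: the suggester can then prefix-match any word, and
// the text after the delimiter lets the client rebuild the original title.
public class TitlePermutations {
  static List<String> rotations(String title) {
    String[] words = title.split("\\s+");
    List<String> out = new ArrayList<String>();
    out.add(title);
    for (int i = 1; i < words.length; i++) {
      StringBuilder head = new StringBuilder();
      StringBuilder tail = new StringBuilder();
      for (int j = i; j < words.length; j++) {
        head.append(j > i ? " " : "").append(words[j]);
      }
      for (int j = 0; j < i; j++) {
        tail.append(j > 0 ? " " : "").append(words[j]);
      }
      out.add(head + "&" + tail);
    }
    return out;
  }

  public static void main(String[] args) {
    // prints the four variants shown above for "Shadows of the Damned"
    for (String s : rotations("Shadows of the Damned")) {
      System.out.println(s);
    }
  }
}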

On Wed, Jul 18, 2012 at 9:21 AM, santamaria2  wrote:
> Well silly me... you're right.
>
> On Wed, Jul 18, 2012 at 6:44 PM, Erick Erickson [via Lucene] <
> ml-node+s472066n399570...@n3.nabble.com> wrote:
>
>> Well, option 2 won't do you any good, so speed doesn't really matter.
>> Your response would have a facet count for "dam", all by itself, something
>> like
>>
>> 2
>> 1
>>
>> etc.
>>
>> which does not contain anything that lets you reconstruct the title
>> for autosuggest.
>>
>> Best
>> Erick
>>
>> On Tue, Jul 17, 2012 at 3:18 AM, santamaria2 <[hidden 
>> email]>
>> wrote:
>> > I'll consider using the other methods, but I'd like to know which would
>> be
>> > faster among the two approaches mentioned in my opening post.
>> >
>>
>>
>>
>
>


Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-18 Thread Erick Erickson
bq: This index is only used for searching and being replicated every 7 sec from
the master.

This is a red-flag. 7 second replication times are likely forcing your
app to spend
all its time opening new searchers. Your cached filter queries are
likely rarely being re-used
because they're being thrown away every 7 seconds. This assumes you're
changing your master index frequently.

If you need near real time, consider Solr trunk and SolrCloud, but
trying to simulate
NRT with very short replication intervals is usually a bad idea.

A quick test would be to disable replication for a bit (or lengthen it
to, say, 10 minutes)

Best
Erick

On Tue, Jul 17, 2012 at 10:47 PM, Fuad Efendi  wrote:
>
>> FWIW, when asked at what point one would want to split JVMs and shard,
>> on the same machine, Grant Ingersoll mentioned 16GB, and precisely for
>> GC cost reasons. You're way above that.
>
> - his index is 75G, and Grant mentioned RAM heap size; we can use terabytes
> of index with 16Gb memory.
>
>
>
>
>


Re: edismax not working in a core

2012-07-18 Thread Erick Erickson
The ~2 is the mm (minimum should match) parameter, I'm pretty sure. So I'd
guess your configuration has an mm parameter set on that core that isn't
doing what you want.
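
For example, handler defaults along these lines in solrconfig.xml would
produce it (illustrative only; your handler name and value will differ):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="mm">2</str>
  </lst>
</requestHandler>

With mm=2, both clauses of "frovarp OR fee" are required to match, which
shows up as the ~2 on the parsed BooleanQuery.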

Best
Erick

On Tue, Jul 17, 2012 at 3:05 PM, Richard Frovarp  wrote:
> On 07/14/2012 05:32 PM, Erick Erickson wrote:
>>
>> Really hard to say. Try executing your query on the cores with
>> &debugQuery=on and compare the parsed results (for this you
>> can probably just ignore the explain bits of the output, concentrate
>> on the parsed query).
>>
>
> Okay, for the example core from the project, the query was:
>
> test OR samsung
>
> parsedquery:
> +(DisjunctionMaxQuery((id:test^10.0 | text:test^0.5 | cat:test^1.4 |
> manu:test^1.1 | name:test^1.2 | features:test | sku:test^1.5))
> DisjunctionMaxQuery((id:samsung^10.0 | text:samsung^0.5 | cat:samsung^1.4 |
> manu:samsung^1.1 | name:samsung^1.2 | features:samsung | sku:samsung^1.5)))
>
> For my core the query was:
>
> frovarp OR fee
>
> parsedquery:
>
> +((DisjunctionMaxQuery((content:fee | title:fee^5.0 | mainContent:fee^2.0))
> DisjunctionMaxQuery((content:frovarp | title:frovarp^5.0 |
> mainContent:frovarp^2.0)))~2)
>
> What is that ~2? That's the difference. The third core that works properly
> also doesn't have the ~2.


Re: NGram for misspelt words

2012-07-18 Thread Dikchant Sahi
Have you tried the analysis window to debug this?

I believe you are doing something wrong in the fieldType.

On Wed, Jul 18, 2012 at 8:07 PM, Husain, Yavar  wrote:

> Thanks Sahi. I have replaced my EdgeNGramFilterFactory to
> NGramFilterFactory as I need substrings not just in front or back but
> anywhere.
> You are right I put the same NGramFilterFactory in both Query and Index
> however now it does not return any results not even the basic one.
>
> -Original Message-
> From: Dikchant Sahi [mailto:contacts...@gmail.com]
> Sent: Wednesday, July 18, 2012 7:54 PM
> To: solr-user@lucene.apache.org
> Subject: Re: NGram for misspelt words
>
> You are creating grams only while indexing and not querying hence 'ludlwo'
> would not match. Your analyzer will create the following grams while
> indexing for 'ludlow': lu lud ludl ludlo ludlow and hence would not match
> to 'ludlwo'.
>
> Either you need to create gram while querying also or use Edit Distance.
>
> On Wed, Jul 18, 2012 at 7:43 PM, Husain, Yavar 
> wrote:
>
> >
> >
> >
> > I have configured NGram Indexing for some fields.
> >
> > Say I search for the city Ludlow, I get the results (normal search)
> >
> > If I search for Ludlo (with w ommitted) I get the results
> >
> > If I search for Ludl (with ow ommitted) I still get the results
> >
> > I know that they are all partial strings of the main string hence
> > NGram works perfect.
> >
> > But when I type in Ludlwo (misspelt, characters o and w interchanged)
> > I dont get any results, It should ideally match "Ludl" and provide the
> > results.
> >
> > I am not looking for Edit distance based Spell Correctors. How can I
> > make above NGram based search work?
> >
> > Here is my schema.xml (NGramFieldType):
> >
> >  > stored="false" multiValued="true">
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> >  > maxGramSize="15" side="front" />
> >
> >
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> >
> > 
> > 
> > 
> >
>


RE: NGram for misspelt words

2012-07-18 Thread Husain, Yavar
Thanks Sahi. I have replaced my EdgeNGramFilterFactory with
NGramFilterFactory, as I need substrings not just at the front or back but
anywhere.
You are right, I put the same NGramFilterFactory in both the query and index
analyzers; however, now it does not return any results, not even the basic
ones.

-Original Message-
From: Dikchant Sahi [mailto:contacts...@gmail.com] 
Sent: Wednesday, July 18, 2012 7:54 PM
To: solr-user@lucene.apache.org
Subject: Re: NGram for misspelt words

You are creating grams only while indexing and not querying hence 'ludlwo'
would not match. Your analyzer will create the following grams while indexing 
for 'ludlow': lu lud ludl ludlo ludlow and hence would not match to 'ludlwo'.

Either you need to create gram while querying also or use Edit Distance.

On Wed, Jul 18, 2012 at 7:43 PM, Husain, Yavar  wrote:

>
>
>
> I have configured NGram Indexing for some fields.
>
> Say I search for the city Ludlow, I get the results (normal search)
>
> If I search for Ludlo (with w ommitted) I get the results
>
> If I search for Ludl (with ow ommitted) I still get the results
>
> I know that they are all partial strings of the main string hence 
> NGram works perfect.
>
> But when I type in Ludlwo (misspelt, characters o and w interchanged) 
> I dont get any results, It should ideally match "Ludl" and provide the 
> results.
>
> I am not looking for Edit distance based Spell Correctors. How can I 
> make above NGram based search work?
>
> Here is my schema.xml (NGramFieldType):
>
>  stored="false" multiValued="true">
>
> 
>
> 
>
> 
>
> 
>
>  maxGramSize="15" side="front" />
>
>
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
>
> 
> 
> 
>


Re: NGram for misspelt words

2012-07-18 Thread Dikchant Sahi
You are creating grams only while indexing and not while querying, hence
'ludlwo' would not match. Your analyzer will create the following grams
while indexing 'ludlow': lu lud ludl ludlo ludlow, and hence it would not
match 'ludlwo'.

Either you need to create grams while querying as well, or use edit
distance.
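
For example, a fieldType roughly like this applies the same grams on both
sides (a sketch; the gram sizes are just examples):

<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>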

On Wed, Jul 18, 2012 at 7:43 PM, Husain, Yavar  wrote:

>
>
>
> I have configured NGram Indexing for some fields.
>
> Say I search for the city Ludlow, I get the results (normal search)
>
> If I search for Ludlo (with w ommitted) I get the results
>
> If I search for Ludl (with ow ommitted) I still get the results
>
> I know that they are all partial strings of the main string hence NGram
> works perfect.
>
> But when I type in Ludlwo (misspelt, characters o and w interchanged) I
> dont get any results, It should ideally match "Ludl" and provide the
> results.
>
> I am not looking for Edit distance based Spell Correctors. How can I make
> above NGram based search work?
>
> Here is my schema.xml (NGramFieldType):
>
>  stored="false" multiValued="true">
>
> 
>
> 
>
> 
>
> 
>
>  maxGramSize="15" side="front" />
>
>
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
>
> 
> 
> 
>


Count is inconsistent between facet and stats

2012-07-18 Thread Yandong Yao
Hi Guys,

Steps to reproduce:

1) Download apache-solr-4.0.0-ALPHA
2) cd example;  java -jar start.jar
3) cd exampledocs;  ./post.sh *.xml
4) Use the StatsComponent to get stats for the field 'popularity' faceted
on 'cat'. The 'count' for 'electronics' is 3:
http://localhost:8983/solr/collection1/select?q=cat:electronics&wt=json&rows=0&stats=true&stats.field=popularity&stats.facet=cat

{
  "stats_fields": {
    "popularity": {
      "min": 0,
      "max": 10,
      "count": 14,
      "missing": 0,
      "sum": 75,
      "sumOfSquares": 503,
      "mean": 5.357142857142857,
      "stddev": 2.7902892835178013,
      "facets": {
        "cat": {
          "music": {"min": 10, "max": 10, "count": 1, "missing": 0, "sum": 10, "sumOfSquares": 100, "mean": 10, "stddev": 0},
          "monitor": {"min": 6, "max": 6, "count": 2, "missing": 0, "sum": 12, "sumOfSquares": 72, "mean": 6, "stddev": 0},
          "hard drive": {"min": 6, "max": 6, "count": 2, "missing": 0, "sum": 12, "sumOfSquares": 72, "mean": 6, "stddev": 0},
          "scanner": {"min": 6, "max": 6, "count": 1, "missing": 0, "sum": 6, "sumOfSquares": 36, "mean": 6, "stddev": 0},
          "memory": {"min": 0, "max": 7, "count": 3, "missing": 0, "sum": 12, "sumOfSquares": 74, "mean": 4, "stddev": 3.605551275463989},
          "graphics card": {"min": 7, "max": 7, "count": 2, "missing": 0, "sum": 14, "sumOfSquares": 98, "mean": 7, "stddev": 0},
          "electronics": {"min": 1, "max": 7, "count": 3, "missing": 0, "sum": 9, "sumOfSquares": 51, "mean": 3, "stddev": 3.4641016151377544}
        }
      }
    }
  }
}
5) Facet on 'cat', and the count is 14:
http://localhost:8983/solr/collection1/select?q=cat:electronics&wt=json&rows=0&facet=true&facet.field=cat

{
  "cat": [
    "electronics", 14,
    "memory", 3,
    "connector", 2,
    "graphics card", 2,
    "hard drive", 2,
    "monitor", 2,
    "camera", 1,
    "copier", 1,
    "multifunction printer", 1,
    "music", 1,
    "printer", 1,
    "scanner", 1,
    "currency", 0,
    "search", 0,
    "software", 0
  ]
},



So from the StatsComponent the count for the 'electronics' cat is 3, while
the FacetComponent reports 14 for 'electronics'. Is this a bug?

Following is the field definition for 'cat'.


Thanks,
Yandong


NGram for misspelt words

2012-07-18 Thread Husain, Yavar



I have configured NGram indexing for some fields.

Say I search for the city Ludlow, I get the results (normal search).

If I search for Ludlo (with the w omitted) I get the results.

If I search for Ludl (with the ow omitted) I still get the results.

I know that they are all partial strings of the main string, hence NGram
works perfectly.

But when I type in Ludlwo (misspelt, characters o and w interchanged) I
don't get any results. It should ideally match "Ludl" and provide the
results.

I am not looking for edit-distance-based spell correctors. How can I make
the above NGram-based search work?

Here is my schema.xml (NGramFieldType):



Re: Wildcard query vs facet.prefix for autocomplete?

2012-07-18 Thread santamaria2
Well silly me... you're right.

On Wed, Jul 18, 2012 at 6:44 PM, Erick Erickson [via Lucene] <
ml-node+s472066n399570...@n3.nabble.com> wrote:

> Well, option 2 won't do you any good, so speed doesn't really matter.
> Your response would have a facet count for "dam", all by itself, something
> like
>
> 2
> 1
>
> etc.
>
> which does not contain anything that lets you reconstruct the title
> for autosuggest.
>
> Best
> Erick
>
> On Tue, Jul 17, 2012 at 3:18 AM, santamaria2 <[hidden 
> email]>
> wrote:
> > I'll consider using the other methods, but I'd like to know which would
> be
> > faster among the two approaches mentioned in my opening post.
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199p3995458.html
>
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199p3995706.html
>  To unsubscribe from Wildcard query vs facet.prefix for autocomplete?, click
> here
> .
> NAML
>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199p3995707.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard query vs facet.prefix for autocomplete?

2012-07-18 Thread Erick Erickson
Well, option 2 won't do you any good, so speed doesn't really matter.
Your response would have a facet count for "dam", all by itself, something like

2
1

etc.

which does not contain anything that lets you reconstruct the title
for autosuggest.

Best
Erick

On Tue, Jul 17, 2012 at 3:18 AM, santamaria2  wrote:
> I'll consider using the other methods, but I'd like to know which would be
> faster among the two approaches mentioned in my opening post.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199p3995458.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Does SolrEntityProcessor fulfill my requirements?

2012-07-18 Thread Vadim Kisselmann
Hi folks,

I have this case:
I want to update my Solr 4.0 from trunk to Solr 4.0 alpha. The index
structure has changed, so I can't replicate.
10 cores are in use, each with 30 million docs. We assume that all fields
are stored and indexed.
What is the best way to export the docs from all cores on one machine
with Solr 4.0 trunk to same-named cores on another machine with Solr 4.0
alpha?
SolrEntityProcessor could be one solution, but does it work with this
amount of data? I want to reindex all docs at once and not in "small"
parts. I can't find any examples of bigger reindexing attempts with
SolrEntityProcessor.
XSLT as option two?
What would be the best solution to do this? What do you think?
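
For context, the kind of data-config.xml I have in mind looks like this
(a sketch; the URL, core name and rows value are placeholders):

<dataConfig>
  <document>
    <entity name="sep" processor="SolrEntityProcessor"
            url="http://old-host:8983/solr/core0"
            query="*:*"
            rows="1000"
            fl="*"/>
  </document>
</dataConfig>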

Best Regards
Vadim


Re: SOLR 4 Alpha Out Of Mem Err

2012-07-18 Thread solrman
Nick,

To solve the out-of-memory issue, I think you can make the changes below:
1) in solrconfig.xml, reduce ramBufferSizeMB (there are two, change both)
2) in solrconfig.xml, reduce the documentCache size

To solve the issue where calling commit slows indexing down, I think you can
change the new searcher default warming query:
in solrconfig.xml, search for the newSearcher listener and
change
content:* 0 10
to
content:notexist 0 10
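
For reference, the relevant pieces of solrconfig.xml look roughly like this
(a sketch; the sizes and values are examples, not recommendations):

<ramBufferSizeMB>32</ramBufferSizeMB>

<documentCache class="solr.LRUCache" size="512" initialSize="512"
               autowarmCount="0"/>

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">content:notexist</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>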

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-4-Alpha-Out-Of-Mem-Err-tp3995033p3995695.html
Sent from the Solr - User mailing list archive at Nabble.com.


change of API Javadoc interface funtionality in 4.0.x

2012-07-18 Thread Bernd Fehling
Dear developers,

while upgrading from 3.6.x to 4.x I have to rewrite some of my code and
search for the new methods and/or classes. In 3.6.x and older versions
the API Javadoc interface had an "Index" which made it easy to find the
appropriate methods. The button to call the "Index" was located in the
top of the web page between Deprecated and Help.

What is the sense of removing the "Index" from the API Javadoc for Lucene and 
Solr?

Regards
Bernd


Re: Problems with elevation component configuration

2012-07-18 Thread igors
Hi,

Well, if I understand correctly, only the search term is important for
elevation, not the query.
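
For reference, elevate.xml maps an exact query text to document ids; a
minimal example (with placeholder ids) looks like this:

<elevate>
  <query text="some search term">
    <doc id="123"/>
    <doc id="456"/>
  </query>
</elevate>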

Anyway, we ended up modifying the QueryElevationComponent class, extracting
the search term from the query with a regex.
After that, it turned out that elevation doesn't work with grouped results,
so we had to separate the sorting for grouped and non-grouped requests in
the prepare() method of the same class.
That was not the end of the problems, because we need to show elevated
results with different styling, so we upgraded to Solr 4 and now it seems to
be working as expected.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problems-with-elevation-component-configuration-tp3993204p3995692.html
Sent from the Solr - User mailing list archive at Nabble.com.


Start solr master and solr slave with enable replication = false

2012-07-18 Thread Jamel ESSOUSSI
Hi,

Is it possible to start the Solr master and slave with the following
configuration?

- replication on master disabled when we start Solr --> the replication
feature must still be available
- polling on slave disabled --> the replication feature must still be
available

Something like the sketch below is what I have in mind.
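
A rough solrconfig.xml sketch of what I mean (untested; the enable flags and
property names are my assumption, based on the example replication config):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:30</str>
  </lst>
</requestHandler>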


-- Best Regards
-- Jamel

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Start-solr-master-and-solr-slave-with-enable-replication-false-tp3995685.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR 4 Alpha Out Of Mem Err

2012-07-18 Thread Yonik Seeley
I think what makes the most sense is to limit the number of
connections to another host.  A host only has so many CPU resources,
and beyond a certain point throughput would start to suffer anyway
(and then only make the problem worse).  It also makes sense in that a
client could generate documents faster than we can index them (either
for a short period of time, or on average) and having flow control to
prevent unlimited buffering (which is essentially what this is) makes
sense.

Nick - when you switched to HttpSolrServer, things worked because this
added an explicit flow control mechanism.
A single request (i.e. an add with one or more documents) is fully
indexed to all endpoints before the response is returned.  Hence if
you have 10 indexing threads and are adding documents in batches of
100, there can be only 1000 documents buffered in the system at any
one time.

-Yonik
http://lucidimagination.com


solr indexing on HDFS for high query throughput

2012-07-18 Thread vineet yadav
Hi,
I am using Solr for indexing. The index is small, around 50GB. I need to
use Solr in a high-query-throughput system. I am using the Twitter API and
need to search incoming tweets in Solr. So I want to know how I should
design such a system. Does Solr support HDFS natively? How can I index and
search on an HDFS system?
Thanks
Vineet Yadav