Re: Solr Memory Usage

2010-12-14 Thread Toke Eskildsen
On Tue, 2010-12-14 at 06:07 +0100, Cameron Hurst wrote:

[Cameron expected 150MB overhead]

> As I start to index data and passing queries to the database I notice a
> steady rise in the RAM but it doesn't stop at 150MB. If I continue to
> reindex the exact same data set with no additional data entries the RAM
> continuously increases. I stopped looking as the RAM increased beyond
> 350MB and started to try and debug it and can't find anything obvious
> from my beginners view point.

The JVM tries to find a balance between garbage collection and memory
usage. If your maximum memory allocation is 512MB, it makes perfect sense for
the JVM to allocate 350MB, although it might be capable of getting by
with 150MB.

If you want to find a "minimum" (or rather the smallest possible memory
allocation that does not result in excessive performance degradation),
you'll have to start with different maximum memory allocations and
measure. Do keep a close eye on garbage collection time when you do
this. I recommend visualvm with the Visual GC plugin for this.
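
For example (assuming Tomcat and a Unix-style startup script; adjust the
variable name and paths for your setup), you could test a series of heap caps
while logging garbage collection:

  export CATALINA_OPTS="-Xms128m -Xmx256m -verbose:gc -XX:+PrintGCDetails -Xloggc:/tmp/solr-gc.log"

Repeat your indexing and query workload at each -Xmx value and compare the GC
times reported by visualvm (or the GC log) before settling on a number.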



Re: Query performance very slow even after autowarming

2010-12-14 Thread johnnyisrael

Hi Chris,

Thanks for looking into it.

Here is the sample query.

http://localhost:8080/solr/core0/select/?qt=autosuggest&q=a

I am using a request handler with a name autosuggest with the following
configuration.

  
 
   json
   name,score
   score desc
   true
   
   

The debug timing for the above query is as follows.

"timing":{
"time":.0,
"prepare":{
 "time":0.0,
 "org.apache.solr.handler.component.QueryComponent":{
  "time":0.0},
 "org.apache.solr.handler.component.FacetComponent":{
  "time":0.0},
 "org.apache.solr.handler.component.MoreLikeThisComponent":{
  "time":0.0},
 "org.apache.solr.handler.component.HighlightComponent":{
  "time":0.0},
 "org.apache.solr.handler.component.StatsComponent":{
  "time":0.0},
 "org.apache.solr.handler.component.DebugComponent":{
  "time":0.0}},
"process":{
 "time":.0,
 "org.apache.solr.handler.component.QueryComponent":{
  "time":3332.0},
 "org.apache.solr.handler.component.FacetComponent":{
  "time":0.0},
 "org.apache.solr.handler.component.MoreLikeThisComponent":{
  "time":0.0},
 "org.apache.solr.handler.component.HighlightComponent":{
  "time":0.0},
 "org.apache.solr.handler.component.StatsComponent":{
  "time":0.0},
 "org.apache.solr.handler.component.DebugComponent":{
"time":1.0}

This is not at all consistent; sometimes it happens and sometimes it doesn't.

Thanks,

Johnny

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-performance-very-slow-even-after-autowarming-tp2010384p2084263.html
Sent from the Solr - User mailing list archive at Nabble.com.


Google like search

2010-12-14 Thread satya swaroop
Hi All,
 Can we get results like Google's, with a snippet of data about each hit?
I was able to get the first 300 characters of a file, but that is not helpful
for me. Can I get the data surrounding the first occurrence of the search key
in that file?

Regards,
Satya


Re: Google like search

2010-12-14 Thread Tanguy Moal
Hi Satya,

I think what you're looking for is called "highlighting", in the sense
of highlighting the query terms in their matching context.

You could start by googling "solr highlight", surely the first results
will make sense.

Solr's wiki results are usually a good entry point :
http://wiki.apache.org/solr/HighlightingParameters .
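
For example (the hl.* parameters are documented on that page; the field name
"content" is just an assumption about your schema), something like

http://localhost:8080/solr/select?q=Java&hl=true&hl.fl=content&hl.snippets=1&hl.fragsize=200

returns, for each hit, a short fragment of the field with the matched terms
wrapped in <em> tags, rather than a fixed slice of the document.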

Maybe I misunderstood your question, but I hope that'll help...

Regards,

Tanguy


2010/12/14 satya swaroop :
> Hi All,
>         Can we get the results like google  having some data  about the
> search... I was able to get the data that is the first 300 characters of a
> file, but it is not helpful for me, can i be get the data that is having the
> first found key in that file
>
> Regards,
> Satya
>


Re: Google like search

2010-12-14 Thread satya swaroop
Hi Tanguy,
  I am not asking for highlighting. I think it can be
explained with an example; here I illustrate it:

When I post a query like this:

http://localhost:8080/solr/select?q=Java&version=2.2&start=0&rows=10&indent=on

I get a result as follows:

0
1

Java%20debugging.pdf
122
Table of Contents
If you're viewing this document online, you can click any of the topics
below to link directly to that section.
1. Tutorial tips 2
2. Introducing debugging  4
3. Overview of the basics 6
4. Lessons in client-side debugging 11
5. Lessons in server-side debugging 15
6. Multithread debugging 18
7. Jikes overview 20






Here the str field contains the first 300 characters of the file, as I kept a
field in schema.xml that copies only the first 300 characters.
But I don't want the content like this. Is there any way to make the output as
follows:

 Java is one of the best languages, java is easy to learn...


where this content comes from the start of the chapter where the first
occurrence of the word "java" appears in the file?


Regards,
Satya


Re: SpatialTierQueryParserPlugin Loading Error

2010-12-14 Thread Markus Jelsma
Where did you put the jar?

> All,
> 
> Can anyone shed some light on this error. I can't seem to get this
> class to load. I am using the distribution of Solr from Lucid
> Imagination and the Spatial Plugin from here
> https://issues.apache.org/jira/browse/SOLR-773. I don't know how to
> apply a patch but the jar file is in there. What else can I do?
> 
> org.apache.solr.common.SolrException: Error loading class
> 'org.apache.solr.spatial.tier.SpatialTierQueryParserPlugin'
>   at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:
> 373) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at
> org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at
> org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
>   at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
>   at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
>   at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1442)
>   at org.apache.solr.core.SolrCore.(SolrCore.java:548)
>   at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.ja
> va:137) at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83
> ) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
> at
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at
> org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:59
> 4) at org.mortbay.jetty.servlet.Context.startContext(Context.java:139) at
> org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:121
> 8) at
> org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
> at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
> at
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java
> :147) at
> org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerC
> ollection.java:161) at
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java
> :147) at
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at
> org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
> at org.mortbay.jetty.Server.doStart(Server.java:210)
>   at
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at org.mortbay.start.Main.invokeMain(Main.java:183)
>   at org.mortbay.start.Main.start(Main.java:497)
>   at org.mortbay.start.Main.main(Main.java:115)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.solr.spatial.tier.SpatialTierQueryParserPlugin
>   at java.net.URLClassLoader$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Unknown Source)
>   at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:
> 357) ... 33 more


Is there a way to view the values of "stored=false" fields in search results?

2010-12-14 Thread Swapnonil Mukherjee
Hi All,

I have setup certain fields to be indexed=true and stored=false. According to 
the documentation fields marked as stored=false do not appear in search 
results, which is perfectly ok.

But now I have a situation where I need to debug to see the value of these 
fields.
So is there a way to see the value of stored=false fields?

With Regards
Swapnonil Mukherjee


Re: Is there a way to view the values of "stored=false" fields in search results?

2010-12-14 Thread Ahmet Arslan
> But now I have a situation where I need to debug to see the
> value of these fields.
> So is there a way to see the value of stored=false fields?

You cannot see the original values. But you can see what is indexed. 
http://www.getopt.org/luke/ can display it.
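
If you'd rather not run a separate GUI, Solr's built-in LukeRequestHandler can
also show the top indexed terms per field, e.g. (adjust host/port and the
field name for your setup):

http://localhost:8983/solr/admin/luke?fl=yourfield&numTerms=20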


  


RE: Google like search

2010-12-14 Thread Dave Searle
Highlighting is exactly what you need, although if you highlight the whole 
book, this could slow down your queries. Index/store the first 5000-1 
characters and see how you get on
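
For example, a copyField with a maxChars cap in schema.xml (field names here
are only assumptions about your schema) keeps the stored text small:

<field name="content_short" type="text" indexed="true" stored="true" />
<copyField source="content" dest="content_short" maxChars="5000" />

You would then search/highlight on content_short instead of the full body.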

-Original Message-
From: satya swaroop [mailto:satya.yada...@gmail.com] 
Sent: 14 December 2010 10:08
To: solr-user@lucene.apache.org
Subject: Re: Google like search

Hi Tanguy,
  I am not asking for highlighting.. I think it can be
explained with an example.. Here i illustarte it::

when i post the query like dis::

http://localhost:8080/solr/select?q=Java&version=2.2&start=0&rows=10&indent=on

i Would be getting the result as follows::

-
-
0
1

-
-
Java%20debugging.pdf
122
-
-
Table of Contents
If you're viewing this document online, you can click any of the topics
below to link directly to that section.
1. Tutorial tips 2
2. Introducing debugging  4
3. Overview of the basics 6
4. Lessons in client-side debugging 11
5. Lessons in server-side debugging 15
6. Multithread debugging 18
7. Jikes overview 20






Here the str field contains the first 300 characters of the file as i kept a
field to copy only 300 characters in schema.xml...
But i dont want the content like dis.. Is there any way to make an o/p as
follows::

 Java is one of the best language,java is easy to learn...


where this content is at start of the chapter,where the first word of java
is occured in the file...


Regards,
Satya


Re: Query-Expansion, copyFields, flexibility and size of Index (Solr-3.1-SNAPSHOT)

2010-12-14 Thread mdz-munich

Okay, I'll start guessing:

- Do we have to write a customized QueryParserPlugin?
- At which point does the RequestHandler/QueryParser/whatever decide which
query analyzer to use?

10% for every copied field is a lot for us; we're facing terabytes of
digitized book data. So we want to keep the index simple, small and flexible,
and just add IR functionality at query time.

Greetings & thank you,

Sebastian
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-Expansion-copyFields-flexibility-and-size-of-Index-Solr-3-1-SNAPSHOT-tp2078573p2085018.html
Sent from the Solr - User mailing list archive at Nabble.com.


De-duplication not working as I expected - duplicates still getting into the index

2010-12-14 Thread Jason Brown
I have configured de-duplication according to the Wiki..

My signature field is defined thus...

<field name="signature" type="string" stored="true" indexed="true" multiValued="false" />

and my updateRequestProcessor as follows



<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">false</bool>
    <str name="signatureField">signature</str>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

I am using SOLRJ to write to the index with the binary (as opposed to XML) so 
my update handler is defined as below.

 

<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>

However I was expecting SOLR to only allow 1 instance of a duplicate document 
into the index, but I get the following results when I query my index...

I have deliberately added my ISA Letter file 4 times and can see it has 
correctly generated an identical signature for the first 4 entries 
(d91a5ce933457fd5). The fifth entry is a different document and correctly has a 
different signature. 

I was expecting to only see 1 instance of the duplicate. Am I misinterpreting 
the way it works? Many Thanks.


ISA Letter
d91a5ce933457fd5

ISA Letter
d91a5ce933457fd5

ISA Letter
d91a5ce933457fd5

ISA Letter
d91a5ce933457fd5

ISA Mailing pack letter
fd9d9e1c0de32fb5


If you wish to view the St. James's Place email disclaimer, please use the link 
below

http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer


Re: De-duplication not working as I expected - duplicates still getting into the index

2010-12-14 Thread Markus Jelsma
Check this setting:
  <bool name="overwriteDupes">false</bool>
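
With overwriteDupes set to false the signature is only recorded and duplicate
documents are still added to the index. If you want later duplicates to
overwrite the earlier ones, set it to true (and make sure the signature field
is indexed), i.e.:

  <bool name="overwriteDupes">true</bool>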


On Tuesday 14 December 2010 14:26:21 Jason Brown wrote:
> I have configured de-duplication according to the Wiki..
> 
> My signature field is defined thus...
> 
>  multiValued="false" />
> 
> and my updateRequestProcessor as follows
> 
> 
>  class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
> true
>   false
>   signature
>   content
>name="signatureClass">org.apache.solr.update.processor.Lookup3Signature tr> 
> 
> 
>   
> 
> I am using SOLRJ to write to the index with the binary (as opposed to XML)
> so my update handler is defined as below.
> 
>   class="solr.BinaryUpdateRequestHandler" > 
>   dedupe
> 
>   
> 
> However I was expecting SOLR to only allow 1 instance of a duplicate
> document into the index, but I get the following results when I query mt
> index...
> 
> I have deliberately added my ISA Letter file 4 times and can see it has
> correctly generated an identical signature for the first 4 entries
> (d91a5ce933457fd5). The fifth entry is a different document and correctly
> has a different signature.
> 
> I was expecting to only see 1 instance of the duplicate. Am I
> misinterpreting the way it works? Many Thanks.
> 
> 
> ?
> 
> ISA Letter
> d91a5ce933457fd5
> 
> ?
> 
> ISA Letter
> d91a5ce933457fd5
> 
> ?
> 
> ISA Letter
> d91a5ce933457fd5
> 
> ?
> 
> ISA Letter
> d91a5ce933457fd5
> 
> ?
> 
> ISA Mailing pack letter
> fd9d9e1c0de32fb5
> 
> 
> If you wish to view the St. James's Place email disclaimer, please use the
> link below
> 
> http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: SpatialTierQueryParserPlugin Loading Error

2010-12-14 Thread Markus Jelsma
Anyway, try putting the jar in 
work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp/WEB-INF/lib/ 
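
If you'd rather not unpack the war, Solr 1.4+ can also load plugin jars from a
directory you point it at in solrconfig.xml, for example:

  <lib dir="/path/to/your/plugin/jars" />

or from a lib/ directory placed directly under your Solr home.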


On Tuesday 14 December 2010 11:10:47 Markus Jelsma wrote:
> Where did you put the jar?
> 
> > All,
> > 
> > Can anyone shed some light on this error. I can't seem to get this
> > class to load. I am using the distribution of Solr from Lucid
> > Imagination and the Spatial Plugin from here
> > https://issues.apache.org/jira/browse/SOLR-773. I don't know how to
> > apply a patch but the jar file is in there. What else can I do?
> > 
> > org.apache.solr.common.SolrException: Error loading class
> > 'org.apache.solr.spatial.tier.SpatialTierQueryParserPlugin'
> > 
> > at
> > 
> > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java
> > : 373) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
> > at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
> > at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
> > 
> > at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
> > at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
> > at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1442)
> > at org.apache.solr.core.SolrCore.(SolrCore.java:548)
> > at
> > 
> > org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.j
> > a va:137) at
> > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:8
> > 3 ) at
> > org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) at
> > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> > at
> > org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:5
> > 9 4) at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
> > at
> > org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1
> > 21 8) at
> > org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
> > at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
> > at
> > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> > at
> > org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.jav
> > a
> > 
> > :147) at
> > 
> > org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandler
> > C ollection.java:161) at
> > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> > at
> > org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.jav
> > a
> > 
> > :147) at
> > 
> > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> > at
> > org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
> > at org.mortbay.jetty.Server.doStart(Server.java:210)
> > 
> > at
> > 
> > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> > at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929) at
> > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > 
> > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> > at java.lang.reflect.Method.invoke(Unknown Source)
> > at org.mortbay.start.Main.invokeMain(Main.java:183)
> > at org.mortbay.start.Main.start(Main.java:497)
> > at org.mortbay.start.Main.main(Main.java:115)
> > 
> > Caused by: java.lang.ClassNotFoundException:
> > org.apache.solr.spatial.tier.SpatialTierQueryParserPlugin
> > 
> > at java.net.URLClassLoader$1.run(Unknown Source)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(Unknown Source)
> > at java.lang.ClassLoader.loadClass(Unknown Source)
> > at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
> > at java.lang.ClassLoader.loadClass(Unknown Source)
> > at java.lang.Class.forName0(Native Method)
> > at java.lang.Class.forName(Unknown Source)
> > at
> > 
> > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java
> > : 357) ... 33 more

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: RAM usage issues

2010-12-14 Thread Erick Erickson
Several observations:
1> If by RAM buffer size you're referring to the <ramBufferSizeMB> value in
solrconfig.xml, that is a limit on the size of the internal buffer used while
indexing. When that limit is reached, the data is flushed to disk. It is
irrelevant to searching.
2> When you run searches, various internal caches are populated. If you wish
to limit
 these, see solrconfig.xml. Look for the word "cache". These are
search-time caches.
3> When you reindex, if you do NOT have a <uniqueKey> defined in schema.xml
(see the sketch below), then you'll have multiple copies of the same document,
which could account for your index size increase.
4> Even if you do have a <uniqueKey> defined, the underlying operation is that
the old document is marked for deletion; it is NOT physically removed. In
particular, the terms associated with the deleted document are still kept
around until you do an optimize. See the admin page (stats, as I recall) and
check whether there's a difference between numDocs and maxDocs to see if this
is the case.
5> What are you using to look at memory consumption? You could just be
 seeing memory that hasn't been garbage collected yet.

You should expect a limit to be reached as GC kicks in. jConsole may help
you if
you're not using that already.
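
For point 3, a minimal schema.xml sketch (the field name "id" is just an
assumption about your schema):

  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <uniqueKey>id</uniqueKey>

With that in place, re-adding a document with the same id replaces the earlier
copy instead of creating a duplicate.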

Best
Erick

On Mon, Dec 13, 2010 at 11:46 PM, Cameron Hurst
wrote:

> hello all,
>
> I am a new user to Solr and I am having a few issues with the setup and
> wondering if anyone had some suggestions. I am currently running this as
> just a test environment before I go into production. I am using a
> tomcat6 environment for my servlet and solr 1.4.1 as the solr build. I
> set up the instructions following the guide here.
> http://wiki.apache.org/solr/SolrTomcat  The issue that I am having is
> that the memory usage seems high for the settings I have.
>
> When i start the server I am using about 90MB of RAM which is fine and
> from the google searches I found that is normal. The issue comes when I
> start indexing data. In my solrconf.xml file that my maximum RAM buffer
> is 32MB. In my mind that means that the maximum RAM being used by the
> servlet should be 122MB, but increasing to 150MB isn't out of my reach.
> When I start indexing data and calling searches my memory usages slowly
> keeps on increasing. The odd thing about it is that when I reindex the
> exact same data set the memory usage increases every time but no new
> data has been entered to be indexed. I stopped increasing as I went over
> 350MB of RAM.
>
> So my question in all of this is if this is normal and why the RAM
> buffer isn't being observed? Are my expectations unreasonable and
> flawed? Or could there be something else in my settings that is causing
> the memory usage to increase like this.
>
> Thanks for the help,
>
> Cameron
>


Re: Solr Tika, Text with style

2010-12-14 Thread Grant Ingersoll
To do that, you need to keep the original content and store it in a field.
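
A minimal sketch of what that can look like in schema.xml (the field name is an
assumption): add a stored-only field and have your indexing code send the
original, styled content to it alongside the Tika-extracted plain text:

  <field name="raw_content" type="string" indexed="false" stored="true" />

Tika's output is plain text, so the styled version has to be stored separately
if you want to display it later.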


On Dec 11, 2010, at 10:56 AM, ali fathieh wrote:

> Hello, 
> I've seen  this link:
> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika
> What I got is pure text without any style from Tika for Solr to search in .
> Is it possible to have the text with its style from Solr? 
> In other words, we need to show text with its original style after searched 
> by solr .
> Thanks 
> Ali 
> 
> 
> 

--
Grant Ingersoll
http://www.lucidimagination.com



Re: Search with facet.pivot

2010-12-14 Thread Grant Ingersoll
The formatting of your message is a bit hard to read.  Could you please clarify 
which commands worked and which ones didn't?  Since the pivot stuff is 
relatively new, there could very well be a bug, so if you can give a simple 
test case that shows what is going on that would also be helpful, albeit not 
required.

On Dec 12, 2010, at 10:18 PM, Anders Dam wrote:

> Hi,
> 
> I have a minor problem in getting the pivoting working correctly. The thing
> is that two otherwise equal search queries behave differently, namely one is
> returning the search result with the facet.pivot fields below and another is
> returning the search result with an empty facet.pivot. This is a problem,
> since I am particularly interested in displaying the pivots.
> 
> Perhaps anyone has an idea about what is going wrong in this case, For
> clarity I paste the parameters used for searching:
> 
> 
> 
> 0
> 41
> -
> 
> 
>2<-1 5<-2 6<90%
>
> on
> 1
> 0.01
> 
>category_search
> 
> 0
> 
> 
>*:*
> 
> category
> true
> dismax
> all
> 
>*,score
> 
> true
> 1
> 
> true
> 
>shop_name:colorbob.dk
> 
> -
> 
> root_category_name,parent_category_name,category
> root_category_id,parent_category_id,category_id
> 
> 100
> -
> 
> root_category_name,parent_category_name,category
> root_category_id,parent_category_id,category_id
> 
> OKI
> 100
> 
> 
> 
> I see no pattern in what queries is returning the pivot fields and which
> ones are not
> 
> 
> The field searched in is defined as:
> 
>  required="false" termVectors="on" termPositions="on" termOffsets="on" />
> 
> And the edgytext type is defined as
> positionIncrementGap="100">
> 
>   
> stemEnglishPossessive="0" splitOnNumerics="0" preserveOriginal="1"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="1" />
>
>maxGramSize="25" />
> 
> 
>   
> stemEnglishPossessive="0" splitOnNumerics="0" preserveOriginal="1"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" />
>
> 
>
> 
> I am using apache-solr-4.0-2010-11-26_08-36-06 release
> 
> Thanks in advance,
> 
> Anders Dam

--
Grant Ingersoll
http://www.lucidimagination.com



Re: SpatialTierQueryParserPlugin Loading Error

2010-12-14 Thread Grant Ingersoll
For this functionality, you are probably better off using trunk or branch_3x.  
There are quite a few patches related to that particular one that you will need 
to apply in order to have it work correctly.


On Dec 13, 2010, at 10:06 PM, Adam Estrada wrote:

> All,
> 
> Can anyone shed some light on this error. I can't seem to get this
> class to load. I am using the distribution of Solr from Lucid
> Imagination and the Spatial Plugin from here
> https://issues.apache.org/jira/browse/SOLR-773. I don't know how to
> apply a patch but the jar file is in there. What else can I do?
> 
> org.apache.solr.common.SolrException: Error loading class
> 'org.apache.solr.spatial.tier.SpatialTierQueryParserPlugin'
>   at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
>   at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
>   at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
>   at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
>   at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
>   at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
>   at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1442)
>   at org.apache.solr.core.SolrCore.(SolrCore.java:548)
>   at 
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
>   at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
>   at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
>   at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
>   at 
> org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
>   at 
> org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
>   at 
> org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
>   at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
>   at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
>   at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
>   at org.mortbay.jetty.Server.doStart(Server.java:210)
>   at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>   at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at org.mortbay.start.Main.invokeMain(Main.java:183)
>   at org.mortbay.start.Main.start(Main.java:497)
>   at org.mortbay.start.Main.main(Main.java:115)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.solr.spatial.tier.SpatialTierQueryParserPlugin
>   at java.net.URLClassLoader$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Unknown Source)
>   at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)
>   ... 33 more

--
Grant Ingersoll
http://www.lucidimagination.com



RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Jonathan Rochkind
But the entirety of the old indexes (no longer on disk) wasn't cached in 
memory, right?  Or is it?  Maybe this is me not understanding lucene enough. I 
thought that portions of the index were cached in memory, but that sometimes the 
index reader still has to go to disk to get things that aren't currently in 
caches.  If this is true (tell me if it's not!), we have an index reader that 
was based on indexes that... are no longer on disk. But the index reader is 
still open. What happens when it has to go to disk for info?

And the second replication will trigger a commit even if there are in fact no 
new files to be transfered over to slave, because there have been no changes 
since the prior sync with failed commit?

From: Upayavira [...@odoko.co.uk]
Sent: Tuesday, December 14, 2010 2:23 AM
To: solr-user@lucene.apache.org
Subject: RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap 
getting collected?

The second commit will bring in all changes, from both syncs.

Think of the sync part as a glorified rsync of files on disk. So the
files will have been copied to disk, but the in memory index on the
slave will not have noticed that those files have changed. The commit is
intended to remedy that - it causes a new index reader to be created,
based upon the new on disk files, which will include updates from both
syncs.

Upayavira

On Mon, 13 Dec 2010 23:11 -0500, "Jonathan Rochkind" 
wrote:
> Sorry, I guess I don't understand the details of replication enough.
>
> So slave tries to replicate. It pulls down the new index files. It tries
> to do a commit but fails.  But "the next commit that does succeed will
> have all the updates." Since it's a slave, it doesn't get any commits of
> it's own. But then some amount of time later, it does another replication
> pull. There are at this time maybe no _new_ changes since the last failed
> replication pull. Does this trigger a commit that will get those previous
> changes actually added to the slave?
>
> In the meantime, between commits.. are those potentially large pulled new
> index files sitting around somewhere but not replacing the old slave
> index files, doubling disk space for those files?
>
> Thanks for any clarification.
>
> Jonathan
> 
> From: ysee...@gmail.com [ysee...@gmail.com] On Behalf Of Yonik Seeley
> [yo...@lucidimagination.com]
> Sent: Monday, December 13, 2010 10:41 PM
> To: solr-user@lucene.apache.org
> Subject: Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't
> WeakHashMap getting collected?
>
> On Mon, Dec 13, 2010 at 9:27 PM, Jonathan Rochkind 
> wrote:
> > Yonik, how will maxWarmingSearchers in this scenario effect replication?  
> > If a slave is pulling down new indexes so quickly that the warming 
> > searchers would ordinarily pile up, but maxWarmingSearchers is set to 1 
> > what happens?
>
> Like any other commits, this will limit the number of searchers
> warming in the background to 1.  If a commit is called, and that tries
> to open a new searcher while another is already warming, it will fail.
>  The next commit that does succeed will have all the updates though.
>
> Today, this maxWarmingSearchers check is done after the writer has
> closed and before a new searcher is opened... so calling commit too
> often won't affect searching, but it will currently affect indexing
> speed (since the IndexWriter is constantly being closed/flushed).
>
> -Yonik
> http://www.lucidimagination.com
>


Re: Google like search

2010-12-14 Thread Tanguy Moal
Satya,

In fact the highlighter will select the relevant part of the whole
text and return it with the matched terms highlighted.

If you do so for a whole book, you will face the issue spotted by Dave
(too long text).

To address that issue, you have the possibility to split your book in
chapters, and index each chapter as a unique document.

You would then be interested in adding a field that uniquely identifies
each book (using the ISBN number, for example) and turning on grouping (or
field collapsing) on that field (see this very good blog post:
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/
)
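
For example, with an "isbn" field (the name is an assumption) and a build that
includes field collapsing, a query along these lines returns a few chapters
per book instead of a flat list of all matching chapters:

http://localhost:8080/solr/select?q=java&group=true&group.field=isbn&group.limit=3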

Moreover, you might be interested in the following JIRA issue:
https://issues.apache.org/jira/browse/SOLR-2272 . Using this patch,
you could for example ensure that if a given chapter document is
selected by the query, then another document (or several, maybe a
parent "book" document, or all the other chapters) gets selected along
the way (by doing a self-join on the ISBN number). Here again,
grouping afterward would return a group of documents representing each
book.

Good luck!

--
Tanguy

2010/12/14 Dave Searle :
> Highlighting is exactly what you need, although if you highlight the whole 
> book, this could slow down your queries. Index/store the first 5000-1 
> characters and see how you get on
>
> -Original Message-
> From: satya swaroop [mailto:satya.yada...@gmail.com]
> Sent: 14 December 2010 10:08
> To: solr-user@lucene.apache.org
> Subject: Re: Google like search
>
> Hi Tanguy,
>                  I am not asking for highlighting.. I think it can be
> explained with an example.. Here i illustarte it::
>
> when i post the query like dis::
>
> http://localhost:8080/solr/select?q=Java&version=2.2&start=0&rows=10&indent=on
>
> i Would be getting the result as follows::
>
> -
> -
> 0
> 1
> 
> -
> -
> Java%20debugging.pdf
> 122
> -
> -
> Table of Contents
> If you're viewing this document online, you can click any of the topics
> below to link directly to that section.
> 1. Tutorial tips 2
> 2. Introducing debugging  4
> 3. Overview of the basics 6
> 4. Lessons in client-side debugging 11
> 5. Lessons in server-side debugging 15
> 6. Multithread debugging 18
> 7. Jikes overview 20
> 
> 
> 
> 
> 
>
> Here the str field contains the first 300 characters of the file as i kept a
> field to copy only 300 characters in schema.xml...
> But i dont want the content like dis.. Is there any way to make an o/p as
> follows::
>
>  Java is one of the best language,java is easy to learn...
>
>
> where this content is at start of the chapter,where the first word of java
> is occured in the file...
>
>
> Regards,
> Satya
>


RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Upayavira
A Lucene index is made up of segments. Each commit writes a segment.
Sometimes, upon commit, some segments are merged together into one, to
reduce the overall segment count, as too many segments hinders
performance. Upon optimisation, all segments are (typically) merged into
a single segment.

Replication copies any new segments from the master to the slave,
whether they be new segments arriving from a commit, or new segments
that are a result of a segment merge. The result is a set of index files
on disk that are a clean mirror of the master.

Then, when your replication process has finished syncing changed
segments, it fires a commit on the slave. This causes Solr to create a
new index reader. 

When the first query comes in, this triggers Solr to populate caches.
Whoever was unfortunate enough to trigger that cache population will see much
poorer response times (we've seen 40s responses rather than 1s).

The solution to this is to set up an autowarming query in
solrconfig.xml. This query is executed against the new index reader,
causing caches to populate from the updated files on disk. Only once
that autowarming query has completed will the index reader be made
available to Solr for answering search queries.

There's some cleverness that I can't remember the details of specifying
how much to keep from the existing caches, and how much to build up from
the files on disk. If I recall, it is all configured in solrconfig.xml.
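
A sketch of such a warming query in solrconfig.xml (the query and sort field
are assumptions; pick something representative of your real traffic):

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">some common query</str>
        <str name="sort">price asc</str>
      </lst>
    </arr>
  </listener>

The autowarmCount attribute on the <filterCache>, <queryResultCache> etc.
entries is the part that controls how much is carried over from the old
caches.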

You ask a good question whether a commit will be triggered if the sync
brought over no new files (i.e. if the previous one did, but this one
didn't). I'd imagine that Solr would compare the maximum segment ID on
disk with the one in memory to make such a decision, in which case Solr
would spot the changes from the previous sync and still work. The best
way to be sure is to try it! 

The simplest way to try it (as I would do it) would be to:

1) switch off post-commit replication
2) post some content to solr
3) commit on the master
4) use rsync to copy the indexes from the master to the slave
5) do another (empty) commit on the master
6) trigger replication via an HTTP request to the slave
7) See if your posted content is available on your slave.

Maybe someone else here can tell you what is actually going on and save
you the effort!

Does that help you get some understand what is going on?

Upayavira

On Tue, 14 Dec 2010 09:15 -0500, "Jonathan Rochkind" 
wrote:
> But the entirety of the old indexes (no longer on disk) wasn't cached in
> memory, right?  Or is it?  Maybe this is me not understanding lucene
> enough. I thought that portions of the index were cached in disk, but
> that sometimes the index reader still has to go to disk to get things
> that aren't currently in caches.  If this is true (tell me if it's not!),
> we have an index reader that was based on indexes that... are no longer
> on disk. But the index reader is still open. What happens when it has to
> go to disk for info?
> 
> And the second replication will trigger a commit even if there are in
> fact no new files to be transfered over to slave, because there have been
> no changes since the prior sync with failed commit?
> 
> From: Upayavira [...@odoko.co.uk]
> Sent: Tuesday, December 14, 2010 2:23 AM
> To: solr-user@lucene.apache.org
> Subject: RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't
> WeakHashMap getting collected?
> 
> The second commit will bring in all changes, from both syncs.
> 
> Think of the sync part as a glorified rsync of files on disk. So the
> files will have been copied to disk, but the in memory index on the
> slave will not have noticed that those files have changed. The commit is
> intended to remedy that - it causes a new index reader to be created,
> based upon the new on disk files, which will include updates from both
> syncs.
> 
> Upayavira
> 
> On Mon, 13 Dec 2010 23:11 -0500, "Jonathan Rochkind" 
> wrote:
> > Sorry, I guess I don't understand the details of replication enough.
> >
> > So slave tries to replicate. It pulls down the new index files. It tries
> > to do a commit but fails.  But "the next commit that does succeed will
> > have all the updates." Since it's a slave, it doesn't get any commits of
> > it's own. But then some amount of time later, it does another replication
> > pull. There are at this time maybe no _new_ changes since the last failed
> > replication pull. Does this trigger a commit that will get those previous
> > changes actually added to the slave?
> >
> > In the meantime, between commits.. are those potentially large pulled new
> > index files sitting around somewhere but not replacing the old slave
> > index files, doubling disk space for those files?
> >
> > Thanks for any clarification.
> >
> > Jonathan
> > 
> > From: ysee...@gmail.com [ysee...@gmail.com] On Behalf Of Yonik Seeley
> > [yo...@lucidimagination.com]
> > Sent: Monday, December 13, 2010 10:41 PM
> > To: solr-user@lucene.a

Re: Google like search

2010-12-14 Thread satya swaroop
Hi Tanguy,
 Thanks for your reply. Sorry to ask this type of question:
how can we index each chapter of a file as a separate document? As far as I
know, we just give the path of the file to Solr to index it... Can you point
me to any sources for this, i.e. any blogs or wikis?

Regards,
satya


Re: Google like search

2010-12-14 Thread Tanguy Moal
To do so, you have several possibilities, I don't know if there is a best one.

It depends pretty much on the format of the input file(s), your
affinity with a given programming language, the libraries you might
need, and the time you're ready to spend on this task.

Consider having a look at SolrJ  (http://wiki.apache.org/solr/Solrj)
or at the DataImportHandler
(http://wiki.apache.org/solr/DataImportHandler) .
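
For example, if you split the book yourself, each chapter can be posted as its
own document through the XML update handler (field names are assumptions about
your schema):

  <add>
    <doc>
      <field name="id">java-debugging-ch1</field>
      <field name="isbn">0123456789</field>
      <field name="title">Java debugging - Tutorial tips</field>
      <field name="content">... text of chapter 1 ...</field>
    </doc>
  </add>

posted to http://localhost:8080/solr/update, followed by a commit.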

Cheers,

--
Tanguy

2010/12/14 satya swaroop :
> Hi Tanguy,
>                 Thanks for ur reply. sorry to ask this type of question.
> how can we index each chapter of a file as seperate document.As for i know
> we just give the path of file to solr to index it... Can u provide me any
> sources for this type... I mean any blogs or wiki's...
>
> Regards,
> satya
>


Solr 1.4 replication, cleaning up old indexes

2010-12-14 Thread Tim Heckman
When using the index replication over HTTP that was introduced in Solr
1.4, what is the recommended way to periodically clean up old indexes
on the slaves?

I found references to the snapcleaner script, but that seems to be for
the older ssh/rsync replication model.


thanks,
Tim


Need some guidance on solr-config settings

2010-12-14 Thread Mark
Can anyone offer some advice on what some good settings would be for an 
index of around 6 million documents totaling around 20-25GB? It seems 
like when our index gets to this size our CPU load spikes tremendously.


What would be some appropriate settings for ramBufferSize and 
mergeFactor? We currently have:


10
64

Same question on cache settings. We currently have:







false
2

Are there any other settings that I could tweak to affect performance?

Thanks


Re: Solr 1.4 replication, cleaning up old indexes

2010-12-14 Thread Shawn Heisey

On 12/14/2010 8:31 AM, Tim Heckman wrote:

When using the index replication over HTTP that was introduced in Solr
1.4, what is the recommended way to periodically clean up old indexes
on the slaves?

I found references to the snapcleaner script, but that seems to be for
the older ssh/rsync replication model.


It's supposed to take care of removing the old indexes on its own - when 
everything is working, it builds an index.<timestamp> directory, 
replicates, swaps that directory in to replace index, and deletes the 
directory with the timestamp.  I have not been able to figure out what 
circumstances make this process break down and cause Solr to simply use 
the timestamp directory as-is, without deleting the old one.  For me, it 
works most of the time.  I'm running 1.4.1.


Shawn



Re: Need some guidance on solr-config settings

2010-12-14 Thread Shawn Heisey

On 12/14/2010 8:31 AM, Mark wrote:
Can anyone offer some advice on what some good settings would be for 
an index or around 6 million documents totaling around 20-25gb? It 
seems like when our index gets to this size our CPU load spikes 
tremendously.


If you are adding, deleting, or updating documents on a regular basis, I 
would bet that it's your autoWarmCount.  You've told it that whenever 
you do a commit, it needs to make up to 32768 queries against the new 
index.  That's very intense and time-consuming.  If you are also 
optimizing the index, the problem gets even worse.  On the 
documentCache, autowarm doesn't happen, so the 16384 specified there 
isn't actually doing anything.
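
For reference, a cache entry with a modest autowarmCount looks like this in
solrconfig.xml (the numbers here are just an illustration):

  <filterCache class="solr.FastLRUCache" size="512" initialCapacity="512" autowarmCount="64" />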


Below are my settings.  I originally had much larger caches with equally 
large autoWarmCounts ... reducing them to this level was the only way I 
could get my autowarm time below 30 seconds on each index.  If you go to 
the admin page for your index and click on Statistics, then search for 
"warmupTime" you'll see how long it took to do the queries.  Later on 
the page you'll also see this broken down on each cache.


Since I made the changes, performance is actually better now, not 
worse.  I have been experimenting with FastLRUCache versus  LRUCache, 
because I read that below a certain hitratio, the latter is better.  
I've got 8 million documents in each shard, taking up about 15GB.


My mergeFactor is 16 and my ramBufferSize is 256MB.  These really only 
come into play when I do a full re-index, which is rare.











Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Jonathan Rochkind

Yeah, I understand basically how caches work.

What I don't understand is what happens in replication if, the new 
segment files are succesfully copied, but the actual commit fails due to 
maxAutoWarmingSearches.  The new files are on disk... but the commit 
could not succeed and there is NOT a new index reader, because the 
commit failed.   And there is potentially a long gap before a future 
succesful commit.


1. Will the existing index searcher have problems because the files have 
been changed out from under it?


2. Will a future replication -- at which NO new files are available on 
master -- still trigger a future commit on slave?


Maybe these are obvious to everyone but me, because I keep asking this 
question, and the answer I keep getting is just describing the basics of 
replication, as if this obviously answers my question.


Or maybe the answer isn't obvious or clear to anyone including me, in 
which case the only way to get an answer is to try and test it myself.  
A bit complicated to test, at least for my level of knowledge, as I'm 
not sure exactly what I'd be looking for to answer either of those 
questions.


Jonathan

On 12/14/2010 9:53 AM, Upayavira wrote:

A Lucene index is made up of segments. Each commit writes a segment.
Sometimes, upon commit, some segments are merged together into one, to
reduce the overall segment count, as too many segments hinders
performance. Upon optimisation, all segments are (typically) merged into
a single segment.

Replication copies any new segments from the master to the slave,
whether they be new segments arriving from a commit, or new segments
that are a result of a segment merge. The result is a set of index files
on disk that are a clean mirror of the master.

Then, when your replication process has finished syncing changed
segments, it fires a commit on the slave. This causes Solr to create a
new index reader.

When the first query comes in, this triggers Solr to populate caches.
Whoever was unfortunate to cause that cache population will see poorer
results (we've seen 40s responses rather than 1s).

The solution to this is to set up an autowarming query in
solrconfig.xml. This query is executed against the new index reader,
causing caches to populate from the updated files on disk. Only once
that autowarming query has completed will the index reader be made
available to Solr for answering search queries.

There's some cleverness that I can't remember the details of specifying
how much to keep from the existing caches, and how much to build up from
the files on disk. If I recall, it is all configured in solrconfig.xml.

You ask a good question whether a commit will be triggered if the sync
brought over no new files (i.e. if the previous one did, but this one
didn't). I'd imagine that Solr would compare the maximum segment ID on
disk with the one in memory to make such a decision, in which case Solr
would spot the changes from the previous sync and still work. The best
way to be sure is to try it!

The simplest way to try it (as I would do it) would be to:

1) switch off post-commit replication
2) post some content to solr
3) commit on the master
4) use rsync to copy the indexes from the master to the slave
5) do another (empty) commit on the master
6) trigger replication via an HTTP request to the slave
7) See if your posted content is available on your slave.

Maybe someone else here can tell you what is actually going on and save
you the effort!

Does that help you get some understand what is going on?

Upayavira

On Tue, 14 Dec 2010 09:15 -0500, "Jonathan Rochkind"
wrote:

But the entirety of the old indexes (no longer on disk) wasn't cached in
memory, right?  Or is it?  Maybe this is me not understanding lucene
enough. I thought that portions of the index were cached in disk, but
that sometimes the index reader still has to go to disk to get things
that aren't currently in caches.  If this is true (tell me if it's not!),
we have an index reader that was based on indexes that... are no longer
on disk. But the index reader is still open. What happens when it has to
go to disk for info?

And the second replication will trigger a commit even if there are in
fact no new files to be transfered over to slave, because there have been
no changes since the prior sync with failed commit?

From: Upayavira [...@odoko.co.uk]
Sent: Tuesday, December 14, 2010 2:23 AM
To: solr-user@lucene.apache.org
Subject: RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't
WeakHashMap getting collected?

The second commit will bring in all changes, from both syncs.

Think of the sync part as a glorified rsync of files on disk. So the
files will have been copied to disk, but the in memory index on the
slave will not have noticed that those files have changed. The commit is
intended to remedy that - it causes a new index reader to be created,
based upon the new on disk files, which will include updates from both
syncs.

Upayav

Re: Solr 1.4 replication, cleaning up old indexes

2010-12-14 Thread Tim Heckman
On Tue, Dec 14, 2010 at 10:37 AM, Shawn Heisey  wrote:
> It's supposed to take care of removing the old indexes on its own - when
> everything is working, it builds an index. directory, replicates,
> swaps that directory in to replace index, and deletes the directory with the
> timestamp.  I have not been able to figure out what circumstances make this
> process break down and cause Solr to simply use the timestamp directory
> as-is, without deleting the old one.  For me, it works most of the time.
>  I'm running 1.4.1.

Interesting. I'm also running 1.4.1. Looking more closely at my index
directories and my update strategy, I see a pattern.

I run delta updates into solr every 20 minutes during the day into my
"live" replicated core. Each time this happens, of course, the index
version and generation is incremented.

Once per day in the morning, I run a full index + optimize into an "on
deck" core. When this is complete, I swap the "on deck" with the live
core. A side-effect of this is that the version number / generation of
the live index just went backwards, since the "on deck" core does not
receive the 3x-per-hour deltas during the rest of the day.

The index directories that hang around have timestamps corresponding
to the daily full update, when the version number goes backward.


Re: Solr 1.4 replication, cleaning up old indexes

2010-12-14 Thread Shawn Heisey

On 12/14/2010 9:13 AM, Tim Heckman wrote:

Once per day in the morning, I run a full index + optimize into an "on
deck" core. When this is complete, I swap the "on deck" with the live
core. A side-effect of this is that the version number / generation of
the live index just went backwards, since the "on deck" core does not
receive the 3x-per-hour deltas during the rest of the day.

The index directories that hang around have timestamps corresponding
to the daily full update, when the version number goes backward.


Now that you mention this, I too have really only noticed this problem 
when I am fiddling with things and doing a full reindex.  This is the 
only time I swap cores on my master servers.  Since full reindexes don't 
happen very often, that explains why I don't have a problem 95% of the time.


I found SOLR-1781 and posted a comment on it.

https://issues.apache.org/jira/browse/SOLR-1781

Shawn



Re: Very high load

2010-12-14 Thread Shawn Heisey

On 12/13/2010 9:15 PM, Mark wrote:
No cache warming queries and our machines have 8g of memory in them 
with about 5120m of ram dedicated to so Solr. When our index is 
around 10-11g in size everything runs smoothly. At around 20g+ it 
just falls apart.


I just replied to your new email thread, called "Need some guidance on 
solr-config settings."  Then I saw this.


I'd recommend two things in addition to the autoWarmCount reductions I 
mentioned in the other thread.  The first is shrink your Solr memory 
footprint.  Reduce the size of your caches and your max heap size so you 
have more RAM available for the operating system disk cache.  If you 
find that you can't reduce your max heap size very much because you are 
using memory-hungry Solr features like complex faceting, we come to the 
second recommendation, which is to install more memory.  16GB would be a 
good start, but the more you can get, the better.  Referring back to my 
Solr installation, each shard has 9GB of RAM, with a max heap size of 
only 2GB.  This leaves 7GB for the disk cache, slightly less than half 
the index size.  At this time, I am not doing anything real complex with 
Solr, but all of the faceting tests that I have done work with no memory 
trouble.


Anything that you can do to decrease the index size is also a good 
idea.  Only store enough information in your index to populate the 
initial search results screen.  When your users open a particular search 
result, have your application grab all the detail information from the 
original data source, not Solr.




Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Shawn Heisey

On 12/14/2010 9:02 AM, Jonathan Rochkind wrote:
1. Will the existing index searcher have problems because the files 
have been changed out from under it?


2. Will a future replication -- at which NO new files are available on 
master -- still trigger a future commit on slave?


I'm not really sure of the answer to #2, but I believe I can answer #1.  
Lucene is designed so that all files necessary for an index to work are 
kept around after a commit until there is a new searcher to take over 
all requests with the new files.  If you are replicating only new 
segments, the old files will still be there both before and after.  If 
you just optimized the master and therefore are copying an entire new 
index, the old one will not be removed until there is a successful 
commit and therefore a new searcher.


There is another thread on replication that I just replied to as well.  
Solr actually seems a little too intent on keeping old files around - 
see SOLR-1781.


Shawn



Re: RAM usage issues

2010-12-14 Thread Shawn Heisey

On 12/13/2010 9:46 PM, Cameron Hurst wrote:

When i start the server I am using about 90MB of RAM which is fine and
from the google searches I found that is normal. The issue comes when I
start indexing data. In my solrconf.xml file that my maximum RAM buffer
is 32MB. In my mind that means that the maximum RAM being used by the
servlet should be 122MB, but increasing to 150MB isn't out of my reach.
When I start indexing data and calling searches my memory usages slowly
keeps on increasing. The odd thing about it is that when I reindex the
exact same data set the memory usage increases every time but no new
data has been entered to be indexed. I stopped increasing as I went over
350MB of RAM.


There could be large gaps in my understanding here, but one thing I have 
noticed about Java is that memory usage on a program will increase until 
it nearly fills the max heap size it has been allocated.  In order to 
increase performance, garbage collection seems to be rather lazy, until 
a large percentage of the max heap size is allocated.  I've got a 2GB 
max heap size passed to Jetty when I start Solr.  Memory usage hovers 
around 1.4GB, and it doesn't take very long for it to get there.


Solr's search functionality, especially if you give it a sort parameter, 
is memory hungry.  For each field you sort on, Solr creates a large 
filter cache entry.  The other caches are also filled quickly.  If you 
are storing a large amount of data in Solr for each document, the 
documentCache in particular will get quite large.  Every time you do a 
reindex, you are creating a new searcher with new caches.  The old one 
is eventually removed, but I'm pretty sure that until garbage collection 
runs, the memory is not actually reclaimed.


I don't know what your heap size is set to, but I'd be surprised if it's 
less than 1GB.  Java is not going to be concerned about memory usage 
when it's only using 350MB of that, so I don't think it'll even try to 
run garbage collection.


Shawn



changing data type

2010-12-14 Thread Wodek Siebor

Using DataImportHandler. In the select statement I use Oracle decode()
function.
As a result I have to change the indexed field from int to string.
However, during the load Solr throws an exception.

Any experience with that?

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/changing-data-type-tp2087811p2087811.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Jonathan Rochkind

Thanks Shawn, that helps explain things.

So the issue there, with using maxWarmingSearchers to try to prevent 
out-of-control RAM/CPU usage from overlapping on-deck searchers, combined with 
replication... is that if you're still pulling down replications very 
frequently but using maxWarmingSearchers to prevent overlapping on-deck searchers, 
you'll save RAM/CPU but might trade that off for using a LOT of 
disk space for multiple versions of index segment files, until a commit 
finally goes through.


On 12/14/2010 11:38 AM, Shawn Heisey wrote:

On 12/14/2010 9:02 AM, Jonathan Rochkind wrote:

1. Will the existing index searcher have problems because the files
have been changed out from under it?

2. Will a future replication -- at which NO new files are available on
master -- still trigger a future commit on slave?

I'm not really sure of the answer to #2, but I believe I can answer #1.
Lucene is designed so that all files necessary for an index to work are
kept around after a commit until there is a new searcher to take over
all requests with the new files.  If you are replicating only new
segments, the old files will still be there both before and after.  If
you just optimized the master and therefore are copying an entire new
index, the old one will not be removed until there is a successful
commit and therefore a new searcher.

There is another thread on replication that I just replied to as well.
Solr actually seems a little too intent on keeping old files around -
see SOLR-1781.

Shawn




facet.pivot for date fields

2010-12-14 Thread Adeel Qureshi
It doesn't seem like pivot faceting works on dates. I was just curious if
that's how it's supposed to be or whether I am doing something wrong. If I include a
date field in the pivot list, I simply don't get any facet results back for
that date field.

Thanks
Adeel


Re: changing data type

2010-12-14 Thread Erick Erickson
You haven't given us much to go on. Please post:

1> your DIH statement
2> your schema file, particularly the <fieldType> and <field> in question
3> the exception trace.
4> Anything else that comes to mind. Remember we know nothing about your
 particular setup...

Best
Erick

On Tue, Dec 14, 2010 at 3:17 PM, Wodek Siebor  wrote:

>
> Using DataImportHandler. In the select statement I use Oracle decode()
> function.
> As the result I have to change the indexed field from int to string.
> However, during the load Solr throws an exception.
>
> Any experience with that?
>
> Thanks
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/changing-data-type-tp2087811p2087811.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: changing data type

2010-12-14 Thread Wodek Siebor

The DIH statement works fine if I run it directly in SQL Developer.
It's something like: decode(<column>, 0, 'string_1', 1, 'string_2')
The <column> is of type int, and in the schema.xml, since the decode
output is a string, the corresponding indexed field is of type string.

Is there a problem declaring a field in schema.xml with a different data
type than it is in oracle?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/changing-data-type-tp2087811p2088215.html
Sent from the Solr - User mailing list archive at Nabble.com.


[DIH] Example for SQL Server

2010-12-14 Thread Adam Estrada
Does anyone have an example config.xml file I can take a look at for SQL
Server? I need to index a lot of data from a DB and can't seem to figure out
the right syntax, so any help would be greatly appreciated. What is the
correct .jar file to use, and where do I put it in order for it to work?

Thanks,
Adam


Re: [DIH] Example for SQL Server

2010-12-14 Thread Erick Erickson
The config isn't really any different for various sql instances, about the
only difference is the driver. Have you seen the example in the
distribution somewhere like
/example/example-DIH/solr/db/conf/db-data-config.xml?

Also, there's a magic URL for debugging DIH at:
.../solr/admin/dataimport.jsp

If none of that is useful, could you post your attempt and maybe someone can
offer some hints?
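
For reference, a minimal data-config.xml for SQL Server might look something
like the sketch below. Host, database, credentials, table and column names are
all placeholders; the JDBC driver jar (Microsoft's sqljdbc, or jTDS) goes into
your Solr lib directory:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="item" query="SELECT id, name, description FROM items">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>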

Best
Erick

On Tue, Dec 14, 2010 at 5:32 PM, Adam Estrada  wrote:

> Does anyone have an example config.xml file I can take a look at for SQL
> Server? I need to index a lot of data from a DB and can't seem to figure
> out
> the right syntax so any help would be greatly appreciated. What is the
> correct /jar file to use and where do I put it in order for it to work?
>
> Thanks,
> Adam
>


Changing the default Fuzzy minSimilarity?

2010-12-14 Thread Jan Høydahl / Cominvent
Hi,

A fuzzy query foo~ defaults to a similarity of 0.5, i.e. equal to foo~0.5

I want to set the default to 0.8 so that if a user enters the query foo~ it 
equals foo~0.8

Have not seen a way to do this in Solr. A param &fuzzy.minSim=0.8 would do the 
trick. Anything like this, or shall I open a JIRA?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: Changing the default Fuzzy minSimilarity?

2010-12-14 Thread Robert Muir
On Tue, Dec 14, 2010 at 5:51 PM, Jan Høydahl / Cominvent
 wrote:
> Hi,
>
> A fuzzy query foo~ defaults to a similarity of 0.5, i.e. equal to foo~0.5
>

just as an FYI, this isn't true in trunk (4.0) any more.

the defaults are changed so that it never enumerates the entire
dictionary (slow) like before, see:
https://issues.apache.org/jira/browse/LUCENE-2667

so, the default is now foo~2 (2 edit distances).


Re: Syncing 'delta-import' with 'select' query

2010-12-14 Thread Alexey Serba
What Solr version do you use?

It seems that sync flag has been added to 3.1 and 4.0 (trunk) branches
and not to 1.4
https://issues.apache.org/jira/browse/SOLR-1721

On Wed, Dec 8, 2010 at 11:21 PM, Juan Manuel Alvarez  wrote:
> Hello everyone!
> I have been doing some tests, but it seems I can't make the
> synchronize flag work.
>
> I have made two tests:
> 1) DIH with commit=false
> 2) DIH with commit=false + commit via Solr XML update protocol
>
> And here are the log results:
> For (1) the command is
> "/solr/dataimport?command=delta-import&commit=false&synchronous=true"
> and the first part of the output is:
>
> Dec 8, 2010 4:42:51 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 
> QTime=0
> Dec 8, 2010 4:42:51 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport
> params={schema=testproject&dbHost=127.0.0.1&dbPassword=fuz10n!&dbName=fzm&commit=false&dbUser=fzm&command=delta-import&projectId=1&synchronous=true&dbPort=5432}
> status=0 QTime=4
> Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.DataImporter
> doDeltaImport
> INFO: Starting Delta Import
> Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.SolrWriter
> readIndexerProperties
> INFO: Read dataimport.properties
> Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.DocBuilder doDelta
> INFO: Starting delta collection.
> Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
>
>
> For (2) the commands are
> "/solr/dataimport?command=delta-import&commit=false&synchronous=true"
> and "/solr/update?commit=true&waitFlush=true&waitSearcher=true" and
> the first part of the output is:
>
> Dec 8, 2010 4:22:50 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 
> QTime=0
> Dec 8, 2010 4:22:50 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport
> params={schema=testproject&dbHost=127.0.0.1&dbPassword=fuz10n!&dbName=fzm&commit=false&dbUser=fzm&command=delta-import&projectId=1&synchronous=true&dbPort=5432}
> status=0 QTime=1
> Dec 8, 2010 4:22:50 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 
> QTime=0
> Dec 8, 2010 4:22:50 PM org.apache.solr.handler.dataimport.DataImporter
> doDeltaImport
> INFO: Starting Delta Import
> Dec 8, 2010 4:22:50 PM org.apache.solr.handler.dataimport.SolrWriter
> readIndexerProperties
> INFO: Read dataimport.properties
> Dec 8, 2010 4:22:50 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start 
> commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
>
> In (2) it seems like the commit is being fired before the delta-update 
> finishes.
>
> Am I using the "synchronous" flag right?
>
> Thanks in advance!
> Juan M.
>
> On Mon, Dec 6, 2010 at 6:46 PM, Juan Manuel Alvarez  
> wrote:
>> Thanks for all the help! It is really appreciated.
>>
>> For now, I can afford the parallel requests problem, but when I put
>> synchronous=true in the delta import, the call still returns with
>> outdated items.
>> Examining the log, it seems that the commit operation is being
>> executed after the operation returns, even when I am using
>> commit=true.
>> Is it possible to also execute the commit synchronously?
>>
>> Cheers!
>> Juan M.
>>
>> On Mon, Dec 6, 2010 at 4:29 PM, Alexey Serba  wrote:
 When you say "two parallel requests from two users to single DIH
 request handler", what do you mean by "request handler"?
>>> I mean DIH.
>>>
 Are you
 refering to the HTTP request? Would that mean that if I make the
 request from different HTTP sessions it would work?
>>> No.
>>>
>>> It means that when you have two users that simultaneously changed two
>>> objects in the UI then you have two HTTP requests to DIH to pull
>>> changes from the db into Solr index. If the second request comes when
>>> the first is not fully processed then the second request will be
>>> rejected. As a result your index would be outdated (w/o the latest
>>> update) until the next update.
>>>
>>
>


Re: my index has 500 million docs, how to improve solr search performance?

2010-12-14 Thread Alexey Serba
How much memory do you allocate for the JVMs? Considering you have 10 JVMs
per server (10*N), you might not have enough memory left for the OS file system
cache (you need to keep some memory free for that).

> all indexs size is about 100G
Is this per server or the whole size?


On Mon, Nov 15, 2010 at 8:35 AM, lu.rongbin  wrote:
>
> In addition,my index has only two store fields, id and price, and other
> fields are index. I increase the document and query cache. the ec2
> m2.4xLarge instance is 8 cores, 68G memery. all indexs size is about 100G.
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/my-index-has-500-million-docs-how-to-improve-solr-search-performance-tp1902595p1902869.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Need some guidance on solr-config settings

2010-12-14 Thread Mark

Excellent reply.

You mentioned: "I've been experimenting with FastLRUCache versus 
LRUCache, because I read that below a certain hitratio, the latter is 
better."


Do you happen to remember what that threshold is? Thanks

On 12/14/10 7:59 AM, Shawn Heisey wrote:

On 12/14/2010 8:31 AM, Mark wrote:
Can anyone offer some advice on what some good settings would be for 
an index of around 6 million documents totaling around 20-25GB? It 
seems like when our index gets to this size our CPU load spikes 
tremendously.


If you are adding, deleting, or updating documents on a regular basis, 
I would bet that it's your autoWarmCount.  You've told it that 
whenever you do a commit, it needs to make up to 32768 queries against 
the new index.  That's very intense and time-consuming.  If you are 
also optimizing the index, the problem gets even worse.  On the 
documentCache, autowarm doesn't happen, so the 16384 specified there 
isn't actually doing anything.
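
For a sense of scale, a much tamer cache section in solrconfig.xml might look
like the sketch below (the sizes and autowarm counts are illustrative
placeholders, not the actual settings from this thread):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="64"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
<documentCache class="solr.LRUCache" size="2048" initialSize="2048"/>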


Below are my settings.  I originally had much larger caches with 
equally large autoWarmCounts ... reducing them to this level was the 
only way I could get my autowarm time below 30 seconds on each index.  
If you go to the admin page for your index and click on Statistics, 
then search for "warmupTime" you'll see how long it took to do the 
queries.  Later on the page you'll also see this broken down on each 
cache.


Since I made the changes, performance is actually better now, not 
worse.  I have been experimenting with FastLRUCache versus  LRUCache, 
because I read that below a certain hitratio, the latter is better.  
I've got 8 million documents in each shard, taking up about 15GB.


My mergeFactor is 16 and my ramBufferSize is 256MB.  These really only 
come into play when I do a full re-index, which is rare.



Re: Need some guidance on solr-config settings

2010-12-14 Thread Shawn Heisey

On 12/14/2010 5:05 PM, Mark wrote:

Excellent reply.

You mentioned: "I've been experimenting with FastLRUCache versus 
LRUCache, because I read that below a certain hitratio, the latter is 
better."


Do you happen to remember what that threshold is? Thanks


Looks like it's 75%, and that it's documented in the example solrconfig.xml.






Re: Newbie: Indexing unrelated MySQL tables

2010-12-14 Thread Alexey Serba
> I figured I would create three entities and relevant
> schema.xml entries in this way:
>
> dataimport.xml:
> 
> 
> 
That's correct. You can list several entities under the document element.
You can index them separately using the entity parameter (i.e. add
entity=Users to your full-import HTTP request). Do not forget to add
clean=false so you won't delete previously indexed documents. Or you
can index all entities in one request (the default).
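
The general shape is something like this sketch (entity names taken from your
mail, the queries are placeholders):

<document>
  <entity name="Users"   query="SELECT ... FROM users"/>
  <entity name="Artwork" query="SELECT ... FROM artwork"/>
  <entity name="Jobs"    query="SELECT ... FROM jobs"/>
</document>

and a per-entity import request would then be something like:

/solr/dataimport?command=full-import&entity=Users&clean=false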

> schema.xml:
> 
> 
> 
> 
> 
> 
> 
> 
> 
Why do you use string type for textual fields (description, company,
name, firstname, lastname, etc)? Is it intentional to use these fields
in filtering/faceting?

You can also add a "default" searchable multivalued field (type=text)
and copyField instructions to copy all textual content into this
field ( http://wiki.apache.org/solr/SchemaXml#Copy_Fields ). Thus you
will be able to search the "default" field for terms from all fields
(firstname, lastname, name, description, company, position, location,
etc).
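
A sketch of that in schema.xml (assuming the stock "text" field type and the
field names from your mail):

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="text"/>
<copyField source="firstname" dest="text"/>
<copyField source="lastname" dest="text"/>
<copyField source="description" dest="text"/>
<copyField source="company" dest="text"/>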

You would probably also want to add a "type" field with values
user/artwork/job. You will be able to facet/filter on that field and provide
a better user search experience.

> This obviously does not work as I want. I only get results from the "users"
> table, and I cannot get results from neither "artwork" nor "jobs".
Are you sure that this is because the indexing isn't working? How do
you search for your data? Which query parser do you use (standard/dismax), etc.?

> I have
> found out that the possible solution is in putting <field> tags in the
> <entity> tag and somehow aliasing column names for Solr, but the logic
> behind this is completely alien to me and the blind tests I tried did not
> yield anything.
You don't need to list your fields explicitly in the fields declaration.

BTW, what database do you use? Oracle has an issue with upper-casing
column names that could be a problem.

> My logic says that the "id" field is getting replaced by the
> "id" field of other entities and indexes are being overwritten.
Are your ids unique across different objects? I.e. is there any job
with the same id as user? If so then you would probably want to prefix
your ids like:





> But if I
> aliased all "id" fields in all entities into something else, such as
> "user_id" and "job_id", I couldn't figure what to put in the 
> configuration in schema.xml because I have three different id fields from
> three different tables that are all primary keyed in the database!
You can still create separate id fields (e.g. user_id, job_id) if you need to
search for different objects by id without dealing with the prefixed ids. But
it's not required.

HTH,
Alexey


limit the search results to one category

2010-12-14 Thread sara motahari
Hi all,

I am using a dismax request handler with various fields that it searches, but I 
also want to enable the users to select a category from a drop-down list 
and only get the results that belong to that category. It seems I can't use a 
nested query with dismax as the first one and standard as the nested one? Is 
there another way to do this?


  

Re: limit the search results to one category

2010-12-14 Thread Ahmet Arslan
> I am using a dismax request handler with vrious fieds that
> it searches, but I 
> also want to enable the users to select a category from a
> drop-down list 
> and only get the results that belong to that category. It
> seems I can't use a 
> nested query with dismax as the first one and standard as
> the nested one? Is 
> there another way to do this?

Filter Queries? fq=category:foo

http://wiki.apache.org/solr/CommonQueryParameters#fq    
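
For example, with your dismax handler it is just an extra parameter on the
request (handler name and category value below are placeholders):

/solr/select/?qt=yourDismaxHandler&q=some+words&fq=category:paintings

The fq is applied on top of whatever the dismax query matches, so the
drop-down selection can be passed straight through as a filter query.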





Re: limit the search results to one category

2010-12-14 Thread sara motahari
I guess so. I didn't know I could use it with dismax; I'll try. Thanks Ahmet.





From: Ahmet Arslan 
To: solr-user@lucene.apache.org
Sent: Tue, December 14, 2010 5:42:51 PM
Subject: Re: limit the search results to one category

> I am using a dismax request handler with vrious fieds that
> it searches, but I 
> also want to enable the users to select a category from a
> drop-down list 
> and only get the results that belong to that category. It
> seems I can't use a 
> nested query with dismax as the first one and standard as
> the nested one? Is 
> there another way to do this?

Filter Queries? fq=category:foo

http://wiki.apache.org/solr/CommonQueryParameters#fq    


  

Re: Search with facet.pivot

2010-12-14 Thread Anders Dam
I forgot to mention that the query is handled by the Dismax Request Handler.

Grant, from the params element and down you see all the query
parameters used. The only thing varying from query to query is the actual
query (q). When searching on, for example, '1000' (q=1000) the facet.pivot fields
are correctly returned, while when searching on, for example, 'OKI' the
facet.pivot fields are not returned.

If it is of any help, what I am searching are products compatible with
certain printers. The printer models are stored in a relational database,
where each printer cartridge belongs to several categories; the categories are
in an often 2-3 level deep hierarchy which is flattened out at data import
time, so that there are columns at import time (DataImportHandler) called
category_name, parent_category_name and root_category_name. These fields are
copied to the category_search field also mentioned in my first mail.

Here is the response header listing all the params used for the query, posted on
pastebin.com with a bit better formatting; hope that helps:
http://pastebin.com/9FgijpJ6

If there is any information I can provide to
help us solve this problem I will be happy to provide it.

Thanks in advance,

Anders


On Tue, Dec 14, 2010 at 7:47 PM, Grant Ingersoll wrote:

> The formatting of your message is a bit hard to read.  Could you please
> clarify which commands worked and which ones didn't?  Since the pivot stuff
> is relatively new, there could very well be a bug, so if you can give a
> simple test case that shows what is going on that would also be helpful,
> albeit not required.
>
> On Dec 12, 2010, at 10:18 PM, Anders Dam wrote:
>
> > Hi,
> >
> > I have a minor problem in getting the pivoting working correctly. The
> thing
> > is that two otherwise equal search queries behave differently, namely one
> is
> > returning the search result with the facet.pivot fields below and another
> is
> > returning the search result with an empty facet.pivot. This is a problem,
> > since I am particularly interested in displaying the pivots.
> >
> > Perhaps anyone has an idea about what is going wrong in this case, For
> > clarity I paste the parameters used for searching:
> >
> >
> > 
> > 0
> > 41
> > -
> > 
> > 
> >2<-1 5<-2 6<90%
> >
> > on
> > 1
> > 0.01
> > 
> >category_search
> > 
> > 0
> > 
> > 
> >*:*
> > 
> > category
> > true
> > dismax
> > all
> > 
> >*,score
> > 
> > true
> > 1
> > 
> > true
> > 
> >shop_name:colorbob.dk
> > 
> > -
> > 
> > root_category_name,parent_category_name,category
> > root_category_id,parent_category_id,category_id
> > 
> > 100
> > -
> > 
> > root_category_name,parent_category_name,category
> > root_category_id,parent_category_id,category_id
> > 
> > OKI
> > 100
> > 
> > 
> >
> > I see no pattern in what queries is returning the pivot fields and which
> > ones are not
> >
> >
> > The field searched in is defined as:
> >
> >  stored="false"
> > required="false" termVectors="on" termPositions="on" termOffsets="on" />
> >
> > And the edgytext type is defined as
> > > positionIncrementGap="100">
> > 
> >   
> > > stemEnglishPossessive="0" splitOnNumerics="0" preserveOriginal="1"
> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="1" />
> >
> >> maxGramSize="25" />
> > 
> > 
> >   
> > > stemEnglishPossessive="0" splitOnNumerics="0" preserveOriginal="1"
> > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > catenateNumbers="0" catenateAll="0" />
> >
> > 
> >
> >
> > I am using apache-solr-4.0-2010-11-26_08-36-06 release
> >
> > Thanks in advance,
> >
> > Anders Dam
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com
>
>


Re: Google like search

2010-12-14 Thread Bhavnik Gajjar
Hi Satya,

Coming to your original question, there is one possibility to make Solr 
emit snippets like Google. The Solr query goes like this:

http://localhost:8080/solr/DefaultInstance/select/?q=java&version=2.2&start=0&rows=10&indent=on&hl=true&hl.snippets=5&hl.fl=Field_Text&fl=Field_Text

Note that the key thing used here is the highlighting feature provided by 
Solr. Executing the above Solr query will return two main blocks of 
results. The first part will contain the normal results, whereas the other part 
will contain highlighted snippets, based on the parameters provided in the 
query. One should pick up the latter part (the snippets) and show it in the result 
page UI.

Cheers,

Bhavnik Gajjar


On 12/14/2010 8:35 PM, Tanguy Moal wrote:
> To do so, you have several possibilities, I don't know if there is a best one.
>
> It depends pretty much on the format of the input file(s), your
> affinities with a given programming language, some libraries you might
> need and the time you're ready to spend on this task.
>
> Consider having a look at SolrJ  (http://wiki.apache.org/solr/Solrj)
> or at the DataImportHandler
> (http://wiki.apache.org/solr/DataImportHandler) .
>
> Cheers,
>
> --
> Tanguy
>
> 2010/12/14 satya swaroop:
>> Hi Tanguy,
>>  Thanks for your reply. Sorry to ask this type of question:
>> how can we index each chapter of a file as a separate document? As far as I know
>> we just give the path of the file to Solr to index it... Can you provide me any
>> sources for this... I mean any blogs or wikis...
>>
>> Regards,
>> satya



Re: facet.pivot for date fields

2010-12-14 Thread pankaj bhatt
Hi Adeel,
  You can make use of the facet.query parameter to make faceting work
across a range of dates. Here I am using a duration field; just replace the
field with a date field and the range values with dates in Solr's date format.
So your query parameters will look like this (you can pass the "facet.query"
parameter multiple times):

http://blasdsdfsd/solr/select?q=asdfasd&facet.query=itemduration:[0 TO
49]&facet.query=itemduration:[50 TO 99]&facet.query=itemduration:[100 TO
149]
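
With a date field it is the same idea, just with date values in Solr's ISO
format (the field name and ranges below are only an example):

facet.query=added_date:[2010-01-01T00:00:00Z TO 2010-06-30T23:59:59Z]
facet.query=added_date:[2010-07-01T00:00:00Z TO 2010-12-31T23:59:59Z]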

Hope, it helps.

/ Pankaj Bhatt.

On Wed, Dec 15, 2010 at 2:01 AM, Adeel Qureshi wrote:

> It doesnt seems like pivot facetting works on dates .. I was just curious
> if
> thats how its supposed to be or I am doing something wrong .. if I include
> a
> datefield in the pivot list .. i simply dont get any facet results back for
> that datefield
>
> Thanks
> Adeel
>


RE: Userdefined Field type - Faceting

2010-12-14 Thread Viswa S

This worked, thanks Yonik.

-Viswa

> Date: Mon, 13 Dec 2010 22:54:35 -0500
> Subject: Re: Userdefined Field type - Faceting
> From: yo...@lucidimagination.com
> To: solr-user@lucene.apache.org
> 
> Perhaps try overriding indexedToReadable() also?
> 
> -Yonik
> http://www.lucidimagination.com
> 
> On Mon, Dec 13, 2010 at 10:00 PM, Viswa S  wrote:
> >
> > Hello,
> >
> > We implemented an IP-Addr field type which internally stored the ips as 
> > hex-ed string (e.g. "192.2.103.29" will be stored as "c002671d"). My 
> > "toExternal" and "toInternal" methods for appropriate conversion seems to 
> > be working well for query results, but however when faceting on this field 
> > it returns the raw strings. in other words the query response would have 
> > "192.2.103.29", but facet on the field would return " > name="c002671d">1"
> >
> > Why are these methods not used by the faceting component to convert the 
> > resulting values?
> >
> > Thanks
> > Viswa
> >