solr indexing

2011-02-22 Thread satya swaroop
Hi all,
   Out of keen interest in Solr's indexing mechanism, I started mining the
code of Solr indexing (/update/extract). I read up on the index file formats
and the scoring procedure, and I have some queries regarding this:
1) The score is computed from dynamic and precalculated values (doc
boost, field boost, lengthNorm). If a term in the index occurs in nearly one
million docs, does Solr calculate the score for each and every doc matching
the term and then take the top docs from the index, or is there some
mechanism that limits the score calculation to only a subset of the docs?

If anybody knows about this, or of any documentation regarding it, please
let me know...
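
As far as I know, Lucene (which Solr builds on) does score every document
that matches the term, but it only keeps the current top N in a small
bounded priority queue rather than sorting all one million hits. A minimal
sketch against a recent Lucene API; the index path and field name are made
up:

import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class TopDocsSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical index location; any existing Lucene/Solr index works.
        try (DirectoryReader reader = DirectoryReader.open(
                FSDirectory.open(Paths.get("/path/to/index")))) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Every matching doc is scored once, but only the best 10
            // survive in the collector's bounded priority queue.
            TopDocs top = searcher.search(
                    new TermQuery(new Term("text", "solr")), 10);
            System.out.println("total hits: " + top.totalHits
                    + ", returned: " + top.scoreDocs.length);
        }
    }
}

So the cost per matching doc is a score computation plus a queue check, not
a full sort of the whole posting list.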


Regards,
satya


Solr indexing

2007-07-03 Thread niraj tulachan
Hi all,
 I have successfully implemented Solr so far, but there are a couple of
questions I'd like the Solr users to shed some light on:
  1) In Solr, we create an index by POSTing an XML file to the server.  However,
is there a way to do the same process via a database (containing the metadata)?
  2) While updating a pre-existing index, the update won't be visible until we do
a "commit" on it.  However, while updating the index (before doing the 'commit'),
can we still search on that index (and see the old content)?
  Any info will be highly appreciated.
  Cheers,
  Niraj

   

Solr Indexing Performance

2011-01-29 Thread Darx Oman
Hi guys



I'm running a Solr instance (trunk) on my dev server to test my
configuration.  I'm doing a DIH full import to index 49 PDF files with their
corresponding database records.  Both the PDF files and the database are local
to the server.

*Server : *

· Windows 2008 R2

· MS SQL Server 2008 R2

· 16-core processor

· 16 GB RAM

*Tomcat (7.0.5) : *

· Set JAVA_OPTS = %JAVA_OPTS% -Xms1024M -Xmx8192M

*Solrconfig:*

· Main index configuration:
<ramBufferSizeMB>2048</ramBufferSizeMB>
<mergeFactor>50</mergeFactor>

*DIH configuration:*

· 2 data sources defined: jdbcDataSource and BinFileDataSource

· One main entity with 3 sub-entities

[entity definitions stripped by the mail archive]

· Total schema fields are 8, three of which are text type and
multivalued.

*My DIH import Status Messages:*

· Total Requests made to DataSource = 99

· Total Rows Fetched = 2124

· Total Documents Processed = 49

· Time Taken = 0:2:3:880

Is this time reasonable, or can it be improved?
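
49 documents in roughly two minutes likely puts most of the time in SQL and
PDF extraction rather than raw indexing, and timing repeat runs as you vary
the config is the quickest way to find out. A rough sketch with a modern
SolrJ client that triggers the /dataimport handler and polls until DIH
reports idle again (server URL and core name are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DihBenchmark {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/core1").build();  // placeholder URL
        SolrQuery fullImport = new SolrQuery();
        fullImport.setRequestHandler("/dataimport");  // DIH handler from solrconfig.xml
        fullImport.set("command", "full-import");
        long start = System.currentTimeMillis();
        solr.query(fullImport);

        // Poll DIH status until the import finishes.
        SolrQuery status = new SolrQuery();
        status.setRequestHandler("/dataimport");
        status.set("command", "status");
        QueryResponse rsp;
        do {
            Thread.sleep(2000);
            rsp = solr.query(status);
        } while ("busy".equals(rsp.getResponse().get("status")));
        System.out.println("took " + (System.currentTimeMillis() - start) + " ms");
        solr.close();
    }
}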


Solr Indexing Patterns

2011-06-03 Thread Judioo
What is the "best practice" method to index the following in Solr:

I'm attempting to use solr for a book store site.

Each book will have a price but on occasions this will be discounted. The
discounted price exists for a defined time period but there may be many
discount periods. Each discount will have a brief synopsis, start and end
time.

A subset of the desired output would be as follows:

...
"response":{"numFound":1,"start":0,"docs":[
  {
"name":"The Book",
"price":"$9.99",
"discounts":[
{
 "price":"$3.00",
 "synopsis":"thanksgiving special",
 "starts":"11-24-2011",
 "ends":"11-25-2011",
},
{
 "price":"$4.00",
 "synopsis":"Canadian thanksgiving special",
 "starts":"10-10-2011",
 "ends":"10-11-2011",
},
 ]
  },
  .

A requirement is to be able to search for just discounted publications. I
think I could use date faceting for this (return publications that are
within a discount window). When a discount search is performed, no
publications that are not currently discounted will be returned.

My questions are:

   - Does Solr support this type of sub-document?

In the above example the discounts are the sub-documents. I know Solr is not
a relational DB, but I would like to store and index the above representation
in a single document if possible.

   - What is the best method to approach the above?

I can see in many examples that the authors tend to denormalize to solve
similar problems. This suggests that for each discount I am required to
duplicate the book data or form a document association
(http://stackoverflow.com/questions/2689399/solr-associations).
Which method would you advise?

It would be nice if solr could return a response structured as above.

Much Thanks
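
A side note on the denormalized route that comes up later in this thread: if
each discount becomes its own document carrying the book fields plus its own
start/end dates, the "currently discounted" constraint is a single range
filter. A SolrJ sketch; the discount_start/discount_end field names and the
core URL are hypothetical:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class DiscountedBooks {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/books").build();  // placeholder URL
        // One document per (book, discount) pair; field names are made up.
        SolrQuery q = new SolrQuery("genre:horror AND title:elm");
        q.addFilterQuery("discount_start:[* TO NOW] AND discount_end:[NOW TO *]");
        System.out.println(solr.query(q).getResults().getNumFound()
                + " currently discounted matches");
        solr.close();
    }
}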


Solr indexing questions

2011-06-11 Thread Frank A
I currently have my site set up using SOLR for some pretty simple queries and
am looking to add some additional features; I was hoping to get some
guidance.

Here's my situation: for a given restaurant I have the following info:

rest name,
editorial,
list of features (e.g. Reservations, Good for Groups, etc)
list of cuisines (American, Italian, etc)
List of user reviews
Additional meta data

There are 2 different things I want to do:

Build a directory based on "keywords or phrases" - e.g. looking through all
the data to find the common keywords/phrases - e.g. "hot dog" or "Brazilian
steakhouse". I'm not sure how to extract these key phrases from the data
without having to input them myself.  Is this a good fit for SOLR?  If so,
what features should I look into?

Second, is an "advanced" search that basically matches user input on ANY of
the fields.  However I'd like it to have some basic handling for mispelled
words, synonyms (bbq and bar-b-q) and weight the user of the terms
differently (e.g. name of restaurant vs. in a users comments).  I'm sure
this is SOLRs sweet spot but I'm having trouble figuring out how to put it
all together.

Thanks in advance.


offline solr indexing

2009-04-27 Thread Charles Federspiel
Solr Users,
Our app servers are set up on read-only filesystems.  Is there a way
to perform indexing from the command line, then copy the index files to the
app server and use Solr to perform searches from inside the servlet container?

If the Solr implementation is bound to HTTP requests, can Solr perform
searches against an index that I create with Lucene?
thank you,
Charles Federspiel
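
One approach that should work: build the index offline with plain Lucene (or
a local Solr), then copy the finished index directory into the dataDir the
read-only server's Solr points at. Solr can search an index built directly
with Lucene as long as the field names and analyzers match the Solr schema
exactly. A minimal Lucene sketch of the offline build; paths and fields are
made up:

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class OfflineIndexer {
    public static void main(String[] args) throws Exception {
        // The analyzer must match what the target Solr schema declares.
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("/tmp/build/index")), cfg)) {
            Document doc = new Document();
            doc.add(new StringField("id", "doc-1", Store.YES));
            doc.add(new TextField("text", "content indexed offline", Store.NO));
            writer.addDocument(doc);
            writer.commit();  // directory is now ready to copy to the app server
        }
    }
}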


Dynamic Solr indexing

2010-03-01 Thread Peter S

Hi,

 

I wonder if anyone could shed some insight on a dynamic indexing question...?

 

The basic requirement is this:

 

Indexing:

A process writes to an index, and when it reaches a certain size (say, 1GB), a 
new index (core) is 'automatically' created/deployed (i.e. the process doesn't 
know about it) and further indexing goes into the new core. When that one 
reaches its threshold size, a new index is deployed, and so on.

The process that is writing to the indices doesn't actually know that it is 
writing to different cores.

 

Searching:

When a search is directed at the above index, the actual search is a 
distributed shard search across all the shards that have been deployed. Again, 
the searcher process doesn't know this, but gets back the aggregated results, 
as if it had specified all the shards in the request URL; but as these are 
changing dynamically, it of course can't know what they all are at any given 
time.

 

This requirement sounds to me perhaps like a Katta thing. I've had a look at 
SOLR-1395, and there are questions on the Lucid forums that sound similar (e.g. 
http://www.lucidimagination.com/search/document/4b3d00055413536d/solr_katta_integration#4b3d00055413536d),
 so I guess (hope) I'm not the only one with this requirement.

 

I couldn't find anything in either Katta or SOLR-1395 that fits both the writing 
and searching requirements, but I could easily have missed it.

 

Is Katta/Solr-1395 the way to go to achieve this? Would such a solution be 
'production-ready'? Has anyone deployed this type of thing in a production 
environment?

 

Any insight/advice would be greatly appreciated.

 

Thanks!

Peter
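
I don't know Katta well enough to comment, but on the search side stock Solr
distributed search can already hide the core layout if a thin front tier
fills in the current shard list; the searcher only ever sees the aggregated
response. A SolrJ sketch of the underlying request (host and shard URLs are
placeholders that the front tier would maintain):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ShardedSearch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://front:8983/solr/core0").build();  // any core can aggregate
        SolrQuery q = new SolrQuery("some query");
        // The front tier would rewrite this list whenever a new core is
        // deployed; callers never see it.
        q.set("shards",
              "host1:8983/solr/core0,host1:8983/solr/core1,host2:8983/solr/core2");
        System.out.println(solr.query(q).getResults().getNumFound()
                + " aggregated hits");
        solr.close();
    }
}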

 

 
  

SOLR Indexing/Querying

2007-05-30 Thread realw5

Hey Guys,
I need some guidance regarding a problem we are having with our Solr
index. Below is a list of terms our customers search for which are failing
or not returning the complete result set. The right side of the list is the
product id/keyword we want each term to match.

Can you give me some direction on how this can be done (or let me know if it
can't) with index/query analyzers. Any help is much appreciated!

Dan

---

Keyword Typed In / We want it to find

D3555 / 3555LHP
D460160-BN / D460160
D460160BN / D460160
Dd454557 / D454557
84200ORB / 84200
84200-ORB / 84200
T13420-SCH / T13420
t14240-ss / t14240
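
Most of these look like word-delimiter territory: a filter that splits on
hyphens and letter/digit transitions and can also catenate the parts back
together, so that "D460160-BN" and "D460160BN" end up sharing tokens with the
bare "D460160". A hedged sketch using Lucene's CustomAnalyzer to preview the
tokens; in schema.xml the equivalent is a fieldType whose index and query
analyzers include a WordDelimiter filter with the same options:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ProductIdTokens {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = CustomAnalyzer.builder()
                .withTokenizer("whitespace")
                .addTokenFilter("wordDelimiterGraph",
                        "generateWordParts", "1",   // D460160-BN -> D, 460160, BN
                        "generateNumberParts", "1",
                        "catenateAll", "1",         // also emit D460160BN
                        "preserveOriginal", "1")
                .addTokenFilter("lowercase")
                .build();
        try (TokenStream ts = analyzer.tokenStream("sku", "D460160-BN")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term);  // prints each token produced
            }
            ts.end();
        }
    }
}

The "D3555 / 3555LHP" pair is trickier since both the letter prefix and the
suffix differ; that one may need a synonym entry or a custom mapping instead.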



RE: Solr indexing

2007-07-03 Thread Xuesong Luo
2) Yes.

-Original Message-
From: niraj tulachan [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 03, 2007 3:09 PM
To: solr-user@lucene.apache.org
Subject: Solr indexing

Hi all,
 I have successfully implemented Solr so far, but there are a
couple of questions I'd like the Solr users to shed some light on:
  1) In Solr, we create an index by POSTing an XML file to the server.
However, is there a way to do the same process via a database (containing
the metadata)?
  2) While updating a pre-existing index, the update won't be visible until
we do a "commit" on it.  However, while updating the index (before
doing the 'commit'), can we still search on that index (and see the old
content)?
  Any info will be highly appreciated.
  Cheers,
  Niraj

   



Re: Solr indexing

2007-07-03 Thread Mike Klaas

On 3-Jul-07, at 3:08 PM, niraj tulachan wrote:


Hi all,
 I have successfully implemented Solr so far, but there are a  
couple of questions I'd like the Solr users to shed some light on:
  1) In Solr, we create an index by POSTing an XML file to the  
server.  However, is there a way to do the same process via a database 
(containing the metadata)?


Yes, but I'm not familiar with the techniques.

  2) While updating a pre-existing index, the update won't be visible  
until we do a "commit" on it.  However, while updating the index  
(before doing the 'commit'), can we still search on that index (and  
see the old content)?


Absolutely.  This is the central tenet of Solr.

-Mike
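
A sketch of that behaviour with a modern SolrJ client (core URL is a
placeholder, and it assumes no autocommit is configured): searchers keep
serving the last committed view until the commit opens a new one.

import java.util.UUID;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitVisibility {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/core1").build();
        SolrInputDocument doc = new SolrInputDocument();
        String id = UUID.randomUUID().toString();
        doc.addField("id", id);
        solr.add(doc);                        // indexed, but not yet visible

        SolrQuery q = new SolrQuery("id:" + id);
        System.out.println("before commit: "
                + solr.query(q).getResults().getNumFound());  // 0 - old view
        solr.commit();                        // new searcher opens
        System.out.println("after commit: "
                + solr.query(q).getResults().getNumFound());  // 1
        solr.close();
    }
}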


solr indexing exception

2011-08-26 Thread abhijit bashetti
Hi,

I am using DIH to index 50K documents.

I am using a 64-bit machine with 4 GB RAM.

I got the following exception:

org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:664)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:617)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
    at java.lang.AbstractStringBuilder.append(Unknown Source)
    at java.lang.StringBuffer.append(Unknown Source)
    at java.io.StringWriter.write(Unknown Source)
    at org.apache.tika.sax.WriteOutContentHandler.characters(WriteOutContentHandler.java:115)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:261)
    at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:132)
    at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
    ... 6 more



26-Aug-2011 08:18:35 org.apache.solr.common.SolrException log
SEVERE: Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
[the same stack trace as above is repeated here; it was truncated in the original message]
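
The trace shows the entire extracted text of one file being buffered into a
StringWriter inside Tika, so a single very large document can exhaust the
heap no matter how many documents you import. Besides raising -Xmx, it can
help to find the offending file by running Tika standalone with a write
limit; a hedged sketch (the 10 MB cap is arbitrary):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.WriteOutContentHandler;

public class BoundedExtract {
    public static void main(String[] args) throws Exception {
        // Cap extracted text at ~10 MB so one huge file cannot exhaust the heap.
        WriteOutContentHandler handler = new WriteOutContentHandler(10 * 1024 * 1024);
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            new AutoDetectParser().parse(in, handler, new Metadata(), new ParseContext());
        } catch (Exception e) {
            if (!handler.isWriteLimitReached(e)) throw e;  // a real parse failure
            System.err.println("write limit hit for " + args[0]);
        }
        System.out.println(handler.toString().length() + " chars extracted");
    }
}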

Solr Indexing Time

2011-11-10 Thread Husain, Yavar
Solr 1.4 is doing great with respect to indexing on a dedicated physical server 
(Windows Server 2008). Indexing around 1 million full-text documents 
(around 4 GB in size) takes around 20 minutes with heap size = 512M - 1G and 
4 GB RAM.



However, while using Solr on a VM with 4 GB RAM, it took 50 minutes to index 
the first time. Note that there are no network delays and no RAM issues. When 
I increased the RAM to 8 GB and increased the heap size, the indexing time 
increased to 2 hours. That was really strange. Note that except for SQL Server 
there is no other process running. There are no network delays. However, I 
have not checked file I/O; can that be a bottleneck? Does Solr have any issues 
running in a "virtualization" environment?



I read a paper today by Brian & Harry, "ON THE RESPONSE TIME OF A SOLR SEARCH 
ENGINE IN A VIRTUALIZED ENVIRONMENT", and they claim that performance 
deteriorates when RAM is increased while Solr is running on a VM, but that is 
with respect to query times and not indexing times.



I am a bit confused as to why it took longer on the VM when I repeated the same 
test a second time with increased heap size and RAM.






Re: Solr Indexing Performance

2011-01-31 Thread Tomás Fernández Löbbe
Well, I would say that the best way to be sure is to benchmark different
configurations.
As far as I know, such a big RAM buffer size is usually not recommended; the
default is 32 MB, and you probably won't get any improvement using more than
128 MB.
The same goes for the mergeFactor: I know that a larger merge factor is better
for indexing, but 50 sounds like a lot. Anyway, as I said before, the best
thing to do is benchmark different configurations and see which one works
better for you.

Have you tried assigning less memory to the JVM? That would leave more
memory available to the OS.

Tomás

On Sun, Jan 30, 2011 at 1:54 AM, Darx Oman  wrote:

> Hi guys
>
> I'm running a Solr instance (trunk) on my dev server to test my
> configuration.  I'm doing a DIH full import to index 49 PDF files with
> their corresponding database records.  Both the PDF files and the database
> are local to the server.
>
> *Server : *
>
> · Windows 2008 R2
>
> · MS SQL Server 2008 R2
>
> · 16-core processor
>
> · 16 GB RAM
>
> *Tomcat (7.0.5) : *
>
> · Set JAVA_OPTS = %JAVA_OPTS% -Xms1024M -Xmx8192M
>
> *Solrconfig:*
>
> · Main index configuration:
> <ramBufferSizeMB>2048</ramBufferSizeMB>
> <mergeFactor>50</mergeFactor>
>
> *DIH configuration:*
>
> · 2 data sources defined: jdbcDataSource and BinFileDataSource
>
> · One main entity with 3 sub-entities
>
> [entity definitions stripped by the mail archive]
>
> · Total schema fields are 8, three of which are text type and
> multivalued.
>
> *My DIH import Status Messages:*
>
> · Total Requests made to DataSource = 99
>
> · Total Rows Fetched = 2124
>
> · Total Documents Processed = 49
>
> · Time Taken = 0:2:3:880
>
> Is this time reasonable, or can it be improved?
>


Re: Solr Indexing Performance

2011-02-01 Thread Darx Oman
Thanks, Tomás.
I'll try different configurations.


Re: Solr Indexing Performance

2011-02-04 Thread Otis Gospodnetic
Hi,

2 GB for ramBufferSizeMB is probably too much and not needed, but you could 
increase it from the default 32 MB to something like 128 MB or even 512 MB, if 
you really have that much data that it would make a difference (you mention only 
49 PDF files).  I'd leave mergeFactor at 10 for now.  The slowness (if there is 
slowness - how long is it taking?) could be from:
* slow DB
* suboptimal SQL
* PDF content extraction
* indexing itself
* ...

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Tomás Fernández Löbbe 
> To: solr-user@lucene.apache.org
> Sent: Mon, January 31, 2011 10:13:32 AM
> Subject: Re: Solr Indexing Performance
> 
> Well, I would say that the best way to be sure is to benchmark different
> configurations. As far as I know, such a big RAM buffer size is usually
> not recommended; the default is 32 MB, and you probably won't get any
> improvement using more than 128 MB. The same goes for the mergeFactor: I
> know that a larger merge factor is better for indexing, but 50 sounds like
> a lot. Anyway, as I said before, the best thing to do is benchmark
> different configurations and see which one works better for you.
> 
> Have you tried assigning less memory to the JVM? That would leave more
> memory available to the OS.
> 
> Tomás
> 
> On Sun, Jan 30, 2011 at 1:54 AM, Darx Oman  wrote:
> 
> > Hi guys
> >
> > I'm running a Solr instance (trunk) on my dev server to test my
> > configuration.  I'm doing a DIH full import to index 49 PDF files with
> > their corresponding database records.  Both the PDF files and the
> > database are local to the server.
> >
> > *Server : *
> >
> > · Windows 2008 R2
> >
> > · MS SQL Server 2008 R2
> >
> > · 16-core processor
> >
> > · 16 GB RAM
> >
> > *Tomcat (7.0.5) : *
> >
> > · Set JAVA_OPTS = %JAVA_OPTS% -Xms1024M -Xmx8192M
> >
> > *Solrconfig:*
> >
> > · Main index configuration:
> > <ramBufferSizeMB>2048</ramBufferSizeMB>
> > <mergeFactor>50</mergeFactor>
> >
> > *DIH configuration:*
> >
> > · 2 data sources defined: jdbcDataSource and BinFileDataSource
> >
> > · One main entity with 3 sub-entities
> >
> > [entity definitions stripped by the mail archive]
> >
> > · Total schema fields are 8, three of which are text type and
> > multivalued.
> >
> > *My DIH import Status Messages:*
> >
> > · Total Requests made to DataSource = 99
> >
> > · Total Rows Fetched = 2124
> >
> > · Total Documents Processed = 49
> >
> > · Time Taken = 0:2:3:880
> >
> > Is this time reasonable, or can it be improved?
> >
>


Re: Solr Indexing Performance

2011-02-05 Thread Darx Oman
I indexed 1000 PDF files with the same configuration; it completed in about
32 min.


Re: Solr Indexing Performance

2011-02-07 Thread Gora Mohanty
On Sat, Feb 5, 2011 at 2:06 PM, Darx Oman  wrote:
> I indexed 1000 PDF files with the same configuration; it completed in about
> 32 min.

So, it seems like your indexing scales at least linearly with the number
of PDF documents that you have.

While this might be good news in your case, it is difficult to estimate
an "expected" indexing rate when indexing from rich documents.

Regards,
Gora


Issue in Solr Indexing

2011-05-26 Thread deepak agrawal
Hi All,

When I index a record into Solr, it indexes successfully, and the commit I
issue afterwards also reports success. But when I then search for that
particular record in Solr, I do not get the record back.
I am using Solr 1.4.1.

Can anyone suggest why that particular record is not searchable? I am not
getting any errors in the Catalina log file either.

Thanks in advance.


-- 
DEEPAK AGRAWAL
+91-9379433455
GOOD LUCK.


Re: Solr Indexing Patterns

2011-06-03 Thread Erick Erickson
How often are the discounts changed? Because you can simply
re-index the book information with a multiValued "discounts" field
and get something similar to your example (&wt=json)


Best
Erick

On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
> What is the "best practice" method to index the following in Solr:
>
> I'm attempting to use solr for a book store site.
>
> Each book will have a price but on occasions this will be discounted. The
> discounted price exists for a defined time period but there may be many
> discount periods. Each discount will have a brief synopsis, start and end
> time.
>
> A subset of the desired output would be as follows:
>
> ...
> "response":{"numFound":1,"start":0,"docs":[
>  {
>    "name":"The Book",
>    "price":"$9.99",
>    "discounts":[
>        {
>         "price":"$3.00",
>         "synopsis":"thanksgiving special",
>         "starts":"11-24-2011",
>         "ends":"11-25-2011",
>        },
>        {
>         "price":"$4.00",
>         "synopsis":"Canadian thanksgiving special",
>         "starts":"10-10-2011",
>         "ends":"10-11-2011",
>        },
>     ]
>  },
>  .
>
> A requirement is to be able to search for just discounted publications. I
> think I could use date faceting for this ( return publications that are
> within a discount window ). When a discount search is performed no
> publications that are not currently discounted will be returned.
>
> My questions are:
>
>   - Does solr support this type of sub documents
>
> In the above example the discounts are the sub documents. I know solr is not
> a relational DB but I would like to store and index the above representation
> in a single document if possible.
>
>   - what is the best method to approach the above
>
> I can see in many examples the authors tend to denormalize to solve similar
> problems. This suggests that for each discount I am required to duplicate the
> book data or form a document
> association.
> Which method would you advise?
>
> It would be nice if solr could return a response structured as above.
>
> Much Thanks
>


Re: Solr Indexing Patterns

2011-06-03 Thread Judioo
Hi,
Discounts can change daily. Also there can be a lot of them (over time and
in a given time period).

Could you give an example of what you mean by multi-valuing the field.

Thanks

On 3 June 2011 14:29, Erick Erickson  wrote:

> How often are the discounts changed? Because you can simply
> re-index the book information with a multiValued "discounts" field
> and get something similar to your example (&wt=json)
>
>
> Best
> Erick
>
> On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
> > What is the "best practice" method to index the following in Solr:
> >
> > I'm attempting to use solr for a book store site.
> >
> > Each book will have a price but on occasions this will be discounted. The
> > discounted price exists for a defined time period but there may be many
> > discount periods. Each discount will have a brief synopsis, start and end
> > time.
> >
> > A subset of the desired output would be as follows:
> >
> > ...
> > "response":{"numFound":1,"start":0,"docs":[
> >  {
> >"name":"The Book",
> >"price":"$9.99",
> >"discounts":[
> >{
> > "price":"$3.00",
> > "synopsis":"thanksgiving special",
> > "starts":"11-24-2011",
> > "ends":"11-25-2011",
> >},
> >{
> > "price":"$4.00",
> > "synopsis":"Canadian thanksgiving special",
> > "starts":"10-10-2011",
> > "ends":"10-11-2011",
> >},
> > ]
> >  },
> >  .
> >
> > A requirement is to be able to search for just discounted publications. I
> > think I could use date faceting for this ( return publications that are
> > within a discount window ). When a discount search is performed no
> > publications that are not currently discounted will be returned.
> >
> > My questions are:
> >
> >   - Does solr support this type of sub documents
> >
> > In the above example the discounts are the sub documents. I know solr is
> not
> > a relational DB but I would like to store and index the above
> representation
> > in a single document if possible.
> >
> >   - what is the best method to approach the above
> >
> > I can see in many examples the authors tend to denormalize to solve
> > similar problems. This suggests that for each discount I am required to
> > duplicate the book data or form a document association
> > <http://stackoverflow.com/questions/2689399/solr-associations>.
> > Which method would you advise?
> >
> > It would be nice if solr could return a response structured as above.
> >
> > Much Thanks
> >
>


Re: Solr Indexing Patterns

2011-06-05 Thread Erick Erickson
See: http://wiki.apache.org/solr/SchemaXml

By adding ' "multiValued="true" ' to the field, you can add
the same field multiple times in a doc, something like



  value1
  value2



But there's no real ability in Solr to store "sub documents",
so you'd have to get creative in how you encoded the discounts...

But I suspect a better approach would be to store each discount as
a separate document. If you're on the trunk version, you could then
group results by, say, ISBN and get responses grouped together...
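
For reference, with one document per discount, that grouping is just two
request parameters; a hedged SolrJ fragment (the "isbn" field name is an
assumption, and result grouping needs trunk or 3.3+):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class GroupedDiscounts {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/books").build();  // placeholder URL
        SolrQuery q = new SolrQuery("title:elm");
        q.set("group", "true");         // result grouping / field collapsing
        q.set("group.field", "isbn");   // one group per book; its discount
                                        // documents collapse under it
        System.out.println(solr.query(q).getGroupResponse().getValues().size()
                + " grouped fields");
        solr.close();
    }
}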

Best
Erick

On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:
> Hi,
> Discounts can change daily. Also there can be a lot of them (over time and
> in a given time period ).
>
> Could you give an example of what you mean by multi-valuing the field.
>
> Thanks
>
> On 3 June 2011 14:29, Erick Erickson  wrote:
>
>> How often are the discounts changed? Because you can simply
>> re-index the book information with a multiValued "discounts" field
>> and get something similar to your example (&wt=json)
>>
>>
>> Best
>> Erick
>>
>> On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
>> > What is the "best practice" method to index the following in Solr:
>> >
>> > I'm attempting to use solr for a book store site.
>> >
>> > Each book will have a price but on occasions this will be discounted. The
>> > discounted price exists for a defined time period but there may be many
>> > discount periods. Each discount will have a brief synopsis, start and end
>> > time.
>> >
>> > A subset of the desired output would be as follows:
>> >
>> > ...
>> > "response":{"numFound":1,"start":0,"docs":[
>> >  {
>> >    "name":"The Book",
>> >    "price":"$9.99",
>> >    "discounts":[
>> >        {
>> >         "price":"$3.00",
>> >         "synopsis":"thanksgiving special",
>> >         "starts":"11-24-2011",
>> >         "ends":"11-25-2011",
>> >        },
>> >        {
>> >         "price":"$4.00",
>> >         "synopsis":"Canadian thanksgiving special",
>> >         "starts":"10-10-2011",
>> >         "ends":"10-11-2011",
>> >        },
>> >     ]
>> >  },
>> >  .
>> >
>> > A requirement is to be able to search for just discounted publications. I
>> > think I could use date faceting for this ( return publications that are
>> > within a discount window ). When a discount search is performed no
>> > publications that are not currently discounted will be returned.
>> >
>> > My questions are:
>> >
>> >   - Does solr support this type of sub documents
>> >
>> > In the above example the discounts are the sub documents. I know solr is
>> not
>> > a relational DB but I would like to store and index the above
>> representation
>> > in a single document if possible.
>> >
>> >   - what is the best method to approach the above
>> >
>> > I can see in many examples the authors tend to denormalize to solve
>> > similar problems. This suggests that for each discount I am required to
>> > duplicate the book data or form a document association
>> > <http://stackoverflow.com/questions/2689399/solr-associations>.
>> > Which method would you advise?
>> >
>> > It would be nice if solr could return a response structured as above.
>> >
>> > Much Thanks
>> >
>>
>


Re: Solr Indexing Patterns

2011-06-06 Thread Judioo
On 5 June 2011 14:42, Erick Erickson  wrote:

> See: http://wiki.apache.org/solr/SchemaXml
>
> By adding ' "multiValued="true" ' to the field, you can add
> the same field multiple times in a doc, something like
>
> 
> 
>  value1
>  value2
> 
> 
>
I can't see how that would work, as one would need to associate the right
start / end dates and price.
As I understand it, using multiValued and thus flattening the discounts would
result in:

{
"name":"The Book",
"price":"$9.99",
"price":"$3.00",
"price":"$4.00","synopsis":"thanksgiving special",
"starts":"11-24-2011",
"starts":"10-10-2011",
"ends":"11-25-2011",
"ends":"10-11-2011",
"synopsis":"Canadian thanksgiving special",
  },

How does one differentiate the different offers?



> But there's no real ability  in Solr to store "sub documents",
> so you'd have to get creative in how you encoded the discounts...
>

This is what I'm asking :)
What are the best / recommended / known patterns for doing this?



>
> But I suspect a better approach would be to store each discount as
> a separate document. If you're in the trunk version, you could then
> group results by, say, ISBN and get responses grouped together...
>

This is an option but seems sub-optimal. So say I store the discounts in
multiple documents with ISBN as an attribute, and also store the title again
with ISBN as an attribute.

To get
"all books currently discounted"

requires 2 requests:

* get all discounts currently active
* get all books using the ISBNs retrieved from the above search

Not that bad. However, what happens when I want
"all books that are currently on discount in the 'horror' genre containing
the word 'elm' in the title"?

The only way I can see of catering for the above search is to duplicate all
searchable fields of my "book" document in my "discount" document. Coming
from an RDBMS background this seems wrong.

Is this the correct approach to take?



>
> Best
> Erick
>
> On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:
> > Hi,
> > Discounts can change daily. Also there can be a lot of them (over time
> and
> > in a given time period ).
> >
> > Could you give an example of what you mean buy multi-valuing the field.
> >
> > Thanks
> >
> > On 3 June 2011 14:29, Erick Erickson  wrote:
> >
> >> How often are the discounts changed? Because you can simply
> >> re-index the book information with a multiValued "discounts" field
> >> and get something similar to your example (&wt=json)
> >>
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
> >> > What is the "best practice" method to index the following in Solr:
> >> >
> >> > I'm attempting to use solr for a book store site.
> >> >
> >> > Each book will have a price but on occasions this will be discounted.
> The
> >> > discounted price exists for a defined time period but there may be
> many
> >> > discount periods. Each discount will have a brief synopsis, start and
> end
> >> > time.
> >> >
> >> > A subset of the desired output would be as follows:
> >> >
> >> > ...
> >> > "response":{"numFound":1,"start":0,"docs":[
> >> >  {
> >> >"name":"The Book",
> >> >"price":"$9.99",
> >> >"discounts":[
> >> >{
> >> > "price":"$3.00",
> >> > "synopsis":"thanksgiving special",
> >> > "starts":"11-24-2011",
> >> > "ends":"11-25-2011",
> >> >},
> >> >{
> >> > "price":"$4.00",
> >> > "synopsis":"Canadian thanksgiving special",
> >> > "starts":"10-10-2011",
> >> > "ends":"10-11-2011",
> >> >},
> >> > ]
> >> >  },
> >> >  .
> >> >
> >> > A requirement is to be able to search for just discounted
> publications. I
> >> > think I could use date faceting for this ( return publications that
> are
> >> > within a discount window ). When a discount search is performed no
> >> > publications that are not currently discounted will be returned.
> >> >
> >> > My question are:
> >> >
> >> >   - Does solr support this type of sub documents
> >> >
> >> > In the above example the discounts are the sub documents. I know solr
> is
> >> not
> >> > a relational DB but I would like to store and index the above
> >> representation
> >> > in a single document if possible.
> >> >
> >> >   - what is the best method to approach the above
> >> >
> >> > I can see in many examples the authors tend to denormalize to solve
> >> similar
> >> > problems. This suggest that for each discount I am required to
> duplicate
> >> the
> >> > book data or form a document
> >> > association<
> http://stackoverflow.com/questions/2689399/solr-associations
> >> >.
> >> > Which method would you advise?
> >> >
> >> > It would be nice if solr could return a response structured as above.
> >> >
> >> > Much Thanks
> >> >
> >>
> >
>


Re: Solr Indexing Patterns

2011-06-06 Thread Erick Erickson
#Everybody# (including me) who has any RDBMS background
doesn't want to flatten data, but that's usually the way to go in
Solr.

Part of whether it's a good idea or not depends on how big the index
gets, and unfortunately the only way to figure that out is to test.

But that's the first approach I'd try.

Good luck!
Erick

On Mon, Jun 6, 2011 at 11:42 AM, Judioo  wrote:
> On 5 June 2011 14:42, Erick Erickson  wrote:
>
>> See: http://wiki.apache.org/solr/SchemaXml
>>
>> By adding ' "multiValued="true" ' to the field, you can add
>> the same field multiple times in a doc, something like
>>
>> 
>> 
>>  value1
>>  value2
>> 
>> 
>>
> I can't see how that would work, as one would need to associate the right
> start / end dates and price.
> As I understand it, using multiValued and thus flattening the discounts would
> result in:
>
> {
>    "name":"The Book",
>    "price":"$9.99",
>    "price":"$3.00",
>    "price":"$4.00",    "synopsis":"thanksgiving special",
>    "starts":"11-24-2011",
>    "starts":"10-10-2011",
>    "ends":"11-25-2011",
>    "ends":"10-11-2011",
>    "synopsis":"Canadian thanksgiving special",
>  },
>
> How does one differentiate the different offers?
>
>
>
>> But there's no real ability  in Solr to store "sub documents",
>> so you'd have to get creative in how you encoded the discounts...
>>
>
> This is what I'm asking :)
> What are the best / recommended / known patterns for doing this?
>
>
>
>>
>> But I suspect a better approach would be to store each discount as
>> a separate document. If you're in the trunk version, you could then
>> group results by, say, ISBN and get responses grouped together...
>>
>
> This is an option but seems sub-optimal. So say I store the discounts in
> multiple documents with ISBN as an attribute, and also store the title again
> with ISBN as an attribute.
>
> To get
> "all books currently discounted"
>
> requires 2 requests:
>
> * get all discounts currently active
> * get all books using the ISBNs retrieved from the above search
>
> Not that bad. However, what happens when I want
> "all books that are currently on discount in the 'horror' genre containing
> the word 'elm' in the title"?
>
> The only way I can see of catering for the above search is to duplicate all
> searchable fields of my "book" document in my "discount" document. Coming
> from an RDBMS background this seems wrong.
>
> Is this the correct approach to take?
>
>
>
>>
>> Best
>> Erick
>>
>> On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:
>> > Hi,
>> > Discounts can change daily. Also there can be a lot of them (over time
>> and
>> > in a given time period ).
>> >
> > Could you give an example of what you mean by multi-valuing the field.
>> >
>> > Thanks
>> >
>> > On 3 June 2011 14:29, Erick Erickson  wrote:
>> >
>> >> How often are the discounts changed? Because you can simply
>> >> re-index the book information with a multiValued "discounts" field
>> >> and get something similar to your example (&wt=json)
>> >>
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
>> >> > What is the "best practice" method to index the following in Solr:
>> >> >
>> >> > I'm attempting to use solr for a book store site.
>> >> >
>> >> > Each book will have a price but on occasions this will be discounted.
>> The
>> >> > discounted price exists for a defined time period but there may be
>> many
>> >> > discount periods. Each discount will have a brief synopsis, start and
>> end
>> >> > time.
>> >> >
>> >> > A subset of the desired output would be as follows:
>> >> >
>> >> > ...
>> >> > "response":{"numFound":1,"start":0,"docs":[
>> >> >  {
>> >> >    "name":"The Book",
>> >> >    "price":"$9.99",
>> >> >    "discounts":[
>> >> >        {
>> >> >         "price":"$3.00",
>> >> >         "synopsis":"thanksgiving special",
>> >> >         "starts":"11-24-2011",
>> >> >         "ends":"11-25-2011",
>> >> >        },
>> >> >        {
>> >> >         "price":"$4.00",
>> >> >         "synopsis":"Canadian thanksgiving special",
>> >> >         "starts":"10-10-2011",
>> >> >         "ends":"10-11-2011",
>> >> >        },
>> >> >     ]
>> >> >  },
>> >> >  .
>> >> >
>> >> > A requirement is to be able to search for just discounted
>> publications. I
>> >> > think I could use date faceting for this ( return publications that
>> are
>> >> > within a discount window ). When a discount search is performed no
>> >> > publications that are not currently discounted will be returned.
>> >> >
> >> > My questions are:
>> >> >
>> >> >   - Does solr support this type of sub documents
>> >> >
>> >> > In the above example the discounts are the sub documents. I know solr
>> is
>> >> not
>> >> > a relational DB but I would like to store and index the above
>> >> representation
>> >> > in a single document if possible.
>> >> >
>> >> >   - what is the best method to approach the above
>> >> >
>> >> > I can see in many examples the authors tend to denormalize to solve
>> >> similar
>> >> > problems. This s

Re: Solr Indexing Patterns

2011-06-06 Thread Judioo
Thanks

On 6 June 2011 19:32, Erick Erickson  wrote:

> #Everybody# (including me) who has any RDBMS background
> doesn't want to flatten data, but that's usually the way to go in
> Solr.
>
> Part of whether it's a good idea or not depends on how big the index
> gets, and unfortunately the only way to figure that out is to test.
>
> But that's the first approach I'd try.
>
> Good luck!
> Erick
>
> On Mon, Jun 6, 2011 at 11:42 AM, Judioo  wrote:
> > On 5 June 2011 14:42, Erick Erickson  wrote:
> >
> >> See: http://wiki.apache.org/solr/SchemaXml
> >>
> >> By adding multiValued="true" to the field, you can add
> >> the same field multiple times in a doc, something like
> >>
> >> <add>
> >> <doc>
> >>  <field name="discounts">value1</field>
> >>  <field name="discounts">value2</field>
> >> </doc>
> >> </add>
> >>
> > I can't see how that would work, as one would need to associate the right
> > start / end dates and price.
> > As I understand it, using multiValued and thus flattening the discounts
> > would result in:
> >
> > {
> >"name":"The Book",
> >"price":"$9.99",
> >"price":"$3.00",
> >"price":"$4.00","synopsis":"thanksgiving special",
> >"starts":"11-24-2011",
> >"starts":"10-10-2011",
> >"ends":"11-25-2011",
> >"ends":"10-11-2011",
> >"synopsis":"Canadian thanksgiving special",
> >  },
> >
> > How does one differentiate the different offers?
> >
> >
> >
> >> But there's no real ability  in Solr to store "sub documents",
> >> so you'd have to get creative in how you encoded the discounts...
> >>
> >
> > This is what I'm asking :)
> > What are the best / recommended / known patterns for doing this?
> >
> >
> >
> >>
> >> But I suspect a better approach would be to store each discount as
> >> a separate document. If you're in the trunk version, you could then
> >> group results by, say, ISBN and get responses grouped together...
> >>
> >
> > This is an option but seems sub-optimal. So say I store the discounts in
> > multiple documents with ISBN as an attribute, and also store the title
> > again with ISBN as an attribute.
> >
> > To get
> > "all books currently discounted"
> >
> > requires 2 requests:
> >
> > * get all discounts currently active
> > * get all books using the ISBNs retrieved from the above search
> >
> > Not that bad. However, what happens when I want
> > "all books that are currently on discount in the 'horror' genre
> > containing the word 'elm' in the title"?
> >
> > The only way I can see of catering for the above search is to duplicate
> > all searchable fields of my "book" document in my "discount" document.
> > Coming from an RDBMS background this seems wrong.
> >
> > Is this the correct approach to take?
> >
> >
> >
> >>
> >> Best
> >> Erick
> >>
> >> On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:
> >> > Hi,
> >> > Discounts can change daily. Also there can be a lot of them (over time
> >> and
> >> > in a given time period ).
> >> >
> >> > Could you give an example of what you mean by multi-valuing the
> >> > field.
> >> >
> >> > Thanks
> >> >
> >> > On 3 June 2011 14:29, Erick Erickson  wrote:
> >> >
> >> >> How often are the discounts changed? Because you can simply
> >> >> re-index the book information with a multiValued "discounts" field
> >> >> and get something similar to your example (&wt=json)
> >> >>
> >> >>
> >> >> Best
> >> >> Erick
> >> >>
> >> >> On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
> >> >> > What is the "best practice" method to index the following in Solr:
> >> >> >
> >> >> > I'm attempting to use solr for a book store site.
> >> >> >
> >> >> > Each book will have a price but on occasions this will be
> discounted.
> >> The
> >> >> > discounted price exists for a defined time period but there may be
> >> many
> >> >> > discount periods. Each discount will have a brief synopsis, start
> and
> >> end
> >> >> > time.
> >> >> >
> >> >> > A subset of the desired output would be as follows:
> >> >> >
> >> >> > ...
> >> >> > "response":{"numFound":1,"start":0,"docs":[
> >> >> >  {
> >> >> >"name":"The Book",
> >> >> >"price":"$9.99",
> >> >> >"discounts":[
> >> >> >{
> >> >> > "price":"$3.00",
> >> >> > "synopsis":"thanksgiving special",
> >> >> > "starts":"11-24-2011",
> >> >> > "ends":"11-25-2011",
> >> >> >},
> >> >> >{
> >> >> > "price":"$4.00",
> >> >> > "synopsis":"Canadian thanksgiving special",
> >> >> > "starts":"10-10-2011",
> >> >> > "ends":"10-11-2011",
> >> >> >},
> >> >> > ]
> >> >> >  },
> >> >> >  .
> >> >> >
> >> >> > A requirement is to be able to search for just discounted
> >> publications. I
> >> >> > think I could use date faceting for this ( return publications that
> >> are
> >> >> > within a discount window ). When a discount search is performed no
> >> >> > publications that are not currently discounted will be returned.
> >> >> >
> >> >> > My questions are:
> >> >> >
> >> >> >   - Does solr support this type of sub documents
> >> >> >
> >> >> > In the above example th

Re: Solr Indexing Patterns

2011-06-06 Thread Judioo
I do think that Solr would be better served if there was a *best practices
section* on the site.

Looking at the majority of emails to this list, they revolve around "how do I
do X?".

It seems like tutorials with real-world examples would serve Solr no end of
good.

I still do not have an example of the best method to approach my problem,
although Erick has helped me understand the limitations of Solr.

Just thought I'd say.






On 6 June 2011 20:26, Judioo  wrote:

> Thanks
>
>
> On 6 June 2011 19:32, Erick Erickson  wrote:
>
>> #Everybody# (including me) who has any RDBMS background
>> doesn't want to flatten data, but that's usually the way to go in
>> Solr.
>>
>> Part of whether it's a good idea or not depends on how big the index
>> gets, and unfortunately the only way to figure that out is to test.
>>
>> But that's the first approach I'd try.
>>
>> Good luck!
>> Erick
>>
>> On Mon, Jun 6, 2011 at 11:42 AM, Judioo  wrote:
>> > On 5 June 2011 14:42, Erick Erickson  wrote:
>> >
>> >> See: http://wiki.apache.org/solr/SchemaXml
>> >>
>> >> By adding multiValued="true" to the field, you can add
>> >> the same field multiple times in a doc, something like
>> >>
>> >> <add>
>> >> <doc>
>> >>  <field name="discounts">value1</field>
>> >>  <field name="discounts">value2</field>
>> >> </doc>
>> >> </add>
>> >>
>> > I can't see how that would work, as one would need to associate the
>> > right start / end dates and price.
>> > As I understand it, using multiValued and thus flattening the discounts
>> > would result in:
>> >
>> > {
>> >"name":"The Book",
>> >"price":"$9.99",
>> >"price":"$3.00",
>> >"price":"$4.00","synopsis":"thanksgiving special",
>> >"starts":"11-24-2011",
>> >"starts":"10-10-2011",
>> >"ends":"11-25-2011",
>> >"ends":"10-11-2011",
>> >"synopsis":"Canadian thanksgiving special",
>> >  },
>> >
>> > How does one differentiate the different offers?
>> >
>> >
>> >
>> >> But there's no real ability  in Solr to store "sub documents",
>> >> so you'd have to get creative in how you encoded the discounts...
>> >>
>> >
>> > This is what I'm asking :)
>> > What are the best / recommended / known patterns for doing this?
>> >
>> >
>> >
>> >>
>> >> But I suspect a better approach would be to store each discount as
>> >> a separate document. If you're in the trunk version, you could then
>> >> group results by, say, ISBN and get responses grouped together...
>> >>
>> >
>> > This is an option but seems sub-optimal. So say I store the discounts in
>> > multiple documents with ISBN as an attribute, and also store the title
>> > again with ISBN as an attribute.
>> >
>> > To get
>> > "all books currently discounted"
>> >
>> > requires 2 requests:
>> >
>> > * get all discounts currently active
>> > * get all books using the ISBNs retrieved from the above search
>> >
>> > Not that bad. However, what happens when I want
>> > "all books that are currently on discount in the 'horror' genre
>> > containing the word 'elm' in the title"?
>> >
>> > The only way I can see of catering for the above search is to duplicate
>> > all searchable fields of my "book" document in my "discount" document.
>> > Coming from an RDBMS background this seems wrong.
>> >
>> > Is this the correct approach to take?
>> >
>> >
>> >
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:
>> >> > Hi,
>> >> > Discounts can change daily. Also there can be a lot of them (over
>> time
>> >> and
>> >> > in a given time period ).
>> >> >
>> >> > Could you give an example of what you mean by multi-valuing the
>> >> > field.
>> >> >
>> >> > Thanks
>> >> >
>> >> > On 3 June 2011 14:29, Erick Erickson 
>> wrote:
>> >> >
>> >> >> How often are the discounts changed? Because you can simply
>> >> >> re-index the book information with a multiValued "discounts" field
>> >> >> and get something similar to your example (&wt=json)
>> >> >>
>> >> >>
>> >> >> Best
>> >> >> Erick
>> >> >>
>> >> >> On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
>> >> >> > What is the "best practice" method to index the following in Solr:
>> >> >> >
>> >> >> > I'm attempting to use solr for a book store site.
>> >> >> >
>> >> >> > Each book will have a price but on occasions this will be
>> discounted.
>> >> The
>> >> >> > discounted price exists for a defined time period but there may be
>> >> many
>> >> >> > discount periods. Each discount will have a brief synopsis, start
>> and
>> >> end
>> >> >> > time.
>> >> >> >
>> >> >> > A subset of the desired output would be as follows:
>> >> >> >
>> >> >> > ...
>> >> >> > "response":{"numFound":1,"start":0,"docs":[
>> >> >> >  {
>> >> >> >"name":"The Book",
>> >> >> >"price":"$9.99",
>> >> >> >"discounts":[
>> >> >> >{
>> >> >> > "price":"$3.00",
>> >> >> > "synopsis":"thanksgiving special",
>> >> >> > "starts":"11-24-2011",
>> >> >> > "ends":"11-25-2011",
>> >> >> >},
>> >> >> >{
>> >> >> > "price":"$4.00",
>> >> >> > "synopsis":"Canadian thanksgiving special",
>>

Re: Solr Indexing Patterns

2011-06-06 Thread Jonathan Rochkind

This is a start, for many common best practices:

http://wiki.apache.org/solr/SolrRelevancyFAQ

Many of the questions in there have an answer that involves 
de-normalizing, as an example. It may be that even if your specific 
problem isn't in there, reading through it will give you (as it gave me) 
a general sense of common patterns in Solr.


( It's certainly true that some things are hard to do in Solr.  It turns 
out that an RDBMS is a remarkably flexible thing -- but when it doesn't 
do something you need well, and you turn to a specialized tool instead 
like Solr, you certainly give up some things.


One of the biggest areas of limitation involves hierarchical or 
relational data, definitely. There are a variety of features, some 
more fully baked than others, some not yet in a Solr release, meant to 
provide tools to get at different aspects of this, including "pivot 
faceting", "join" (https://issues.apache.org/jira/browse/SOLR-2272), 
and field collapsing.  Each, IMO, is trying to deal with different 
aspects of hierarchical or multi-class data, or data that is entities 
with relationships. )


On 6/6/2011 3:43 PM, Judioo wrote:

I do think that Solr would be better served if there was a *best practices
section* on the site.

Looking at the majority of emails to this list, they revolve around "how do I
do X?".

It seems like tutorials with real-world examples would serve Solr no end of
good.

I still do not have an example of the best method to approach my problem,
although Erick has helped me understand the limitations of Solr.

Just thought I'd say.






On 6 June 2011 20:26, Judioo  wrote:


Thanks


On 6 June 2011 19:32, Erick Erickson  wrote:


#Everybody# (including me) who has any RDBMS background
doesn't want to flatten data, but that's usually the way to go in
Solr.

Part of whether it's a good idea or not depends on how big the index
gets, and unfortunately the only way to figure that out is to test.

But that's the first approach I'd try.

Good luck!
Erick

On Mon, Jun 6, 2011 at 11:42 AM, Judioo  wrote:

On 5 June 2011 14:42, Erick Erickson  wrote:


See: http://wiki.apache.org/solr/SchemaXml

By adding ' "multiValued="true" ' to the field, you can add
the same field multiple times in a doc, something like



  value1
  value2



I can't see how that would work, as one would need to associate the right
start / end dates and price.
As I understand it, using multiValued and thus flattening the discounts would
result in:

{
"name":"The Book",
"price":"$9.99",
"price":"$3.00",
"price":"$4.00","synopsis":"thanksgiving special",
"starts":"11-24-2011",
"starts":"10-10-2011",
"ends":"11-25-2011",
"ends":"10-11-2011",
"synopsis":"Canadian thanksgiving special",
  },

How does one differentiate the different offers?




But there's no real ability  in Solr to store "sub documents",
so you'd have to get creative in how you encoded the discounts...


This is what I'm asking :)
What are the best / recommended / known patterns for doing this?




But I suspect a better approach would be to store each discount as
a separate document. If you're in the trunk version, you could then
group results by, say, ISBN and get responses grouped together...


This is an option but seems sub-optimal. So say I store the discounts in
multiple documents with ISBN as an attribute, and also store the title again
with ISBN as an attribute.

To get
"all books currently discounted"

requires 2 requests:

* get all discounts currently active
* get all books using the ISBNs retrieved from the above search

Not that bad. However, what happens when I want
"all books that are currently on discount in the 'horror' genre containing
the word 'elm' in the title"?

The only way I can see of catering for the above search is to duplicate all
searchable fields of my "book" document in my "discount" document. Coming
from an RDBMS background this seems wrong.

Is this the correct approach to take?




Best
Erick

On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:

Hi,
Discounts can change daily. Also there can be a lot of them (over time and
in a given time period).

Could you give an example of what you mean by multi-valuing the field.

Thanks

On 3 June 2011 14:29, Erick Erickson wrote:

How often are the discounts changed? Because you can simply
re-index the book information with a multiValued "discounts" field
and get something similar to your example (&wt=json)


Best
Erick

On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:

What is the "best practice" method to index the following in Solr:

I'm attempting to use solr for a book store site.

Each book will have a price but on occasions this will be discounted. The
discounted price exists for a defined time period but there may be many
discount periods. Each discount will have a brief synopsis, start and end
time.

A subset of the desired output would be as follows:

...
"response":{"numFound":1,"start":0,"docs":[
  

Available Solr Indexing strategies

2011-06-07 Thread zarni aung
Hi,

I am very new to Solr, and my client is trying to add full-text search
capabilities to their product using Solr.  They will also have a master
store that serves as the authoritative data store and that will also provide
metadata searches.  Can you please point me in the right direction for some
indexing strategies that people are using, for further research?

Thank you,

Zarni


Re: Solr Indexing Patterns

2011-06-09 Thread Judioo
Very informative links and statement Jonathan. thank you.



On 6 June 2011 20:55, Jonathan Rochkind  wrote:

> This is a start, for many common best practices:
>
> http://wiki.apache.org/solr/SolrRelevancyFAQ
>
> Many of the questions in there have an answer that involves de-normalizing,
> as an example. It may be that even if your specific problem isn't in there,
> I myself anyway found reading through there gave me a general sense of
> common patterns in Solr.
>
> ( It's certainly true that some things are hard to do in Solr.  It turns
> out that an RDBMS is a remarkably flexible thing -- but when it doesn't do
> something you need well, and you turn to a specialized tool instead like
> Solr, you certainly give up some things.
>
> One of the biggest areas of limitation involves hierarchical or
> relational data, definitely. There are a variety of features, some more
> fully baked than others, some not yet in a Solr release, meant to provide
> tools to get at different aspects of this, including "pivot faceting",
> "join" (https://issues.apache.org/jira/browse/SOLR-2272), and
> field collapsing.  Each, IMO, is trying to deal with different aspects of
> hierarchical or multi-class data, or data that is entities with
> relationships. )
>
>
> On 6/6/2011 3:43 PM, Judioo wrote:
>
>> I do think that Solr would be better served if there was a *best practices
>> section* on the site.
>>
>> Looking at the majority of emails to this list, they revolve around "how
>> do I do X?".
>>
>> It seems like tutorials with real-world examples would serve Solr no end
>> of good.
>>
>> I still do not have an example of the best method to approach my problem,
>> although Erick has helped me understand the limitations of Solr.
>>
>> Just thought I'd say.
>>
>>
>>
>>
>>
>>
>> On 6 June 2011 20:26, Judioo  wrote:
>>
>> Thanks
>>>
>>> On 6 June 2011 19:32, Erick Erickson  wrote:
>>>
>>> #Everybody# (including me) who has any RDBMS background
>>> doesn't want to flatten data, but that's usually the way to go in
>>> Solr.
>>>
>>> Part of whether it's a good idea or not depends on how big the index
>>> gets, and unfortunately the only way to figure that out is to test.
>>>
>>> But that's the first approach I'd try.
>>>
>>> Good luck!
>>> Erick
>>>
>>> On Mon, Jun 6, 2011 at 11:42 AM, Judioo  wrote:

> On 5 June 2011 14:42, Erick Erickson  wrote:
>
>> See: http://wiki.apache.org/solr/SchemaXml
>>
>> By adding ' "multiValued="true" ' to the field, you can add
>> the same field multiple times in a doc, something like
>>
>> <doc>
>>   <field name="discounts">value1</field>
>>   <field name="discounts">value2</field>
>> </doc>
>>
> I can't see how that would work, as one would need to associate the
> right start / end dates and price.
> As I understand it, using multiValued and thus flattening the discounts
> would result in:
>
> {
>"name":"The Book",
>"price":"$9.99",
>"price":"$3.00",
>"price":"$4.00","synopsis":"thanksgiving special",
>"starts":"11-24-2011",
>"starts":"10-10-2011",
>"ends":"11-25-2011",
>"ends":"10-11-2011",
>"synopsis":"Canadian thanksgiving special",
>  },
>
> How does one differentiate the different offers?
>
>
>
>> But there's no real ability in Solr to store "sub documents",
>> so you'd have to get creative in how you encoded the discounts...
>>
> This is what I'm asking :)
> What are the best / recommended / known patterns for doing this?
>
>
>
>> But I suspect a better approach would be to store each discount as
>> a separate document. If you're in the trunk version, you could then
>> group results by, say, ISBN and get responses grouped together...
>>
> This is an option but seems sub-optimal. So say I store the discounts in
> multiple documents with ISBN as an attribute, and also store the title
> again with ISBN as an attribute.
>
> To get
> "all books currently discounted"
>
> requires 2 requests:
>
> * get all discounts currently active
> * get all books using the ISBNs retrieved from the above search
>
> Not that bad. However, what happens when I want
> "all books that are currently on discount in the "horror" genre containing
> the word 'elm' in the title."
>
> The only way I can see of catering for the above search is to duplicate all
> searchable fields from my "book" document in my "discount" document. Coming
> from an RDBMS background this seems wrong.
>
> Is this the correct approach to take?
>
>> Best
>> Erick
>>
>> On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:
>>
>>> Hi,
>>> Discounts can change daily. Also there can be a lot of them (over time and
>>> [...]
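
To make the denormalized approach discussed above concrete, a separate
"discount" document might look like the following -- the field names and
values are illustrative only, not taken from the original thread:

<add>
  <doc>
    <field name="doc_type">discount</field>
    <field name="isbn">9780000000001</field>
    <field name="title">The Book</field>
    <field name="genre">horror</field>
    <field name="price">3.00</field>
    <field name="synopsis">thanksgiving special</field>
    <field name="starts">2011-11-24T00:00:00Z</field>
    <field name="ends">2011-11-25T00:00:00Z</field>
  </doc>
</add>

Assuming starts/ends are date fields, "all books currently discounted" then
becomes a single range query over these documents:

  q=doc_type:discount AND starts:[* TO NOW] AND ends:[NOW TO *]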

Re: Solr indexing questions

2011-06-11 Thread Jamie Johnson
I'm not sure about your first question, but your second question (searching
across fields using a single keyword) I believe is exactly what dismax was
created for; check out
http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/ for some
information.  Regarding spelling errors, you could add a phonetic field
which would be included in the weighted result; a quick google gave me
http://search.lucidimagination.com/search/document/CDRG_ch05_5.6.12.
Synonyms are also pretty straightforward and are included in the sample
that ships with Solr.
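
As a sketch, a dismax request that spreads one keyword across several
weighted fields might look like this (field names and boosts are invented
for illustration; the spaces in qf would be URL-encoded in practice):

  http://localhost:8983/solr/select?defType=dismax&q=steakhouse
      &qf=name^5 cuisines^3 features^2 reviews
      &pf=name^10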

On Sat, Jun 11, 2011 at 10:35 AM, Frank A  wrote:

> I currently have my site setup using SOLR for some pretty simple queries
> and
> am looking to add some additional features and was hoping to get some
> guidance.
>
> Here's my situation: for a given restaurant I have the following info:
>
> rest name,
> editorial,
> list of features (e.g. Reservations, Good for Groups, etc)
> list of cuisines (American, Italian, etc)
> List of user reviews
> Additional meta data
>
> There are 2 different things I want to do:
>
> Build a directory based on "keywords or phrases" - e.g. looking through all
> the data to find the common keywords/phrases - e.g. "hot dog" or "Brazilian
> steakhouse". I'm not sure how to extract these keyphrases from the data
> without having to input them myself.  Is this a good fit for SOLR?  If so,
> what features should I look into?
>
> Second is an "advanced" search that basically matches user input on ANY of
> the fields.  However, I'd like it to have some basic handling for misspelled
> words and synonyms (bbq and bar-b-q), and to weight the use of the terms
> differently (e.g. name of restaurant vs. in a user's comments).  I'm sure
> this is SOLR's sweet spot but I'm having trouble figuring out how to put it
> all together.
>
> Thanks in advance.
>


Re: offline solr indexing

2009-04-27 Thread Amit Nithian
Not sure if this helps but could you make this a solr server that is not
accessible by any other means (except internal), perform your index build
using the dataimporthandler and use Solr's replication mechanisms to move
the indices across?
You can issue the HTTP request to rebuild the index from the command line
(i.e. GET ..)
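
For example, assuming a DIH handler registered at /dataimport (host and core
names are placeholders for the internal build box):

  curl 'http://internal-master:8983/solr/dataimport?command=full-import&clean=true'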

On Mon, Apr 27, 2009 at 12:08 PM, Charles Federspiel <
charles.federsp...@gmail.com> wrote:

> Solr Users,
> Our app servers are setup on read-only filesystems.  Is there a way
> to perform indexing from the command line, then copy the index files to the
> app-server and use Solr to perform search from inside the servlet
> container?
>
> If the Solr implementation is bound to http requests, can Solr perform
> searches against an index that I create with Lucene?
> thank you,
> Charles Federspiel
>


Re: offline solr indexing

2009-04-27 Thread Shalin Shekhar Mangar
On Tue, Apr 28, 2009 at 12:38 AM, Charles Federspiel <
charles.federsp...@gmail.com> wrote:

> Solr Users,
> Our app servers are setup on read-only filesystems.  Is there a way
> to perform indexing from the command line, then copy the index files to the
> app-server and use Solr to perform search from inside the servlet
> container?


If the filesystem is read-only, then how can you index at all?

But what I think you are describing is the regular master-slave setup that
we use. A dedicated master on which writes are performed. Multiple slaves on
which searches are performed. The index is replicated to slaves through
script or the new java based replication.
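
As a rough sketch of the Java-based replication (Solr 1.4+), both sides
declare a ReplicationHandler in solrconfig.xml; the master host name below
is a placeholder:

  <!-- on the master -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
  </requestHandler>

  <!-- on each slave -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:15:00</str>
    </lst>
  </requestHandler>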


> If the Solr implementation is bound to http requests, can Solr perform
> searches against an index that I create with Lucene?
> thank you,


It can but it is a little tricky to get the schema and analysis correct
between your Lucene writer and Solr searcher.

-- 
Regards,
Shalin Shekhar Mangar.


Re: offline solr indexing

2009-05-02 Thread Charles Federspiel
Thanks. I imagine replication to a slave would require a filesystem writable
by that slave.

I think it helps to realize that indexing is really a function of Content
Management.  After some discussion with coworkers, I've learned that our
internal CMS server runs within tomcat and shares a filesystem with our
public app-servers.  So I'm hoping to deploy solr both to the tomcat
instance (for indexing) and our other instances (for searching) simply
sharing a Solr home between them.  How bad is this? Does updating and
commiting the index interrupt search?  It would only affect our internal
instance, but I still need to know all the effects of this setup.

On Mon, Apr 27, 2009 at 6:32 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Tue, Apr 28, 2009 at 12:38 AM, Charles Federspiel <
> charles.federsp...@gmail.com> wrote:
>
> > Solr Users,
> > Our app servers are setup on read-only filesystems.  Is there a way
> > to perform indexing from the command line, then copy the index files to
> the
> > app-server and use Solr to perform search from inside the servlet
> > container?
>
>
> If the filesystem is read-only, then how can you index at all?
>
> But what I think you are describing is the regular master-slave setup that
> we use. A dedicated master on which writes are performed. Multiple slaves
> on
> which searches are performed. The index is replicated to slaves through
> script or the new java based replication.
>
>
> > If the Solr implementation is bound to http requests, can Solr perform
> > searches against an index that I create with Lucene?
> > thank you,
>
>
> It can but it is a little tricky to get the schema and analysis correct
> between your Lucene writer and Solr searcher.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: offline solr indexing

2009-05-04 Thread Otis Gospodnetic

This should be fine.  You won't have to replicate your index, just reopen the 
searcher when commit is done, that's all.  Index updates and searches can be 
happening at the same time.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Charles Federspiel 
> To: solr-user@lucene.apache.org
> Sent: Saturday, May 2, 2009 12:40:13 PM
> Subject: Re: offline solr indexing
> 
> Thanks. I imagine replication to a slave would require a filesystem writable
> by that slave.
> [...]



Solr Indexing slows down

2010-07-29 Thread Peter Karich
Hi,

I am indexing a solr 1.4.0 core and committing gets slower and slower,
starting from 3-5 seconds for ~200 documents and ending with over 60
seconds after 800 commits. Then, if I reload the index, it is as fast
as before! And today I have read a similar thread [1] and indeed: if I
set autowarming for the caches to 0 the slowdown disappears.

BUT at the same time I would like to offer searching on that core, which
would be dramatically slowed down (due to no autowarming).

Does someone know a better solution to avoid index-slow-down?

Regards,
Peter.

[1] http://www.mail-archive.com/solr-user@lucene.apache.org/msg20785.html


Speeding up solr indexing

2010-10-08 Thread sivaprasad

Hi,
I am indexing the data using DIH. The data comes from MySQL. Each document
contains 30 fields. Some of the fields are multi-valued. When I try to
index 10 million records it takes a long time.

Does anybody have suggestions to speed up the indexing process? Any
suggestions on solr admin-level configurations?


Thanks,
JS
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Speeding-up-solr-indexing-tp1667054p1667054.html
Sent from the Solr - User mailing list archive at Nabble.com.


Plurals in solr indexing

2010-01-27 Thread murali k

Hi, 
I am having trouble with indexing plurals, 

I have the schema with following fields
gender (field) - string (field type) (eg. data Boys)
all (field) - text (field type)  - solr.WhitespaceTokenizerFactory,
solr.SynonymFilterFactory, solr.WordDelimiterFilterFactory,
solr.LowerCaseFilterFactory, SnowballPorterFilterFactory

i am using copyField from gender to all

and searching on "all" field

When I search for Boy, I get results; if I search for Boys I don't get
results.
I have tried things like "boys bikes" - no results
"boy bikes" - works

kid and kids are synonyms for boy and boys, so I tried adding
kid,kids,boy,boys in synonyms hoping it would work; it doesn't work that way.

I also have other content fields which are copied to "all", and it contains
words like "kids, boys" etc...
Any idea?
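
For reference, the "all" field chain described above corresponds roughly to
this schema.xml fieldType -- a reconstruction from the filter list, not the
poster's actual config:

  <fieldType name="text" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.WordDelimiterFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>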





-- 
View this message in context: 
http://old.nabble.com/Plurals-in-solr-indexing-tp27335639p27335639.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Dynamic Solr indexing

2010-03-01 Thread Jan Høydahl / Cominvent
Hi,

In current version you need to handle the cluster layout yourself, both on 
indexing and search side, i.e. route documents to shards as you please, and 
know what shards to search.

We try to address how to make this easier in 
http://wiki.apache.org/solr/SolrCloud - have a look at it. The idea is that 
there is a component that knows about the layout of the search cluster, and we 
can then use this to know what shards to index to and search. If we build a 
component which automatically routes documents to shards, your use case could 
be implemented as one particular routing strategy, i.e. move to next shard when 
the current is "full" - ideal for news type of indexes.
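
Until then, the manual version is: the indexing process posts each document
to whichever shard it has chosen, and the search side enumerates all current
shards in the request, e.g. (host names are placeholders):

  http://shard1:8983/solr/select?q=foo
      &shards=shard1:8983/solr,shard2:8983/solr,shard3:8983/solr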

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 1. mars 2010, at 18.58, Peter S wrote:

> 
> Hi,
> 
> 
> 
> I wonder if anyone could shed some insight on a dynamic indexing question...?
> 
> 
> 
> The basic requirement is this:
> 
> 
> 
> Indexing:
> 
> A process writes to an index, and when it reaches a certain size (say, 1GB), 
> a new index (core) is 'automatically' created/deployed (i.e. the process 
> doesn't know about it) and further indexing now goes into the new core. When 
> that one reaches its threshold size, a new index is deployed, and so on.
> 
> The process that is writing to the indices doesn't actually know that it is 
> writing to different cores.
> 
> 
> 
> Searching:
> 
> When a search is directed at the above index, the actual search is a 
> distributed shard search across all the shards that have been deployed. 
> Again, the searcher process doesn't know this, but gets back the aggregated 
> results, as if it had specified all the shards in the request URL, but as 
> these are changing dynamically, it of course can't know what they all are at 
> any given time.
> 
> 
> 
> This requirement sounds to me perhaps like a Katta thing. I've had a look at 
> Solr-1395, and there's questions in Lucid that sound similar (e.g. 
> http://www.lucidimagination.com/search/document/4b3d00055413536d/solr_katta_integration#4b3d00055413536d),
>  so I guess (hope) I'm not the only one with this requirement.
> 
> 
> 
> I couldn't find anything in either Katta or SOLR-1395 that fit both the 
> writing and searching requirement, but I could easily have missed it.
> 
> 
> 
> Is Katta/Solr-1395 the way to go to achieve this? Would such a solution be 
> 'production-ready'? Has anyone deployed this type of thing in a production 
> environment?
> 
> 
> 
> Any insight/advice would be greatly appreciated.
> 
> 
> 
> Thanks!
> 
> Peter
> 
> 
> 
> 
> 



RE: Dynamic Solr indexing

2010-03-01 Thread Peter S

Hi Jan,

Thanks very much for your message. SolrCloud sounds very cool indeed...

So, from the Wiki, am I right in understanding that the only 'external'
component is ZooKeeper, and everything else is pure Solr (i.e. replication,
distributed queries et al. are all Solr HTTP, as opposed to something like
Hadoop IPC)? If so, this makes it a nice tight package, keeping external
dependencies to a minimum. Is SolrCloud 'ready for primetime' production at
present?

Apologies for all the questions - Is SolrCloud marked for inclusion in 1.5?

Many thanks!

Peter

> Subject: Re: Dynamic Solr indexing
> From: jan@cominvent.com
> Date: Tue, 2 Mar 2010 00:48:50 +0100
> To: solr-user@lucene.apache.org
> 
> Hi,
> 
> In current version you need to handle the cluster layout yourself, both on 
> indexing and search side, i.e. route documents to shards as you please, and 
> know what shards to search.
> [...]

Re: Dynamic Solr indexing

2010-03-10 Thread Jan Høydahl / Cominvent
Hi,

Yes, it will be a really nice package. I think the aim is to keep the ZK stuff 
optional, which can be nice for small installs or upgrading without embracing 
the ZK parts. All of this is still in the beginning of development.

Much of the cloud stuff is aimed at 1.5 but there are as usual no dates...

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 2. mars 2010, at 01.18, Peter S wrote:

> 
> Hi Jan,
> 
> 
> 
> Thanks very much for your message. SolrCloud sounds very cool indeed...
> [...]



Solr indexing configuration help

2008-05-28 Thread gaku113

Hi all Solr users/developers/experts,

I have the following scenario and I appreciate any advice for tuning my solr
master server.  

I have a field in my schema that would index (but not store) about ~1
ids for each document.  This field is expected to govern the size of the
document.  Each id can contain up to 6 characters.  I figure that there are
two alternatives for this field: one is to use a string multi-valued field,
and the other would be to pass a white-space-delimited string to solr and
have solr tokenize such a string on whitespace (the text_ws fieldType).
The master server is expected to receive constant stream of updates.

The expected/estimated document size can range from 50k to 100k for a single
document.  (I know this is quite large). The number of documents is expected
to be around 200,000 on each master server, and there can be multiple master
servers (sharding).  I wish the master could handle more docs too if I can
figure a way out.

Currently, I’m performing some basic stress tests to simulate the indexing
side on the master server.  This stress test would continuously add new
documents at the rate of about 10 documents every 30 seconds.  Autocommit is
being used (50 docs and 180 seconds constraints), but I have no idea if this
is the preferred way.  The goal is to keep adding new documents until we can
get at least 200,000 documents (or about 20GB of index) on the master (or
even more if the server can handle it)

What I experienced from the indexing stress test is that the master server
failed to respond after a while, becoming non-pingable when there were about
30k documents.  When looking at the log, the errors are mostly:
java.lang.OutOfMemoryError: Java heap space
OR
Ping query caused exception: null (this is probably caused by the OOM
problem)

There were also a few cases in which the java process even went away.

Questions:
1)  Is it better to use the multi-valued string field or the text_ws field
for this large field?
2)  Is it better to have more outstanding docs per commit or more frequent
commits, in terms of maximizing server resources?  What is the preferred way
to commit documents, assuming that the solr master receives updates
frequently?  How many updated docs should there be before issuing a commit?
3)  How to avoid the OOM problem in my case? I’m already doing (-Xms1536M
-Xmx1536M) on a 2-GB machine. Is that not enough?  I’m concerned that adding
more RAM would just delay the OOM problem.  Any additional JVM option to
consider?
4)  Any recommendation for the master server configuration, in a sense that
I can maximize the number of indexed docs?
5)  How can I disable caching on the master altogether, as queries won’t hit
the master?
6)  For an average doc size of 50k-100k, is that too large for solr, or is
solr even the right tool? If not, any alternative?  If we are able to reduce
the size of docs, can we expect to index more documents?

The followings are info related to software/hardware/configuration:

Solr version (solr nightly build on 5/23/2008)
Solr Specification Version: 1.2.2008.05.23.08.06.59
Solr Implementation Version: nightly
Lucene Specification Version: 2.3.2
Lucene Implementation Version: 2.3.2 652650
Jetty: 6.1.3

Schema.xml (the section that I think is relevant to the master server; the
XML element names were stripped when this message was archived, so only the
values survive):

  [field definitions lost]
  uniqueKey: id

Solrconfig.xml (element names likewise stripped; surviving values, grouped
as they appeared):

  index defaults: false, 10, 500, 50, 5000, 2, 1000, 1,
    org.apache.lucene.index.LogByteSizeMergePolicy,
    org.apache.lucene.index.ConcurrentMergeScheduler, single
  main index: false, 50, 10, 500, 5000, 2, false
  autoCommit: 50 docs, 18 (time value truncated)
  postCommit listener: solr/bin/snapshooter, ".", true
  query section: 50, true, 1, 1
  newSearcher warming query: "user_id 0 1"
    (static newSearcher warming query from solrconfig.xml)
  firstSearcher warming query: "fast_warm 0 10"
    (static firstSearcher warming query from solrconfig.xml)
  trailing values: false, 4

Replication:
The snappuller is scheduled to run every 15 mins for now. 

Hardware:
AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive

OS:
Fedora 8 (64-bit)

JVM version:
java version "1.7.0"
IcedTea Runtime Environment (build 1.7.0-b21)
IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)

Java options:
java  -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M
-XX:+UseParallelGC -jar start.jar 


-- 
View this message in context: 
http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR Indexing/Querying

2007-05-31 Thread Frans Flippo

I think if you add a field that has an analyzer that creates tokens on
alpha/digit/punctuation boundaries, that should go a long way. Use that both
at index and search time.

For example:
* 3555LHP  becomes "3555" "LHP"
 Searching for D3555 becomes "D" OR "3555", so it matches on token "3555"
from 3555LHP.

* t14240 becomes "t" "14240"
 Searching for t14240-ss  becomes "t" OR "14240" OR "ss", matching "14240"
from "t14240".

Similarly for your other examples.

If this proves to be too broad, you may need to define some stricter rules,
but you could use this for starters.

I think you will have to write your own analyzer, as it doesn't look like
any of the analyzers available in Solr/Lucene do exactly what you need. But
that's relatively straightforward. Just start with the code from one of the
existing Analyzers (e.g. KeywordAnalyzer).

Good luck,
Frans

On 5/31/07, realw5 <[EMAIL PROTECTED]> wrote:



Hey Guys,
I need some guidance in regards to a problem we are having with our solr
index. Below is a list of terms our customers search for, which are
failing
or not returning the complete set. The second side of the list is the
product id/keyword we want it to match.

Can you give me some direction on how this can be done (or let me know if it
can't) with index/query analyzers. Any help is much appreciated!

Dan

---

Keyword Typed In / We want it to find

D3555 / 3555LHP
D460160-BN / D460160
D460160BN / D460160
Dd454557 / D454557
84200ORB / 84200
84200-ORB / 84200
T13420-SCH / T13420
t14240-ss / t14240
--
View this message in context:
http://www.nabble.com/SOLR-Indexing-Querying-tf3843221.html#a10883456
Sent from the Solr - User mailing list archive at Nabble.com.




AW: SOLR Indexing/Querying

2007-05-31 Thread Burkamp, Christian
Hi there,

It looks a lot like using Solr's standard "WordDelimiterFilter" (see the sample 
schema.xml) does what you need.
It splits on alphabetical-to-numeric boundaries and on the various kinds of 
intra-word delimiters like "-", "_" or ".". You can decide whether the parts 
are put together again in addition to the split-up tokens. Control this with 
the parameters "catenateWords", "catenateNumbers" and "catenateAll".
Good documentation on this topic is found on the wiki

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089
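
As a sketch of such a field type (the parameter values shown are common
choices for this use case, not necessarily what this particular schema
needs):

  <fieldType name="text_part" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- 3555LHP -> "3555", "LHP" (plus "3555lhp" via catenateAll);
           D460160-BN -> "D460160", "BN" (plus "d460160bn") -->
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              catenateWords="1" catenateNumbers="1" catenateAll="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>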

-- Christian


-----Original Message-----
From: Frans Flippo [mailto:[EMAIL PROTECTED]
Sent: Thursday, 31 May 2007 11:27
To: solr-user@lucene.apache.org
Subject: Re: SOLR Indexing/Querying


I think if you add a field that has an analyzer that creates tokens on 
alpha/digit/punctuation boundaries, that should go a long way. Use that both at 
index and search time.

[...]



Re: solr indexing exception

2011-08-26 Thread Gora Mohanty
On Fri, Aug 26, 2011 at 1:47 PM, abhijit bashetti
 wrote:
> Hi,
>
> I am using DIH for indexing 50K documents .
>
> I am using 64-bit machine with 4GB RAM

How much memory is allocated to Solr? What is the approximate size
of the data being indexed into Solr?

Regards,
Gora


Re: Solr Indexing Time

2011-11-10 Thread Steve Fatula
From: "Husain, Yavar" 
>To: "solr-user@lucene.apache.org" 
>Sent: Thursday, November 10, 2011 3:43 AM
>Subject: Solr Indexing Time
>
>However while using Solr on a VM, with 4 GB RAM it took 50 minutes to index at 
>the first time. Note that there is no Network delays and no RAM issues. Now 
>when I increased the RAM to 8GB and increased the heap size, the indexing time 
>increased to 2 hrs. That was really strange. Note that except for SQL Server 
>there is no other process running. There are no network delays. However I have 
>not checked for File I/O. Can that be a bottleneck? Does Solr have any issues 
>running in a "Virtualization" Environment?

I think you said it all in your statement "However I have not checked for File
I/O". In many VM environments, that's a huge bottleneck; it depends on the
environment, how many VMs, etc. What sort of VM? How many on the same machine?

Re: Issue in Solr Indexing

2011-05-26 Thread Gora Mohanty
On Thu, May 26, 2011 at 7:06 PM, deepak agrawal  wrote:
> Hi All,
>
> When I index a record into Solr, it indexes successfully, and the commit I
> issue afterwards also reports success.
> But when I then search for that particular record in Solr, I do not get
> that record back.
> I am using Solr 1.4.1.

Please provide us with more details, as there is not much to go on here:
* How are you indexing? How are you telling that the indexing was
  successful?
* How is the field defined in the Solr schema?
* What is the commit response?

> Can anyone please suggest why that particular record is not being indexed?
> I am not getting any error in the Catalina log file either.
[...]

What does this mean? You say above that the indexing is successful,
but seem to be saying here that it was not successful after all.

Regards,
Gora


Re: Solr Indexing slows down

2010-07-30 Thread Erick Erickson
See the subject about 1500 threads. The first place I'd look is how
often you're committing. If you're committing before the warmup queries
from the previous commit have done their magic, you might be getting
into a death spiral.

HTH
Erick

On Thu, Jul 29, 2010 at 7:02 AM, Peter Karich  wrote:

> Hi,
>
> I am indexing a solr 1.4.0 core and commiting gets slower and slower.
> Starting from 3-5 seconds for ~200 documents and ending with over 60
> seconds after 800 commits. Then, if I reloaded the index, it is as fast
> as before! And today I have read a similar thread [1] and indeed: if I
> set autowarming for the caches to 0 the slowdown disappears.
>
> BUT at the same time I would like to offer searching on that core, which
> would be dramatically slowed down (due to no autowarming).
>
> Does someone know a better solution to avoid index-slow-down?
>
> Regards,
> Peter.
>
> [1] http://www.mail-archive.com/solr-user@lucene.apache.org/msg20785.html
>


Re: Solr Indexing slows down

2010-07-30 Thread Peter Karich
Hi Erick!

thanks for the response!
I will answer your questions ;-)

> How often are you making changes to your index?

Every 30-60 seconds. Too heavy?


> Do you have autocommit on?

No.


> Do you commit when updating each document?

No. I commit after a batch update of 200 documents


> Committing too often and consequently firing off warmup queries is the first 
> place I'd look.

Why is committing firing warmup queries? Is there any documentation about
this subject?
How can I be sure that the previous commit has done its magic?

> there are several config values that influence the commit frequency


I now know the autowarm and the mergeFactor config. What else? Is this
documentation complete:
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed ?

Regards,
Peter.

> See the subject about 1500 threads. The first place I'd look is how
> often you're committing. If you're committing before the warmup queries
> from the previous commit have done their magic, you might be getting
> into a death spiral.
>
> HTH
> Erick
>
> On Thu, Jul 29, 2010 at 7:02 AM, Peter Karich  wrote:
>
>> [...]


Re: Solr Indexing slows down

2010-07-30 Thread Otis Gospodnetic
Peter, there are events in solrconfig where you define warm up queries when a 
new searcher is opened.

There are also cache settings that play a role here.

30-60 seconds is pretty frequent for Solr.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
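
For reference, both knobs live in the <query> section of solrconfig.xml; a
minimal sketch (the query text and autowarm count are invented for
illustration):

  <query>
    <filterCache class="solr.FastLRUCache" size="512"
                 initialSize="512" autowarmCount="128"/>
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">popular query</str><str name="rows">10</str></lst>
      </arr>
    </listener>
  </query>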



- Original Message 
> From: Peter Karich 
> To: solr-user@lucene.apache.org
> Sent: Fri, July 30, 2010 4:06:48 PM
> Subject: Re: Solr Indexing slows down
> 
> Hi Erick!
> 
> thanks for the response!
> I will answer your questions ;-)
> [...]


Re: Solr Indexing slows down

2010-07-30 Thread Peter Karich
Hi Otis,

does it mean that a new searcher is opened after I commit?
I thought only on startup...(?)

Regards,
Peter.

> Peter, there are events in solrconfig where you define warm up queries when a 
> new searcher is opened.
>
> There are also cache settings that play a role here.
>
> 30-60 seconds is pretty frequent for Solr.
>
> Otis
>
> [...]


-- 
http://karussell.wordpress.com/



Re: Solr Indexing slows down

2010-07-30 Thread Otis Gospodnetic
As you make changes to your index, you probably want to see the new/modified 
documents in your search results.  In order to do that, the new searcher needs 
to be reopened, and this happens on commit.
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Peter Karich 
> To: solr-user@lucene.apache.org
> Sent: Fri, July 30, 2010 6:19:03 PM
> Subject: Re: Solr Indexing slows down
> 
> Hi Otis,
> 
> does it mean that a new searcher is opened after I commit?
> I thought only on startup...(?)
> [...]
> 
> -- 
> http://karussell.wordpress.com/
> 
> 


Re: Solr Indexing slows down

2010-08-02 Thread Peter Karich
Thanks Otis, for this clarification!

That means I will have to decrease the commit frequency to speed up
indexing.
How could I do this if I don't want to introduce an artificial delay
time? ... via increasing the batch size?

Today I have read in another thread [1] that one should uninvert the
field? What is that and how can I do it?

Regards,
Peter.

[1]
http://www.mail-archive.com/solr-user@lucene.apache.org/msg36113.html
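
One way to lower the commit frequency without an artificial delay is to stop
committing per batch and let Solr's autoCommit do it instead; a sketch for
solrconfig.xml, with placeholder thresholds to tune:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>   <!-- commit after this many added docs -->
      <maxTime>300000</maxTime>  <!-- ...or after 5 minutes, whichever comes first -->
    </autoCommit>
  </updateHandler>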


> As you make changes to your index, you probably want to see the new/modified
> documents in your search results.  In order to do that, the new searcher
> needs to be reopened, and this happens on commit.
> Otis
> [...]



Re: Speeding up solr indexing

2010-10-08 Thread Erick Erickson
Well, 10 million rows is a bunch of rows, it'll take some time. But you haven't
given us any clue what that means. Is it taking 5 minutes? 5 hours? 5 days?
Without some dimension on the problem it's really hard to provide any
suggestions; you might be seeing entirely reasonable times, we just don't know.

MySql (depending on version) likes to load the entire results set into
memory, which
may be related. See:
http://search.lucidimagination.com/search/out?u=http://wiki.apache.org/solr/DataImportHandlerFaq%23head-149779b72761ab071c841879545256bdbbdc15d2

HTH
Erick

On Fri, Oct 8, 2010 at 2:59 PM, sivaprasad wrote:

>
> Hi,
> I am indexing the data using DIH.Data coming from mysql.Each document
> contains 30 fields.Some of the fields are multi valued.When i am trying to
> index 10 million records it taking more time to index.
>
> Any body has suggestions to speed up indexing process?Any suggestions on
> solr admin level configurations?
>
>
> Thanks,
> JS
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Speeding-up-solr-indexing-tp1667054p1667054.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Speeding up solr indexing

2010-10-08 Thread Otis Gospodnetic
Hi,

Assuming your DB/network/something else is not the bottleneck, increase your 
ramBufferSizeMB (in solrconfig).
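
For example (the element lives in the indexDefaults section of
solrconfig.xml; 256 is just an arbitrary starting value to experiment with):

  <indexDefaults>
    <ramBufferSizeMB>256</ramBufferSizeMB>
  </indexDefaults>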

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: sivaprasad 
> To: solr-user@lucene.apache.org
> Sent: Fri, October 8, 2010 2:59:45 PM
> Subject: Speeding up solr indexing
> 
> 
> Hi,
> I am indexing the data using DIH.Data coming from mysql.Each  document
> contains 30 fields.Some of the fields are multi valued.When i am  trying to
> index 10 million records it taking more time to index.
> 
> Any  body has suggestions to speed up indexing process?Any suggestions on
> solr  admin level configurations?
> 
> 
> Thanks,
> JS
> -- 
> View this message  in context: 
>http://lucene.472066.n3.nabble.com/Speeding-up-solr-indexing-tp1667054p1667054.html
>
> Sent  from the Solr - User mailing list archive at Nabble.com.
> 


Re: Speeding up solr indexing

2010-10-08 Thread Dennis Gearon
How does that have to work with Java's memory? 

In lockstep, a certain percentage, not related, what, or at all?


Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Fri, 10/8/10, Otis Gospodnetic  wrote:

> From: Otis Gospodnetic 
> Subject: Re: Speeding up solr indexing
> To: solr-user@lucene.apache.org
> Date: Friday, October 8, 2010, 9:13 PM
> Hi,
> 
> Assuming your DB/network/something else is not the
> bottleneck, increase your 
> ramBufferSizeMB (in solrconfig).
> 
> Otis
> [...]


Re: Speeding up solr indexing

2010-10-09 Thread Otis Gospodnetic
Related.  Can't be larger than -Xmx. :)  Or even equal to -Xmx, because other 
things need to live in the heap.  There is no exact function, so be more on the 
conservative side in order to avoid OOME.
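
As an illustrative (not prescriptive) pairing: with a 1 GB heap, keeping the
index buffer at a quarter of that leaves headroom for everything else:

  java -Xmx1024M -jar start.jar            # whole JVM heap
  <ramBufferSizeMB>256</ramBufferSizeMB>   <!-- in solrconfig, well under -Xmx -->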

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Dennis Gearon 
> To: solr-user@lucene.apache.org
> Sent: Sat, October 9, 2010 12:58:18 AM
> Subject: Re: Speeding up solr indexing
> 
> How does that have to work with Java's memory? 
> 
> In lockstep, a certain percentage, not related, what, or at all?
> [...]


Re: Speeding up solr indexing

2010-10-09 Thread sivaprasad

Hi,
Please find the configurations below.

Machine configurations(Solr running here):

RAM - 4 GB
HardDisk - 180GB
Os - Red Hat linux version 5
Processor-2x Intel Core 2 Duo CPU @2.66GHz



Machine configurations(Mysql server is running here):
RAM - 4 GB
HardDisk - 180GB
Os - Red Hat linux version 5
Processor-2x Intel Core 2 Duo CPU @2.66GHz

MySQL server details:
MySQL version - MySQL 5.0.22

Solr configuration details:

 
  
false

20
   

100
2147483647
1
1000
1

   
   


   

single
  

  

false
100
20
   

2147483647
1
false
  

  
  
10
 
  1 
  6




  

Solr document details:

21 fields are indexed and stored.
3 fields are indexed only.
3 fields are stored only.
3 fields are indexed, stored, and multi-valued.
2 fields are indexed and multi-valued.

And I am copying some of the indexed fields. Two of these fields are
multi-valued and have thousands of values.

In the db-config file, the main table contains 0.6 million records.

When I tested with the same records, indexing took 1 hr 30 min. In that case,
the table behind one of the multi-valued fields had no records. After putting
data into this table, each main-table record has thousands of rows there, and
this field is indexed and stored. Indexing now takes more than 24 hrs.

Solr is running on tomcat 6.0.26, jdk1.6.0_17 and solr 1.4.1

I am using JVM's default settings.

Why is this taking so much time? Does anybody have suggestions on where I am
going wrong?

Thanks,
JS
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Speeding-up-solr-indexing-tp1667054p1670737.html
Sent from the Solr - User mailing list archive at Nabble.com.
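
Since the JVM is on default settings, one easy first experiment (sizes are
illustrative assumptions for a 4 GB box, not recommendations) is to give
Tomcat an explicit heap, e.g. in catalina.sh or a setenv.sh:

  export JAVA_OPTS="$JAVA_OPTS -server -Xms1024M -Xmx2048M"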


Re: Speeding up solr indexing

2010-10-09 Thread Dennis Gearon
Looking at it, and not knowing how much memory your other processes on your box 
use (nor how much memory you have set aside for Java), I would start with 
DOUBLING your RAM. Make sure that you have enough Java memory.

You will know if it has some effect by using the 2:1 size ratio. 100MB for all 
that data is pretty small, I think.


Use the scientific method; change only one parameter at a time and check 
results.

It's always one of four things:
(in different order depending on task, but listed alphabetically here)
--
Memory (process assigned and/or actual physical memory)
Processor
Network Bandwidth
Hard Drive Bandwidth
(sometimes you can add motherboard I/O paths also;
 as of this date, AMD has more I/O paths in its
 consumer line of processors.)

In order of ease of experimenting (easiest to hardest):
---
Application/process-assigned memory
Physical memory
Network Bandwidth
Hard Drive Bandwidth
  Screaming fast SCSI 15K rpm drives
  RAID arrays, casual
  RAID arrays, professional
  External DRAM drive, 64 gig max/RAID them for more
Processor(s) 
  Put in the maximum speed/cache size the motherboard will take.
  Otherwise, USUALLY requires changing motherboard/HOSTING setup
I/O channels
  USUALLY requires changing motherboard/HOSTING setup





Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.




RE: Speeding up solr indexing

2010-10-10 Thread Ephraim Ofir
Try running the query you're using in DIH from the command line on the DB host and 
on the Solr host to see what kind of times you get from the DB itself and from 
the network; your bottleneck might be there.  If you find that's not it, take 
a look at this post regarding high-performance DIH imports; you can get serious 
improvement in performance by not using sub-entities: 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3e
 

Ephraim Ofir
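
A minimal sketch of the flattening idea (table and column names are made-up
assumptions for illustration): one query with GROUP_CONCAT replaces a
sub-entity per multi-valued field, so DIH runs one statement instead of one
per parent row:

  <entity name="item" transformer="RegexTransformer"
          query="SELECT i.id, i.name,
                        GROUP_CONCAT(t.tag SEPARATOR ',') AS tags
                 FROM item i LEFT JOIN item_tag t ON t.item_id = i.id
                 GROUP BY i.id">
    <field column="tags" splitBy=","/>
  </entity>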



Solr indexing socket timeout errors

2011-01-07 Thread Burton-West, Tom
Hello all,

We are getting intermittent socket timeout errors (see below).  Out of about 
600,000 indexing requests, 30 returned these socket timeout errors.  We haven't 
been able to correlate these with large merges, which tend to slow down the 
indexing response rate.

Does anyone know where we might look to determine the cause?

Tom

Tom Burton-West

Jan 7, 2011 2:31:07 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: [was class java.net.SocketTimeoutException] 
Read timed out
at 
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at 
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:279)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1354)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at 
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at 
org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:777)
at 
org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:807)
at 
org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:116)
at 
org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:742)
at org.apache.coyote.Request.doRead(Request.java:419)
at 
org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:270)
at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:403)
  at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:293)
at 
org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:193)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
at 
com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
at 
com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
at 
com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
at 
com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at 
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
... 24 more
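
One knob worth ruling out (an assumption about where to look, not a confirmed
diagnosis) is the servlet container's socket read timeout. For Tomcat that is
set on the HTTP connector in server.xml, e.g.:

  <Connector port="8080" protocol="HTTP/1.1"
             connectionTimeout="60000"
             disableUploadTimeout="false"/>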




Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi All,

I have a problem using SOLR indexing. I am trying to index a 96-page PDF file
(using PDFBox to extract the file contents into a String). But
surprisingly, SOLR indexing is not done for the full document. I can't
get all the tokens, even though the field contains the full text of the PDF,
since I am storing the field as well as indexing it.

Are there any such limitations with SOLR indexing? Please let me know at the
earliest.

Thanks in advance!

Best Regards,
Kranti K K Parisa
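
One limitation worth checking (an assumption, not a confirmed diagnosis): in
Solr 1.x, solrconfig.xml caps how many tokens get indexed per field, so a long
PDF can be fully stored yet only partially indexed. A sketch of raising the
cap:

  <!-- solrconfig.xml: the default cap is 10000 tokens per field -->
  <maxFieldLength>2147483647</maxFieldLength>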


Re: Plurals in solr indexing

2010-01-27 Thread murali k

I have found that my synonyms.txt file had

kids,boys,girls,childrens,children,boys & girls,kid,boy,girl

I ran the analyzer; somehow it was matching on "girl". I am not sure what is
happening yet, so I removed the ampersand entry:
Kids,boys,girls,childrens,children,boy,girl,kid

I guessed that when I add them comma-separated they act as one group, and when
any one of the words is queried, matches will be returned.

It is working now... after I made that change in the synonyms.txt file.
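
For reference, the filter that consumes this file is configured in the field
type's analyzer in schema.xml; a sketch (assuming a typical setup, not
necessarily the poster's actual schema):

  <!-- expand="true" makes each comma-separated line behave as a
       full equivalence group -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>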






murali k wrote:
> 
> Hi, 
> I am having trouble with indexing plurals, 
> 
> I have the schema with following fields
> gender (field) - string (field type) (eg. data Boys)
> all (field) - text (field type)  - solr.WhitespaceTokenizerFactory,
> solr.SynonymFilterFactory, solr.WordDelimiterFilterFactory,
> solr.LowerCaseFilterFactory, SnowballPorterFilterFactory
> 
> i am using copyField from gender to all
> 
> and searching on "all" field
> 
> When i search for Boy, I get the results, If i search for Boys i dont get
> results, 
> I have tried things like "boys bikes" - no results
> "boy bikes" - works
> 
> kid and kids are synonyms for boy and boys, so i tried adding 
> kid,kids,boy,boys in synonyms hoping it will work, it doesn't work that way
> 
> I also have other content fields which are copied to "all" , and it
> contains words like "kids, boys" etc...
> any idea?
> 
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Plurals-in-solr-indexing-tp27335639p27336508.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Plurals in solr indexing

2010-01-27 Thread Erick Erickson
It would be more informative for you to actually post your
schema definitions for the fields in question, along
with your copyfield. The summary in your first
post leaves a lot of questions unanswered...

But a couple of things.
1> beware the SOLR string type. It does NOT tokenize
 the input. Text type is usually what people want
 unless they are doing something special
 purpose.
2> WordDelimiterFilterFactory is often a source of
 misunderstanding, take a close look at
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
3> I'd strongly advise either really getting to know the admin
 page in SOLR and/or getting a copy of Luke to examine
 your index and see if what you *think* is in there actually is.
4> Try running your queries with debugQuery=on and see
 what that shows.
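
For example (host, core, and field names here are illustrative):

  http://localhost:8983/solr/select?q=all:boys&debugQuery=on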


HTH
Erick
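
A schema sketch of point 1> (field names borrowed from the thread; the rest is
an assumption about the setup, not the actual schema): "string" is kept as one
untokenized term, while a "text" type runs through the analysis chain:

  <field name="gender" type="string" indexed="true" stored="true"/>
  <field name="all" type="text" indexed="true" stored="false"
         multiValued="true"/>
  <copyField source="gender" dest="all"/>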



Re: Plurals in solr indexing

2010-01-27 Thread Tom Hill
I recommend getting familiar with the analysis tool included with Solr. From
Solr's main admin screen, click on "analysis", check "verbose", and enter your
text, and you can see the changes that happen during analysis.

It's really helpful, especially when getting started.

Tom




Re: Solr indexing configuration help

2008-05-28 Thread Yonik Seeley
>  size="0"
>  initialSize="0"
>  autowarmCount="0"/>
>  class="solr.LRUCache"
>  size="0"
>  initialSize="0"
>  autowarmCount="0"/>
>true
>
>1
>1
>
>
>  
> user_id 0  name="rows">1 
>static newSearcher warming query from
> solrconfig.xml
>  
>
>
>  
> fast_warm 0  name="rows">10 
>static firstSearcher warming query from
> solrconfig.xml
>  
>
>false
>4
>  
>
> Replication:
>The snappuller is scheduled to run every 15 mins for now.
>
> Hardware:
>AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive
>
> OS:
>Fedora 8 (64-bit)
>
> JVM version:
>java version "1.7.0"
> IcedTea Runtime Environment (build 1.7.0-b21)
> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
>
> Java options:
>java  -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M
> -XX:+UseParallelGC -jar start.jar
>
>
> --
> View this message in context: 
> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Solr indexing configuration help

2008-05-28 Thread Gaku Mak
orms="true"/>
>>
>> id
>>
>> Solrconfig.xml
>>  
>>false
>>10
>>500
>>50
>>5000
>>2
>>1000
>>1
>>
>> org.apache.lucene.index.LogByteSizeMergePolicy
>> org.apache.lucene.index.ConcurrentMergeScheduler
>>single
>>  
>>
>>  
>>false
>>50
>>10
>>
>>500
>>5000
>>2
>>false
>>  
>>  
>>
>>
>>  50
>>  18
>>
>>
>>  solr/bin/snapshooter
>>  .
>>  true
>>
>>  
>>
>>  
>>50
>>>  class="solr.LRUCache"
>>  size="0"
>>  initialSize="0"
>>  autowarmCount="0"/>
>>>  class="solr.LRUCache"
>>  size="0"
>>  initialSize="0"
>>  autowarmCount="0"/>
>>>  class="solr.LRUCache"
>>  size="0"
>>  initialSize="0"
>>  autowarmCount="0"/>
>>true
>>
>>1
>>1
>>
>>
>>  
>> user_id 0 > name="rows">1 
>>static newSearcher warming query from
>> solrconfig.xml
>>  
>>
>>
>>  
>> fast_warm 0 > name="rows">10 
>>static firstSearcher warming query from
>> solrconfig.xml
>>  
>>
>>false
>>4
>>  
>>
>> Replication:
>>The snappuller is scheduled to run every 15 mins for now.
>>
>> Hardware:
>>AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive
>>
>> OS:
>>Fedora 8 (64-bit)
>>
>> JVM version:
>>java version "1.7.0"
>> IcedTea Runtime Environment (build 1.7.0-b21)
>> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
>>
>> Java options:
>>java  -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M
>> -XX:+UseParallelGC -jar start.jar
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17526135.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr indexing configuration help

2008-05-28 Thread Yonik Seeley


Re: Solr indexing configuration help

2008-05-28 Thread Otis Gospodnetic
Gaku,

But what's this then:

>> JVM version:
>>java version "1.7.0"
>> IcedTea Runtime Environment (build 1.7.0-b21)
>> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)


Get the JVM from Sun.  Also, why do you have autoCommit on if all you are 
testing is indexing?  I'd turn that off.  The Java process going away sounds 
bad and smells like a Java/JVM problem more than a Solr problem.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
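
A sketch of what that looks like in solrconfig.xml (the commented-out values
are illustrative):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- leave autoCommit out entirely for a pure indexing test and issue
         one explicit commit at the end of the bulk load
    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>60000</maxTime>
    </autoCommit>
    -->
  </updateHandler>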


- Original Message 
> From: Gaku Mak <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 28, 2008 10:30:39 PM
> Subject: Re: Solr indexing configuration help
> 
> 
> I used the admin GUI to get the java info.
> java.vm.specification.vendor = Sun Microsystems Inc.
> 
> Any suggestion?  Thanks a lot for your help!!
> 
> -Gaku
> 
> 
> Yonik Seeley wrote:
> > 
> > Not sure why you would be getting an OOM from just indexing, and with
> > the 1.5G heap you've given the JVM.
> > Have you tried Sun's JVM?
> > 
> > -Yonik
> > 
> > On Wed, May 28, 2008 at 7:35 PM, gaku113 wrote:
> >>
> >> Hi all Solr users/developers/experts,
> >>
> >> I have the following scenario and I appreciate any advice for tuning my
> >> solr
> >> master server.
> >>
> >> I have a field in my schema that would index (but not store) about
> >> ~1
> >> ids for each document.  This field is expected to govern the size of the
> >> document.  Each id can contain up to 6 characters.  I figure that there
> >> are
> >> two alternatives for this field, one is to use a string multi-valued
> >> field,
> >> and the other would be to pass a white-space-delimited string to solr and
> >> have solr tokenize such string based on whitespace (the text_ws
> >> fieldType).
> >> The master server is expected to receive a constant stream of updates.
> >>
> >> The expected/estimated document size can range from 50k to 100k for a
> >> single
> >> document.  (I know this is quite large). The number of documents is
> >> expected
> >> to be around 200,000 on each master server, and there can be multiple
> >> master
> >> servers (sharding).  I wish the master can handle more docs too if I can
> >> figure a way out.
> >>
> >> Currently, I'm performing some basic stress tests to simulate the
> >> indexing
> >> side on the master server.  This stress test would continuously add new
> >> documents at the rate of about 10 documents every 30 seconds.  Autocommit
> >> is
> >> being used (50 docs and 180 seconds constraints), but I have no idea if
> >> this
> >> is the preferred way.  The goal is to keep adding new documents until we
> >> can
> >> get at least 200,000 documents (or about 20GB of index) on the master (or
> >> even more if the server can handle it)
> >>
> >> What I experienced from the indexing stress test is that the master
> >> server
> >> failed to respond after a while, such as becoming non-pingable when there are
> >> about
> >> 30k documents.  When looking at the log, they are mostly:
> >> java.lang.OutOfMemoryError: Java heap space
> >> OR
> >> Ping query caused exception: null (this is probably caused by the OOM
> >> problem)
> >>
> >> There were also a few cases that the java process even went away.
> >>
> >> Questions:
> >> 1)  Is it better to use the multi-valued string field or the text_ws
> >> field
> >> for this large field?
> >> 2)  Is it better to have more outstanding docs per commit or more
> >> frequent
> >> commit, in term of maximizing server resources?  What is the preferred
> >> way
> >> to commit documents assuming that solr master receives updates
> >> frequently?
> >> How many updated docs should there be before issuing a commit?
> >> 3)  How to avoid the OOM problem in my case? I'm already doing
> >> (-Xms1536M
> >> -Xmx1536M) on a 2-GB machine. Is that not enough?  I'm concerned that
> >> adding
> >> more Ram would just delay the OOM problem.  Any additional JVM option to
> >> consider?
> >> 4)  Any recommendation for the master server configuration, in a
> >> sense that I
> >> can maximize the number of indexed docs?
> >> 5)  How can it disable caching on the master altogether as queries
> >> won't hit the master?

Re: Solr indexing configuration help

2008-05-28 Thread Gaku Mak
blem.  Any additional JVM option
>>>> to
>>>> consider?
>>>> 4)  Any recommendation for the master server configuration, in a
>>>> sense that I
>>>> can maximize the number of indexed docs?
>>>> 5)  How can it disable caching on the master altogether as queries
>>>> won't hit
>>>> the master?
>>>> 6)  For an average doc size of 50k-100k, is that too large for
>>>> solr,
>>>> or even
>>>> solr is the right tool? If not, any alternative?  If we are able to
>>>> reduce
>>>> the size of docs, can we expect to index more documents?
>>>>
>>>> The following is info related to software/hardware/configuration:
>>>>
>>>> Solr version (solr nightly build on 5/23/2008)
>>>>Solr Specification Version: 1.2.2008.05.23.08.06.59
>>>>Solr Implementation Version: nightly
>>>>Lucene Specification Version: 2.3.2
>>>>Lucene Implementation Version: 2.3.2 652650
>>>>Jetty: 6.1.3
>>>>

-- 
View this message in context: 
http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17527555.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr indexing configuration help

2008-05-29 Thread Gaku Mak



Re: Solr indexing configuration help

2008-05-29 Thread Yonik Seeley


Re: Solr indexing configuration help

2008-05-29 Thread Gaku Mak



Re: Solr indexing configuration help

2008-05-30 Thread Gaku Mak

I started running the test on 2 other machines with similar specs but more
RAM (4G). One of them now has about 60k docs and is still running fine. On the
other machine, Solr died at about 43k docs. A short while before Solr died,
I saw that there were 5 searchers open at the same time. Do any of you know why
Solr would create 5 searchers, and whether that could cause Solr to die? Is
there any way to prevent this? Also, is there a way to totally disable the
searcher, and would that be a way to optimize the Solr master?

I copied the following from the SOLR Statistics page in case it has
information of interest:

name:[EMAIL PROTECTED] main  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  caching : true
numDocs : 42754
maxDoc : 42754
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500453
openedAt : Fri May 30 10:04:15 PDT 2008
registeredAt : Fri May 30 10:05:05 PDT 2008

name:   [EMAIL PROTECTED] main  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  caching : true
numDocs : 42754
maxDoc : 42754
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500453
openedAt : Fri May 30 10:03:24 PDT 2008
registeredAt : Fri May 30 10:03:41 PDT 2008

name:   [EMAIL PROTECTED] main  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  caching : true
numDocs : 42675
maxDoc : 42675
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500450
openedAt : Fri May 30 10:00:53 PDT 2008
registeredAt : Fri May 30 10:01:05 PDT 2008

name:   [EMAIL PROTECTED] main  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  caching : true
numDocs : 42697
maxDoc : 42697
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500451
openedAt : Fri May 30 10:02:20 PDT 2008
registeredAt : Fri May 30 10:02:22 PDT 2008

name:   [EMAIL PROTECTED] main  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  caching : true
numDocs : 42724
maxDoc : 42724
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500452
openedAt : Fri May 30 10:02:55 PDT 2008
registeredAt : Fri May 30 10:02:57 PDT 2008 

Thank you all so much for your help. I really appreciate it.

-Gaku

Yonik Seeley wrote:
> 
> It's most likely a
> 1) hardware issue: bad memory
>  OR
> 2) incompatible libraries (most likely libc version for the JVM).
> 
> If you have another box around, try that.
> 
> -Yonik
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17566612.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr indexing configuration help

2008-05-30 Thread Yonik Seeley
Some things to try:
- turn off autowarming on the master
- turn off autocommit, unless you really need it, or change it to be
less aggressive: autocommitting every 50 docs is bad if you are
rapidly adding documents.
- set maxWarmingSearchers to 1 to prevent the buildup of searchers

-Yonik
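
A solrconfig.xml sketch of those settings (cache sizes here are illustrative):

  <!-- cap concurrent warming searchers -->
  <maxWarmingSearchers>1</maxWarmingSearchers>

  <!-- autowarmCount="0" disables autowarming for a cache -->
  <filterCache class="solr.LRUCache" size="512"
               initialSize="512" autowarmCount="0"/>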



Re: Solr indexing configuration help

2008-06-01 Thread Gaku Mak



Re: Solr indexing configuration help

2008-06-01 Thread Yonik Seeley


RE: Solr indexing configuration help

2008-06-02 Thread Norskog, Lance
Solr 1.2 ignores the 'number of documents' attribute. It honors the
"every 30 minutes" attribute.

Lance 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Sunday, June 01, 2008 6:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr indexing configuration help

On Sun, Jun 1, 2008 at 4:43 AM, Gaku Mak <[EMAIL PROTECTED]> wrote:
> I have tried Yonik's suggestions with the following:
> 1) all autowarming are off
> 2) commented out firstsearch and newsearcher event handlers
> 3) increased autocommit interval to 600 docs and 30 minutes 
> (previously 50 docs and 5 minutes)

Glad it looks like your memory issues are solved, but I really wouldn't
use "docs" at all as an autocommit criterion; it will just slow down
your full index builds.

-Yonik

> In addition, I updated the java option with the following:
> -d64 -server -Xms2048M -Xmx3072M -XX:-HeapDumpOnOutOfMemoryError 
> -XX:+UseSerialGC
>
> Results:
> I'm currently at 100,000 documents now with about 9.0GB index on a 
> quad machine with 4GB ram.  The stress test is to add 20 documents 
> every 30 seconds now.
>
> It seems like the serial GC works better than the other two 
> alternatives (-XX:+UseParallelGC or -XX:+UseConcMarkSweepGC) for some 
> reason.  I have not seen any OOM since the changes mentioned above 
> (yet).  If others have better experience with other GC and know how to
> configure it properly, please let me know because using serial GC just
> doesn't sound right on a quad machine.
>
> Additional questions:
> Does anyone know how solr/lucene use heap in terms of their 
> generations (young vs tenured) on the indexing environment?  If we 
> have this answer, we would be able to better configure the 
> young/tenured ratio in the heap.  Any help is appreciated!  Thanks!
>
> Now, I'm looking into configuring the slave machines.  Well, that's a 
> separate question.
>
>
>
> Yonik Seeley wrote:
>>
>> Some things to try:
>> - turn off autowarming on the master
>> - turn off autocommit, unless you really need it, or change it to be 
>> less agressive:  autocommitting every 50 docs is bad if you are 
>> rapidly adding documents.
>> - set maxWarmingSearchers to 1 to prevent the buildup of searchers
>>
>> -Yonik
>>
>> On Fri, May 30, 2008 at 3:39 PM, Gaku Mak <[EMAIL PROTECTED]> wrote:
>>>
>>> I started running the test on 2 other machines with similar specs 
>>> but more RAM (4G). One of them now has about 60k docs and still 
>>> running fine. On the other machine, solr died at about 43k docs. A 
>>> short while before solr died, I saw that there were 5 searchers at 
>>> the same time. Do any of you know why solr would create 5 searchers,
>>> and if that could cause solr to die? Is there any way to prevent
>>> this? Also is there a way to totally disable the searcher and 
>>> whether that is a way to optimize the solr master?
>>>
>>> I copied the following from the SOLR Statistics page in case it has 
>>> interested info:
>>>
>>> name:[EMAIL PROTECTED] main
>>> class:  org.apache.solr.search.SolrIndexSearcher
>>> version:1.0
>>> description:index searcher
>>> stats:  caching : true
>>> numDocs : 42754
>>> maxDoc : 42754
>>> readerImpl : MultiSegmentReader
>>> readerDir :
>>> org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
>>> indexVersion : 1211702500453
>>> openedAt : Fri May 30 10:04:15 PDT 2008 registeredAt : Fri May 30 
>>> 10:05:05 PDT 2008
>>>
>>> name:   [EMAIL PROTECTED] main
>>> class:  org.apache.solr.search.SolrIndexSearcher
>>> version:1.0
>>> description:index searcher
>>> stats:  caching : true
>>> numDocs : 42754
>>> maxDoc : 42754
>>> readerImpl : MultiSegmentReader
>>> readerDir :
>>> org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
>>> indexVersion : 1211702500453
>>> openedAt : Fri May 30 10:03:24 PDT 2008 registeredAt : Fri May 30 
>>> 10:03:41 PDT 2008
>>>
>>> name:   [EMAIL PROTECTED] main
>>> class:  org.apache.solr.search.SolrIndexSearcher
>>> version:1.0
>>> description:index searcher
>>> stats:  caching : true
>>> numDocs : 42675
>>> maxDoc : 42675
>>> readerImpl : MultiSegmentReader
>>> readerDir :
>>> org.apache.lucene.store.FSDirectory@

Re: Solr indexing configuration help

2008-06-02 Thread Mike Klaas


On 2-Jun-08, at 2:09 PM, Norskog, Lance wrote:


Solr 1.2 ignores the 'number of documents' attribute. It honors the
"every 30 minutes" attribute.


Only if you specify both, I think.  There was a bug in the  
implementation.


-Mike



Lance

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Sunday, June 01, 2008 6:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr indexing configuration help

On Sun, Jun 1, 2008 at 4:43 AM, Gaku Mak <[EMAIL PROTECTED]> wrote:

I have tried Yonik's suggestions with the following:
1) all autowarming are off
2) commented out firstsearch and newsearcher event handlers
3) increased autocommit interval to 600 docs and 30 minutes
(previously 50 docs and 5 minutes)


Glad it looks like your memory issues are solved, but I really wouldn't
use "docs" at all as an autocommit criterion; it will just slow down
your full index builds.

-Yonik


In addition, I updated the java option with the following:
-d64 -server -Xms2048M -Xmx3072M -XX:-HeapDumpOnOutOfMemoryError
-XX:+UseSerialGC

Results:
I'm currently at 100,000 documents now with about 9.0GB index on a
quad machine with 4GB ram.  The stress test is to add 20 documents
every 30 seconds now.

It seems like the serial GC works better than the other two
alternatives (-XX:+UseParallelGC or -XX:+UseConcMarkSweepGC) for some
reason.  I have not seen any OOM since the changes mentioned above
(yet).  If others have better experience with other GC and know how to
configure it properly, please let me know because using serial GC just
doesn't sound right on a quad machine.
doesn't sound right on a quad machine.


Additional questions:
Does anyone know how solr/lucene use heap in terms of their
generations (young vs tenured) on the indexing environment?  If we
have this answer, we would be able to better configure the
young/tenured ratio in the heap.  Any help is appreciated!  Thanks!

Now, I'm looking into configuring the slave machines.  Well, that's a
separate question.



Yonik Seeley wrote:


Some things to try:
- turn off autowarming on the master
- turn off autocommit, unless you really need it, or change it to be
less agressive:  autocommitting every 50 docs is bad if you are
rapidly adding documents.
- set maxWarmingSearchers to 1 to prevent the buildup of searchers

-Yonik

On Fri, May 30, 2008 at 3:39 PM, Gaku Mak <[EMAIL PROTECTED]> wrote:


I started running the test on 2 other machines with similar specs
but more RAM (4G). One of them now has about 60k docs and still
running fine. On the other machine, solr died at about 43k docs. A
short while before solr died, I saw that there were 5 searchers at
the same time. Do any of you know why solr would create 5 searchers,
and if that could cause solr to die? Is there any way to prevent
this? Also is there a way to totally disable the searcher and
whether that is a way to optimize the solr master?

I copied the following from the SOLR Statistics page in case it has
interested info:

name:[EMAIL PROTECTED] main
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  caching : true
numDocs : 42754
maxDoc : 42754
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500453
openedAt : Fri May 30 10:04:15 PDT 2008 registeredAt : Fri May 30
10:05:05 PDT 2008

name:   [EMAIL PROTECTED] main
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  caching : true
numDocs : 42754
maxDoc : 42754
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500453
openedAt : Fri May 30 10:03:24 PDT 2008 registeredAt : Fri May 30
10:03:41 PDT 2008

name:   [EMAIL PROTECTED] main
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  caching : true
numDocs : 42675
maxDoc : 42675
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500450
openedAt : Fri May 30 10:00:53 PDT 2008 registeredAt : Fri May 30
10:01:05 PDT 2008

name:   [EMAIL PROTECTED] main
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  caching : true
numDocs : 42697
maxDoc : 42697
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500451
openedAt : Fri May 30 10:02:20 PDT 2008 registeredAt : Fri May 30
10:02:22 PDT 2008

name:   [EMAIL PROTECTED] main
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  caching : true
numDocs : 42724
maxDoc : 42724
readerImpl : Mul

Does Solr Indexing Websites possible?

2008-09-30 Thread RaghavPrabhu

Hi all,

  I want to enable search functionality in my website. Can I use Solr
for indexing the website? Is there any option in Solr? Please let me know as
soon as possible.

Thanks in advance
Prabhu.K
-- 
View this message in context: 
http://www.nabble.com/Does-Solr-Indexing-Websites-possible--tp19755329p19755329.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: AW: SOLR Indexing/Querying

2007-05-31 Thread Chris Hostetter

: It looks alot like using Solr's standard "WordDelimiterFilter" (see the
: sample schema.xml) does what you need.

WordDelimiterFilter will only get you so far.  It can split the indexed
text of "3555LHP" into tokens "3555" and "LHP", and the user-entered
"D3555" into the tokens "D" and "3555" -- but because those tokens
originated as part of a single chunk of input text, the QueryParser will
turn them into a phrase query, which will not match on the single token
"3555" ... the "D" just isn't there.

I can't think of any way to achieve what you want "out of the box"; I think
you'd need a custom RequestHandler that uses your own query parser which
uses boolean queries instead of PhraseQueries.


: > Keyword Typed In / We want it to find
: >
: > D3555 / 3555LHP
: > D460160-BN / D460160
: > D460160BN / D460160
: > Dd454557 / D454557
: > 84200ORB / 84200
: > 84200-ORB / 84200
: > T13420-SCH / T13420
: > t14240-ss / t14240




-Hoss



Re: AW: SOLR Indexing/Querying

2007-05-31 Thread Walter Underwood
I solved something similar to this by creating a "stemmer" for part
numbers. Variations like "-BN" on the end can be treated as inflections
in the part number language, similar to plurals in English.

I used a set of regexes to match and transform, in some cases generating
multiple "root" part numbers. With the per-field analyzers in Solr, this
would work much better.

I'll make another search for the presentation that covers this. It was
at our Ultraseek Users Group Meeting in 1999.

wunder

On 5/31/07 11:46 AM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:

> 
> : It looks alot like using Solr's standard "WordDelimiterFilter" (see the
> : sample schema.xml) does what you need.
> 
> WordDelimiterFilter will only get you so far.  It can split the indexed
> text of "3555LHP" into tokens "3555" and "LHP", and the user-entered
> "D3555" into the tokens "D" and "3555" -- but because those tokens
> originated as part of a single chunk of input text, the QueryParser will
> turn them into a phrase query, which will not match on the single token
> "3555" ... the "D" just isn't there.
> 
> I can't think of any way to achieve what you want "out of the box"; I think
> you'd need a custom RequestHandler that uses your own query parser which
> uses boolean queries instead of PhraseQueries.
> 
> 
> : > Keyword Typed In / We want it to find
> : >
> : > D3555 / 3555LHP
> : > D460160-BN / D460160
> : > D460160BN / D460160
> : > Dd454557 / D454557
> : > 84200ORB / 84200
> : > 84200-ORB / 84200
> : > T13420-SCH / T13420
> : > t14240-ss / t14240



URGENT HELP: Improving Solr indexing time

2011-06-04 Thread Rohit Gupta
My Solr server takes very long to update the index. The table it hits to index is
huge, with 10 million+ records, but even in that case I feel this is a very long
time to index. Below is a snapshot of the /dataimport page:

status: busy
importResponse: A command is still running...
Time Elapsed: 1:53:39.664
Total Requests made to DataSource: 16276
Total Rows Fetched: 24237
Total Documents Processed: 16273
Total Documents Skipped: 0
Full Dump Started: 2011-06-04 11:25:26


How can I determine why this is happening, and how can I improve it? During all
our tests on the local server before the migration we could index 5 million
records in 4-5 hrs, but now it's taking too long on the live server.

Regards,
Rohit

Re: Solr indexing socket timeout errors

2011-01-09 Thread Gora Mohanty
On Sat, Jan 8, 2011 at 3:44 AM, Burton-West, Tom  wrote:
> Hello all,
>
> We are getting intermittent socket timeout errors (see below).  Out of about 
> 600,000 indexing requests, 30 returned these socket timeout errors.  We 
> haven't been able to correlate these with large merges, which tends to slow 
> down the indexing response rate.
>
> Does anyone know where we might look to determine the cause?
[...]

We also experienced such timeouts when our indexing was
taking some 7-8 hours. It almost certainly had to do with
either the network, or the database server, but we were
also unable to track it down. You could try monitoring the
network for glitches, and monitoring load on the database
server to see if there is a correlation with excessive load.

What database are you using? If it is Microsoft SQL Server,
we found that moving to the jtds JDBC driver, rather than
the Microsoft one, helped reduce such errors, though it did
not eliminate them.

What we finally ended up doing is sharding the indexing task,
both at the database end, and at the Solr end. Then, we just
detect any indexing errors, and reindex that shard, which is
much faster than a complete reindexing. Along with the above
improvements, the problem became quite minor.

Regards,
Gora


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Mark Miller
Kranti™ K K Parisa wrote:
> Hi All,
>
> I have a problem with SOLR indexing. I am trying to index a 96-page PDF file
> (using PDFBox to extract the file contents into a String). But
> surprisingly SOLR indexing is not done for the full document, meaning I can't
> get all the tokens, even though the field contains the full text of the PDF, as I
> am storing the field as well as indexing it.
>
> Is there any such limitation with SOLR indexing? Please let me know at the
> earliest.
>
> Thanks in advance!
>
> Best Regards,
> Kranti K K Parisa
>
>   
Take a look at maxFieldLength in solrconfig.xml

-- 
- Mark

http://www.lucidimagination.com





Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi Mark,

I really appreciate the quick reply.

here is what I have in the config xml

32
2147483647
  *  1*
1000
1

Does this matter for tokens? The field I am using has the
full content of the file (I checked that using the Lukeall jar file); however,
tokens are not getting generated completely, because of which my search is not
working for the full content.

Please suggest.

Best Regards,
Kranti K K Parisa



On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller  wrote:

> Kranti™ K K Parisa wrote:
> > Hi All,
> >
> > I have a problem using SOLR indexing. I am trying to index 96 pages PDF
> file
> > (using PDFBox for extracting the file contents into String). But
> > surprisingly SOLR Indexing is not done for the full document. Means I
> can't
> > get all the token how ever the field contains the full text of the PDF as
> i
> > am storing the field along with indexing.
> >
> > Is there any such limitations with SOLR indexing, please let me know at
> the
> > earliest.
> >
> > Thanks in advance!
> >
> > Best Regards,
> > Kranti K K Parisa
> >
> >
> Take a look at maxFieldLength in solrconfig.xml
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Mark Miller
It limits the number of tokens that will be indexed.

Kranti™ K K Parisa wrote:
> Hi Mark,
>
> I really appreciate the quick reply.
>
> here is what I have in the config xml
>
> 32
> 2147483647
>   *  1*
> 1000
> 1
>
> Does this matter with Tokens?? Because the field I am using is having
> the full content of the file ( I checked that using Lukeall jar file),
> how ever Tokens are not getting generated completely because of which
> my search not working for the full content.
>
> Please suggest.
>
> Best Regards,
> Kranti K K Parisa
>
>
>
> On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller wrote:
>
> Kranti™ K K Parisa wrote:
> > Hi All,
> >
> > I have a problem using SOLR indexing. I am trying to index 96
>     pages PDF file
> > (using PDFBox for extracting the file contents into String). But
> > surprisingly SOLR Indexing is not done for the full document.
> Means I can't
> > get all the token how ever the field contains the full text of
> the PDF as i
> > am storing the field along with indexing.
> >
> > Is there any such limitations with SOLR indexing, please let me
> know at the
> > earliest.
> >
> > Thanks in advance!
> >
> > Best Regards,
> > Kranti K K Parisa
> >
> >
> Take a look at maxFieldLength in solrconfig.xml
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>


-- 
- Mark

http://www.lucidimagination.com





Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi Mark,

As you see, my config file contains the value as 10,000:
1

But when I check through the Lukeall jar file I can see the term count is around
3,000.

Please suggest.

Best Regards,
Kranti K K Parisa



2010/1/19 Mark Miller 

> It limits the number of tokens that will be indexed.
>
> Kranti™ K K Parisa wrote:
> > Hi Mark,
> >
> > I really appreciate the quick reply.
> >
> > here is what I have in the config xml
> >
> > 32
> > 2147483647
> >   *  1*
> > 1000
> > 1
> >
> > Does this matter with Tokens?? Because the field I am using is having
> > the full content of the file ( I checked that using Lukeall jar file),
> > how ever Tokens are not getting generated completely because of which
> > my search not working for the full content.
> >
> > Please suggest.
> >
> > Best Regards,
> > Kranti K K Parisa
> >
> >
> >
> > On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller  > <mailto:markrmil...@gmail.com>> wrote:
> >
> > Kranti™ K K Parisa wrote:
> > > Hi All,
> >     >
> > > I have a problem using SOLR indexing. I am trying to index 96
> > pages PDF file
> > > (using PDFBox for extracting the file contents into String). But
> > > surprisingly SOLR Indexing is not done for the full document.
> > Means I can't
> > > get all the token how ever the field contains the full text of
> > the PDF as i
> > > am storing the field along with indexing.
> > >
> > > Is there any such limitations with SOLR indexing, please let me
> > know at the
> > > earliest.
> > >
> > > Thanks in advance!
> > >
> > > Best Regards,
> > > Kranti K K Parisa
> > >
> > >
> > Take a look at maxFieldLength in solrconfig.xml
> >
> > --
> > - Mark
> >
> > http://www.lucidimagination.com
> >
> >
> >
> >
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi Mark,

I changed the value to 1,000,000,000 to just test my luck.

But unfortunately I am still not getting all tokens into the index.

Please suggest.

Best Regards,
Kranti K K Parisa



2010/1/19 Kranti™ K K Parisa 

> Hi Mark,
>
> As you see my config file contains the value as 10,000
> 1
>
> But when I check thru Lukeall jar file I can see the Term count around
> 3,000.
>
> Please suggest.
>
> Best Regards,
> Kranti K K Parisa
>
>
>
> 2010/1/19 Mark Miller 
>
> It limits the number of tokens that will be indexed.
>>
>> Kranti™ K K Parisa wrote:
>> > Hi Mark,
>> >
>> > I really appreciate the quick reply.
>> >
>> > here is what I have in the config xml
>> >
>> > 32
>> > 2147483647
>> >   *  1*
>> > 1000
>> > 1
>> >
>> > Does this matter with Tokens?? Because the field I am using is having
>> > the full content of the file ( I checked that using Lukeall jar file),
>> > how ever Tokens are not getting generated completely because of which
>> > my search not working for the full content.
>> >
>> > Please suggest.
>> >
>> > Best Regards,
>> > Kranti K K Parisa
>> >
>> >
>> >
>> > On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller > > <mailto:markrmil...@gmail.com>> wrote:
>> >
>> > Kranti™ K K Parisa wrote:
>> > > Hi All,
>> > >
>> > > I have a problem using SOLR indexing. I am trying to index 96
>> > pages PDF file
>> > > (using PDFBox for extracting the file contents into String). But
>> > > surprisingly SOLR Indexing is not done for the full document.
>> > Means I can't
>> > > get all the token how ever the field contains the full text of
>> > the PDF as i
>> > > am storing the field along with indexing.
>> > >
>> > > Is there any such limitations with SOLR indexing, please let me
>> > know at the
>> > > earliest.
>> > >
>> > > Thanks in advance!
>> > >
>> > > Best Regards,
>> > > Kranti K K Parisa
>> > >
>> > >
>> > Take a look at maxFieldLength in solrconfig.xml
>> >
>> > --
>> > - Mark
>> >
>> > http://www.lucidimagination.com
>> >
>> >
>> >
>> >
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Erick Erickson
Did you reindex the documents you examined? That limit
is applied when you index.

Try searching the user list for maxfieldlength; this topic
has been discussed many times and you should find a
solution.

HTH
Erick

2010/1/19 Kranti™ K K Parisa 

> Can anyone suggest/guide me on this.
>
> Best Regards,
> Kranti K K Parisa
>
>
>
> 2010/1/19 Kranti™ K K Parisa 
>
> > Hi Mark,
> >
> > I changed the value to 1,000,000,000 to just test my luck.
> >
> > But unfortunately I am still not getting the index for all Token.
> >
> > Please suggest.
> >
> > Best Regards,
> > Kranti K K Parisa
> >
> >
> >
> > 2010/1/19 Kranti™ K K Parisa 
> >
> > Hi Mark,
> >>
> >> As you see my config file contains the value as 10,000
> >> 1
> >>
> >> But when I check thru Lukeall jar file I can see the Term count around
> >> 3,000.
> >>
> >> Please suggest.
> >>
> >> Best Regards,
> >> Kranti K K Parisa
> >>
> >>
> >>
> >> 2010/1/19 Mark Miller 
> >>
> >> It limits the number of tokens that will be indexed.
> >>>
> >>> Kranti™ K K Parisa wrote:
> >>> > Hi Mark,
> >>> >
> >>> > I really appreciate the quick reply.
> >>> >
> >>> > here is what I have in the config xml
> >>> >
> >>> > 32
> >>> > 2147483647
> >>> >   *  1*
> >>> > 1000
> >>> > 1
> >>> >
> >>> > Does this matter with Tokens?? Because the field I am using is having
> >>> > the full content of the file ( I checked that using Lukeall jar
> file),
> >>> > how ever Tokens are not getting generated completely because of which
> >>> > my search not working for the full content.
> >>> >
> >>> > Please suggest.
> >>> >
> >>> > Best Regards,
> >>> > Kranti K K Parisa
> >>> >
> >>> >
> >>> >
> >>> > On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller  >>> > <mailto:markrmil...@gmail.com>> wrote:
> >>> >
> >>> > Kranti™ K K Parisa wrote:
> >>> > > Hi All,
> >>> > >
> >>> > > I have a problem using SOLR indexing. I am trying to index 96
> >>> > pages PDF file
> >>> > > (using PDFBox for extracting the file contents into String).
> But
> >>> > > surprisingly SOLR Indexing is not done for the full document.
> >>> > Means I can't
> >>> > > get all the token how ever the field contains the full text of
> >>> > the PDF as i
> >>> > > am storing the field along with indexing.
> >>> > >
> >>> > > Is there any such limitations with SOLR indexing, please let me
> >>> > know at the
> >>> > > earliest.
> >>> > >
> >>> > > Thanks in advance!
> >>> > >
> >>> > > Best Regards,
> >>> > > Kranti K K Parisa
> >>> > >
> >>> > >
> >>> > Take a look at maxFieldLength in solrconfig.xml
> >>> >
> >>> > --
> >>> > - Mark
> >>> >
> >>> > http://www.lucidimagination.com
> >>> >
> >>> >
> >>> >
> >>> >
> >>>
> >>>
> >>> --
> >>> - Mark
> >>>
> >>> http://www.lucidimagination.com
> >>>
> >>>
> >>>
> >>>
> >>
> >
>


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Can anyone suggest/guide me on this.

Best Regards,
Kranti K K Parisa



2010/1/19 Kranti™ K K Parisa 

> Hi Mark,
>
> I changed the value to 1,000,000,000 to just test my luck.
>
> But unfortunately I am still not getting the index for all Token.
>
> Please suggest.
>
> Best Regards,
> Kranti K K Parisa
>
>
>
> 2010/1/19 Kranti™ K K Parisa 
>
> Hi Mark,
>>
>> As you see my config file contains the value as 10,000
>> 1
>>
>> But when I check thru Lukeall jar file I can see the Term count around
>> 3,000.
>>
>> Please suggest.
>>
>> Best Regards,
>> Kranti K K Parisa
>>
>>
>>
>> 2010/1/19 Mark Miller 
>>
>> It limits the number of tokens that will be indexed.
>>>
>>> Kranti™ K K Parisa wrote:
>>> > Hi Mark,
>>> >
>>> > I really appreciate the quick reply.
>>> >
>>> > here is what I have in the config xml
>>> >
>>> > 32
>>> > 2147483647
>>> >   *  1*
>>> > 1000
>>> > 1
>>> >
>>> > Does this matter with Tokens?? Because the field I am using is having
>>> > the full content of the file ( I checked that using Lukeall jar file),
>>> > how ever Tokens are not getting generated completely because of which
>>> > my search not working for the full content.
>>> >
>>> > Please suggest.
>>> >
>>> > Best Regards,
>>> > Kranti K K Parisa
>>> >
>>> >
>>> >
>>> > On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller >> > <mailto:markrmil...@gmail.com>> wrote:
>>> >
>>> > Kranti™ K K Parisa wrote:
>>> >     > Hi All,
>>> > >
>>> > > I have a problem using SOLR indexing. I am trying to index 96
>>> > pages PDF file
>>> > > (using PDFBox for extracting the file contents into String). But
>>> > > surprisingly SOLR Indexing is not done for the full document.
>>> > Means I can't
>>> > > get all the token how ever the field contains the full text of
>>> > the PDF as i
>>> > > am storing the field along with indexing.
>>> > >
>>> > > Is there any such limitations with SOLR indexing, please let me
>>> > know at the
>>> > > earliest.
>>> > >
>>> > > Thanks in advance!
>>> > >
>>> > > Best Regards,
>>> > > Kranti K K Parisa
>>> > >
>>> > >
>>> > Take a look at maxFieldLength in solrconfig.xml
>>> >
>>> > --
>>> > - Mark
>>> >
>>> > http://www.lucidimagination.com
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>
>>
>


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi Erik,

Yes, I deleted the index and re-indexed after increasing the value (I have
restarted Tomcat as well),

but still no luck. I was just wondering: the field that I am trying to
index has the complete document text in it, as I am storing that, but I am not
getting the complete terms/tokens into the index to perform the search.

What analyzers and filters would you suggest that I check?

Currently I am using the following:

[fieldType/analyzer configuration stripped by the list archive]

Please suggest
Best Regards,
Kranti K K Parisa



On Tue, Jan 19, 2010 at 9:03 PM, Erick Erickson wrote:

> Did you reindex the documents you examined? That limit
> is applied when you index.
>
> Try searching the user list for maxfieldlength, this topic
> has been discussed many times and you should find a
> solution.
>
> HTH
> Erick
>
> 2010/1/19 Kranti™ K K Parisa 
>
> > Can anyone suggest/guide me on this.
> >
> > Best Regards,
> > Kranti K K Parisa
> >
> >
> >
> > 2010/1/19 Kranti™ K K Parisa 
> >
> > > Hi Mark,
> > >
> > > I changed the value to 1,000,000,000 to just test my luck.
> > >
> > > But unfortunately I am still not getting the index for all Token.
> > >
> > > Please suggest.
> > >
> > > Best Regards,
> > > Kranti K K Parisa
> > >
> > >
> > >
> > > 2010/1/19 Kranti™ K K Parisa 
> > >
> > > Hi Mark,
> > >>
> > >> As you see my config file contains the value as 10,000
> > >> 1
> > >>
> > >> But when I check thru Lukeall jar file I can see the Term count around
> > >> 3,000.
> > >>
> > >> Please suggest.
> > >>
> > >> Best Regards,
> > >> Kranti K K Parisa
> > >>
> > >>
> > >>
> > >> 2010/1/19 Mark Miller 
> > >>
> > >> It limits the number of tokens that will be indexed.
> > >>>
> > >>> Kranti™ K K Parisa wrote:
> > >>> > Hi Mark,
> > >>> >
> > >>> > I really appreciate the quick reply.
> > >>> >
> > >>> > here is what I have in the config xml
> > >>> >
> > >>> > 32
> > >>> > 2147483647
> > >>> >   *  1*
> > >>> > 1000
> > >>> > 1
> > >>> >
> > >>> > Does this matter with Tokens?? Because the field I am using is
> having
> > >>> > the full content of the file ( I checked that using Lukeall jar
> > file),
> > >>> > how ever Tokens are not getting generated completely because of
> which
> > >>> > my search not working for the full content.
> > >>> >
> > >>> > Please suggest.
> > >>> >
> > >>> > Best Regards,
> > >>> > Kranti K K Parisa
> > >>> >
> > >>> >
> > >>> >
> > >>> > On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller <
> markrmil...@gmail.com
> > >>> > <mailto:markrmil...@gmail.com>> wrote:
> > >>> >
> > >>> > Kranti™ K K Parisa wrote:
> > >>> > > Hi All,
> > >>> > >
> > >>> > > I have a problem using SOLR indexing. I am trying to index 96
> > >>> > pages PDF file
> > >>> > > (using PDFBox for extracting the file contents into String).
> > But
> > >>> > > surprisingly SOLR Indexing is not done for the full document.
> > >>> > Means I can't
> > >>> > > get all the token how ever the field contains the full text
> of
> > >>> > the PDF as i
> > >>> > > am storing the field along with indexing.
> > >>> > >
> > >>> > > Is there any such limitations with SOLR indexing, please let
> me
> > >>> > know at the
> > >>> > > earliest.
> > >>> > >
> > >>> > > Thanks in advance!
> > >>> > >
> > >>> > > Best Regards,
> > >>> > > Kranti K K Parisa
> > >>> > >
> > >>> > >
> > >>> > Take a look at maxFieldLength in solrconfig.xml
> > >>> >
> > >>> > --
> > >>> > - Mark
> > >>> >
> > >>> > http://www.lucidimagination.com
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>>
> > >>>
> > >>> --
> > >>> - Mark
> > >>>
> > >>> http://www.lucidimagination.com
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >
> >
>


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi

I was making the same mistake mentioned in this URL:
http://search.lucidimagination.com/search/document/30616a061f8c4bf6/solr_ignoring_maxfieldlength

maxFieldLength is there in 2 places. Earlier I had changed it in the indexDefaults
section; now I have changed it in the mainIndex section also.

It worked. Thanks Mark & Erick. I appreciate your help.
Best Regards,
Kranti K K Parisa



2010/1/19 Kranti™ K K Parisa 

> Hi Erik,
>
> Yes, i deleted the index and re-indexed after increasing the value (i have
> restarted tomcat as well)
>
> but still no luck. but i was just wondering the field that i am trying to
> index has the complete document text in it as i am storing that. but not
> getting the complete terms/tokens into the index to perform the search.
>
> What would be the suggestible Analyzers, filters that I should check with?
>
> Currently I am using the following:
>
> [fieldType/analyzer configuration stripped by the list archive]
>
> Please suggest
> Best Regards,
> Kranti K K Parisa
>
>
>
> On Tue, Jan 19, 2010 at 9:03 PM, Erick Erickson 
> wrote:
>
>> Did you reindex the documents you examined? That limit
>> is applied when you index.
>>
>> Try searching the user list for maxfieldlength, this topic
>> has been discussed many times and you should find a
>> solution.
>>
>> HTH
>> Erick
>>
>> 2010/1/19 Kranti™ K K Parisa 
>>
>> > Can anyone suggest/guide me on this.
>> >
>> > Best Regards,
>> > Kranti K K Parisa
>> >
>> >
>> >
>> > 2010/1/19 Kranti™ K K Parisa 
>> >
>> > > Hi Mark,
>> > >
>> > > I changed the value to 1,000,000,000 to just test my luck.
>> > >
>> > > But unfortunately I am still not getting the index for all Token.
>> > >
>> > > Please suggest.
>> > >
>> > > Best Regards,
>> > > Kranti K K Parisa
>> > >
>> > >
>> > >
>> > > 2010/1/19 Kranti™ K K Parisa 
>> > >
>> > > Hi Mark,
>> > >>
>> > >> As you see my config file contains the value as 10,000
>> > >> 1
>> > >>
>> > >> But when I check thru Lukeall jar file I can see the Term count
>> around
>> > >> 3,000.
>> > >>
>> > >> Please suggest.
>> > >>
>> > >> Best Regards,
>> > >> Kranti K K Parisa
>> > >>
>> > >>
>> > >>
>> > >> 2010/1/19 Mark Miller 
>> > >>
>> > >> It limits the number of tokens that will be indexed.
>> > >>>
>> > >>> Kranti™ K K Parisa wrote:
>> > >>> > Hi Mark,
>> > >>> >
>> > >>> > I really appreciate the quick reply.
>> > >>> >
>> > >>> > here is what I have in the config xml
>> > >>> >
>> > >>> > 32
>> > >>> > 2147483647
>> > >>> >   *  1*
>> > >>> > 1000
>> > >>> > 1
>> > >>> >
>> > >>> > Does this matter with Tokens?? Because the field I am using is
>> having
>> > >>> > the full content of the file ( I checked that using Lukeall jar
>> > file),
>> > >>> > how ever Tokens are not getting generated completely because of
>> which
>> > >>> > my search not working for the full content.
>> > >>> >
>> > >>> > Please suggest.
>> > >>> >
>> > >>> > Best Regards,
>> > >>> > Kranti K K Parisa
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> > On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller <
>> markrmil...@gmail.com
>> > >>> > <mailto:markrmil...@gmail.com>> wrote:
>> > >>> >
>> > >>> > Kranti™ K K Parisa wrote:
>> > >>> > > Hi All,
>> > >>> > >
>> > >>> > > I have a problem using SOLR indexing. I am trying to index
>> 96
>> > >>> > pages PDF file
>> > >>> > > (using PDFBox for extracting the file contents into String).
>> > But
>> > >>> > > surprisingly SOLR Indexing is not done for the full
>> document.
>> > >>> > Means I can't
>> > >>> > > get all the token how ever the field contains the full text
>> of
>> > >>> > the PDF as i
>> > >>> > > am storing the field along with indexing.
>> > >>> > >
>> > >>> > > Is there any such limitations with SOLR indexing, please let
>> me
>> > >>> > know at the
>> > >>> > > earliest.
>> > >>> > >
>> > >>> > > Thanks in advance!
>> > >>> > >
>> > >>> > > Best Regards,
>> > >>> > > Kranti K K Parisa
>> > >>> > >
>> > >>> > >
>> > >>> > Take a look at maxFieldLength in solrconfig.xml
>> > >>> >
>> > >>> > --
>> > >>> > - Mark
>> > >>> >
>> > >>> > http://www.lucidimagination.com
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>>
>> > >>>
>> > >>> --
>> > >>> - Mark
>> > >>>
>> > >>> http://www.lucidimagination.com
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>
>> > >
>> >
>>
>
>


SOLR indexing : Multiple content/document types

2010-01-23 Thread Kranti™ K K Parisa
Hi,

I would like to know the best strategy/standards to follow for indexing
multiple document types through SOLR.

In other words, let us say we have a file upload form through which users would
upload files of different types (text, HTML, XML, Word docs, Excel
sheets, PDF, JPG, GIF, etc.).
Once we save the files to the hard disk on the server side, we need to
initiate the SOLR indexing.

What would be the best strategy to achieve this, and what are the libraries
to be used for the different content/document types?

So far I have used PDFBox to read PDF files. Please suggest options for all the
possible content/document types.

Best Regards,
Kranti K K Parisa


Re: Does Solr Indexing Websites possible?

2008-10-01 Thread Erick Erickson
Have you looked at Nutch? It's built on top of Lucene and might
be a better fit.

But you simply must give more details about what your
requirements are to get a meaningful answer. Imagine *you* were
reading your e-mail without knowing anything except
the information contained in the message. How could you
respond?

Best
Erick


On Wed, Oct 1, 2008 at 2:43 AM, RaghavPrabhu <[EMAIL PROTECTED]>wrote:

>
> Hi all,
>
>  I want to enable the search functionality in my website. Can i use solr
> for indexing the website? Is there any option in solr.Pls let me know as
> soon as possible.
>
> Thanks in advance
> Prabhu.K
> --
> View this message in context:
> http://www.nabble.com/Does-Solr-Indexing-Websites-possible--tp19755329p19755329.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Solr indexing size for a particular document.

2011-04-19 Thread rahul
Hi,

Is there a way to find out the Solr index size for a particular document? I
am using Solrj to index the documents.

Assume I am indexing multiple fields like title, description, content, and a
few integer fields in schema.xml; once I index the content, is there a
way to identify the index size for the particular document, during indexing
or after indexing?

Because most of the common words are excluded via StopWords.txt using
StopFilterFactory, I just want to calculate the actual index size of the
particular document. Is there any way in current Solr?

thanks,


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-size-for-a-particular-document-tp2838416p2838416.html
Sent from the Solr - User mailing list archive at Nabble.com.


Did anyone rewrite the solr indexing section?

2011-05-23 Thread LeoYuan88
Hi all,
 As we all know, Solr puts all index files in a single directory,
namely ${datadir}/index,
but performance gets slower as the index directory gets
bigger and bigger,
so I want to split the single dir into several dirs, e.g. ${datadir}/index1
and ${datadir}/index2;
maybe I will put 1 users' info into the first one, and put another 1
users' info into
the second one. When searching, I will locate the index dir directly as I
need.
Did anyone do such a thing before?
Or does this refactoring of Solr make sense?
Any advice or suggestions would be highly appreciated.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Did-anyone-rewrite-the-solr-indexing-section-tp2974539p2974539.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: URGENT HELP: Improving Solr indexing time

2011-06-04 Thread Chris Cowan
How long does the query against the DB take (outside of Solr)? If that's slow
then it's going to take a while to update the index. You might need to figure out
a way to break things up a bit; maybe use a delta import instead of a full
import, as sketched below.

Chris

On Jun 4, 2011, at 6:23 AM, Rohit Gupta wrote:

> My Solr server takes very long to update index. The table it hits to index is 
> huge with 10Million + records , but even in that case I feel this is very 
> long 
> time to index. Below is the snapshot of the /dataimport page
> 
> busy
> A command is still running...
> 
> 1:53:39.664
> 16276
> 24237
> 16273
> 0
> 2011-06-04 11:25:26
> 
> 
> How can i determine why this is happening and how can I improve this. During 
> all 
> our test on the local server before the migration we could index 5 million 
> records in 4-5 hrs, but now its taking too long on the live server.
> 
> Regards,
> Rohit



Re: URGENT HELP: Improving Solr indexing time

2011-06-04 Thread lee carroll
Rohit - you have double posted, maybe - did Otis's answer not help with
your issue, or at least need a response to clarify?

On 4 June 2011 22:53, Chris Cowan  wrote:
> How long does the query against the DB take (outside of Solr)? If that's slow 
> then it's going to take a while to update the index. You might need to figure 
> a way to break things up a bit, maybe use a delta import instead of a full 
> import.
>
> Chris
>
> On Jun 4, 2011, at 6:23 AM, Rohit Gupta wrote:
>
>> My Solr server takes very long to update index. The table it hits to index is
>> huge with 10Million + records , but even in that case I feel this is very 
>> long
>> time to index. Below is the snapshot of the /dataimport page
>>
>> busy
>> A command is still running...
>> 
>> 1:53:39.664
>> 16276
>> 24237
>> 16273
>> 0
>> 2011-06-04 11:25:26
>> 
>>
>> How can i determine why this is happening and how can I improve this. During 
>> all
>> our test on the local server before the migration we could index 5 million
>> records in 4-5 hrs, but now its taking too long on the live server.
>>
>> Regards,
>> Rohit
>
>


Re: URGENT HELP: Improving Solr indexing time

2011-06-04 Thread Rohit Gupta
No, I didn't double post; maybe it was in my outbox and went out again.

The queries outside Solr don't take so long, to return around 50 rows it
takes 250 seconds, so I am doing a delta import of around 500,000 rows at a
time. I have tried turning autocommit on and things are moving a bit faster
now. Are there any more tweaks I can do?

Also, I am planning to move to a master-slave model, but am failing to understand
where to start exactly.

Regards,
Rohit




From: lee carroll 
To: solr-user@lucene.apache.org
Sent: Sun, 5 June, 2011 4:59:44 AM
Subject: Re: URGENT HELP: Improving Solr indexing time

Rohit - you have double posted maybe - did Otis's answer not help with
your issue or at least need a response to clarify ?

On 4 June 2011 22:53, Chris Cowan  wrote:
> How long does the query against the DB take (outside of Solr)? If that's slow 
>then it's going to take a while to update the index. You might need to figure 
>a 
>way to break things up a bit, maybe use a delta import instead of a full 
>import.
>
> Chris
>
> On Jun 4, 2011, at 6:23 AM, Rohit Gupta wrote:
>
>> My Solr server takes very long to update index. The table it hits to index is
>> huge with 10Million + records , but even in that case I feel this is very 
long
>> time to index. Below is the snapshot of the /dataimport page
>>
>> busy
>> A command is still running...
>> 
>> 1:53:39.664
>> 16276
>> 24237
>> 16273
>> 0
>> 2011-06-04 11:25:26
>> 
>>
>> How can i determine why this is happening and how can I improve this. During 
>>all
>> our test on the local server before the migration we could index 5 million
>> records in 4-5 hrs, but now its taking too long on the live server.
>>
>> Regards,
>> Rohit
>
>


Re: URGENT HELP: Improving Solr indexing time

2011-06-04 Thread Fuad Efendi
Hi Rohit,

I am currently working on https://issues.apache.org/jira/browse/SOLR-2233,
which fixes multithreading issues.

How complex is your dataimport schema? SOLR-2233 (multithreading, better
connection handling) improves performance, especially if the SQL is
extremely complex and uses a few long-running CachedSqlEntityProcessors,
etc.

Also, check your SQL and indexes; in most cases you can _significantly_
improve performance by simply adding appropriate (for your specific SQL)
indexes. I noticed that even very experienced DBAs sometimes create an index
on (KEY1, KEY2) while the developer executes the query "WHERE KEY2=? ORDER BY KEY1" -
an index on (KEY2, KEY1) is what that query needs. Check everything...

Thanks,


-- 
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
Data Mining, Search Engines
http://www.tokenizer.ca







On 11-06-05 12:09 AM, "Rohit Gupta"  wrote:

>No didn't double post, my be it was in my outbox and went out again.
>
>The queries outside solr dont take so long, to return around 50 rows
>it 
>takes 250 seconds, so I am doing a delta import of around 500,000 rows at
>a 
>time. I have tried turning auto commit  on and things are moving a bit
>faster 
>now. Are there any more tweeking i can do?
>
>Also, planning to move to master-salve model, but am failing to
>understand where 
>to start exactly. 
>
>Regards,
>Rohit
>
>
>
>
>From: lee carroll 
>To: solr-user@lucene.apache.org
>Sent: Sun, 5 June, 2011 4:59:44 AM
>Subject: Re: URGENT HELP: Improving Solr indexing time
>
>Rohit - you have double posted maybe - did Otis's answer not help with
>your issue or at least need a response to clarify ?
>
>On 4 June 2011 22:53, Chris Cowan  wrote:
>> How long does the query against the DB take (outside of Solr)? If
>>that's slow 
>>then it's going to take a while to update the index. You might need to
>>figure a 
>>way to break things up a bit, maybe use a delta import instead of a full
>>import.
>>
>> Chris
>>
>> On Jun 4, 2011, at 6:23 AM, Rohit Gupta wrote:
>>
>>> My Solr server takes very long to update index. The table it hits to
>>>index is
>>> huge with 10Million + records , but even in that case I feel this is
>>>very 
>long
>>> time to index. Below is the snapshot of the /dataimport page
>>>
>>> busy
>>> A command is still running...
>>> 
>>> 1:53:39.664
>>> 16276
>>> 24237
>>> 16273
>>> 0
>>> 2011-06-04 11:25:26
>>> 
>>>
>>> How can i determine why this is happening and how can I improve this.
>>>During 
>>>all
>>> our test on the local server before the migration we could index 5
>>>million
>>> records in 4-5 hrs, but now its taking too long on the live server.
>>>
>>> Regards,
>>> Rohit
>>
>>




Re: URGENT HELP: Improving Solr indexing time

2011-06-05 Thread Rohit Gupta
Thanks Fuad,

I have started optimizing my database structure; since the tables are huge
in terms of records, the optimization is taking time.

I will update with the results when complete.

Regards,
Rohit




From: Fuad Efendi 
To: "Solr-User@Lucene. Org" 
Sent: Sun, 5 June, 2011 10:05:22 AM
Subject: Re: URGENT HELP: Improving Solr indexing time

Hi Rohit,

I am currently working on https://issues.apache.org/jira/browse/SOLR-2233
which fixes multithreading issues

How complex is your dataimport schema? SOLR-2233 (multithreading, better
connection handling) improves performance... Especially if SQL is
extremely complex and uses few long-running CachedSqlEntityProcessors and
etc.

Also, check your SQL and indexes, in most cases you can _significantly_
improve performance by simply adding appropriate (for your specific SQL)
indexes. I noticed that even very experienced DBAs sometimes create an index
on (KEY1, KEY2) while the developer executes the query "WHERE KEY2=? ORDER BY KEY1" -
check everything...

Thanks,


-- 
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
Data Mining, Search Engines
http://www.tokenizer.ca







On 11-06-05 12:09 AM, "Rohit Gupta"  wrote:

>No didn't double post, my be it was in my outbox and went out again.
>
>The queries outside solr dont take so long, to return around 50 rows
>it 
>takes 250 seconds, so I am doing a delta import of around 500,000 rows at
>a 
>time. I have tried turning auto commit  on and things are moving a bit
>faster 
>now. Are there any more tweeking i can do?
>
>Also, planning to move to master-salve model, but am failing to
>understand where 
>to start exactly. 
>
>Regards,
>Rohit
>
>
>
>
>From: lee carroll 
>To: solr-user@lucene.apache.org
>Sent: Sun, 5 June, 2011 4:59:44 AM
>Subject: Re: URGENT HELP: Improving Solr indexing time
>
>Rohit - you have double posted maybe - did Otis's answer not help with
>your issue or at least need a response to clarify ?
>
>On 4 June 2011 22:53, Chris Cowan  wrote:
>> How long does the query against the DB take (outside of Solr)? If
>>that's slow 
>>then it's going to take a while to update the index. You might need to
>>figure a 
>>way to break things up a bit, maybe use a delta import instead of a full
>>import.
>>
>> Chris
>>
>> On Jun 4, 2011, at 6:23 AM, Rohit Gupta wrote:
>>
>>> My Solr server takes very long to update index. The table it hits to
>>>index is
>>> huge with 10Million + records , but even in that case I feel this is
>>>very 
>long
>>> time to index. Below is the snapshot of the /dataimport page
>>>
>>> busy
>>> A command is still running...
>>> 
>>> 1:53:39.664
>>> 16276
>>> 24237
>>> 16273
>>> 0
>>> 2011-06-04 11:25:26
>>> 
>>>
>>> How can i determine why this is happening and how can I improve this.
>>>During 
>>>all
>>> our test on the local server before the migration we could index 5
>>>million
>>> records in 4-5 hrs, but now its taking too long on the live server.
>>>
>>> Regards,
>>> Rohit
>>
>>
