RE: embedded solrj doesn't refresh index

2011-07-29 Thread Jianbin Dai
Thanks Marc.
I guess I was not clear in my previous statement, so let me rephrase.

I use DIH to import data into solr and do indexing. Everything works fine.

I have another embedded Solr server pointing to the same index files. I use
embedded solrj to search the index.

So the first Solr is for indexing purposes; it can be turned off once the
indexing is done.

However, the changes in the index files do not show up through embedded solrj;
that is, once the new index is built, embedded solrj still returns the old
results. Only after I restart the embedded Solr server are the new changes
reflected through solrj. The embedded solrj behaves as if there were a cache
that it always consults first.

Thanks.

JB


-Original Message-
From: Marc Sturlese [mailto:marc.sturl...@gmail.com] 
Sent: Friday, July 22, 2011 1:57 AM
To: solr-user@lucene.apache.org
Subject: RE: embedded solrj doesn't refresh index

Are you indexing with a full-import? If so, and the resulting index has a
similar number of docs to the one you had before, try setting reopenReaders
to false in solrconfig.xml.
* You have to send the commit, of course.
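For reference, a sketch of where that setting lives in solrconfig.xml of this era of Solr (the surrounding `<mainIndex>` element is assumed from the default config):

```xml
<!-- solrconfig.xml sketch: with reopenReaders=false, a commit opens a
     brand-new IndexReader instead of reopening the existing one -->
<mainIndex>
  <reopenReaders>false</reopenReaders>
</mainIndex>
```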

--
View this message in context:
http://lucene.472066.n3.nabble.com/embeded-solrj-doesn-t-refresh-index-tp3184321p3190892.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: embedded solrj doesn't refresh index

2011-07-20 Thread Jianbin Dai
Hi, thanks for the response. Here is the whole picture:
I use DIH to import and index data. And use embedded solrj connecting to the
index file for search and other operations.
Here is what I found: Once data are indexed (and committed), I can see the
changes through solr web server, but not from embedded solrj. If I restart
the embedded solr server, I do see the changes.
Hope it helps. Thanks.


-Original Message-
From: Marco Martinez [mailto:mmarti...@paradigmatecnologico.com] 
Sent: Wednesday, July 20, 2011 5:09 AM
To: solr-user@lucene.apache.org
Subject: Re: embedded solrj doesn't refresh index

You should send a commit to your embedded Solr.
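A hedged sketch of what "send a commit" can look like: either call commit() from SolrJ, or configure autoCommit in solrconfig.xml so commits happen on their own (the thresholds below are made-up examples, not from the thread):

```xml
<!-- solrconfig.xml sketch: autoCommit makes newly added documents become
     visible without an explicit client-side commit call -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>  <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```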

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2011/7/20 Jianbin Dai j...@huawei.com

 Hi,



 I am using embedded solrj. After I add new doc to the index, I can see the
 changes through solr web, but not from embedded solrj. But after I restart
 the embedded solrj, I do see the changes. It works as if there was a
cache.
 Anyone knows the problem? Thanks.



 Jianbin






embedded solrj doesn't refresh index

2011-07-19 Thread Jianbin Dai
Hi,

 

I am using embedded solrj. After I add a new doc to the index, I can see the
changes through the Solr web interface, but not from embedded solrj. But after
I restart the embedded solrj, I do see the changes. It works as if there were a
cache. Does anyone know the problem? Thanks.

 

Jianbin



Solr for noSQL

2011-01-27 Thread Jianbin Dai
Hi,

 

Is there a data import handler that can quickly read in data from a NoSQL
database, specifically MongoDB, which I am thinking of using?

Or, a more general question: how does Solr work with NoSQL databases?

Thanks.

 

Jianbin

 



RE: weighted search and index

2010-03-04 Thread Jianbin Dai
Thanks! Will try it.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, March 04, 2010 5:59 AM
To: solr-user@lucene.apache.org
Subject: Re: weighted search and index

OK, lights are finally dawning. I think what you want is payloads; see:
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
for your index-time term boosting. Query-time boosting is as you indicated.

HTH
Erick
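As a rough sketch of the payload approach that article describes (the field type, delimiter, and analyzer chain here are assumptions, not taken from the thread), terms can be indexed as term|weight pairs:

```xml
<!-- schema.xml sketch: a token like "fruit|0.8" stores 0.8 as a float
     payload on the term "fruit", for index-time per-term weights -->
<fieldType name="payloads" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            delimiter="|" encoder="float"/>
  </analyzer>
</fieldType>
```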

On Wed, Mar 3, 2010 at 9:34 PM, Jianbin Dai j...@huawei.com wrote:

 Hi Erick,

 Each doc contains some keywords that are indexed. However each keyword is
 associated with a weight to represent its importance. In my example,
 D1: fruit 0.8, apple 0.4, banana 0.2

 The keyword fruit is the most important keyword, which means I really
 really
 want it to be matched in a search result, but banana is less important (It
 would be good to be matched though).

 Hope that explains.

 Thanks.

 JB



 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Wednesday, March 03, 2010 6:23 PM
 To: solr-user@lucene.apache.org
 Subject: Re: weighted search and index

 Then I'm totally lost as to what you're trying to accomplish. Perhaps
 a higher-level statement of the problem would help.

 Because no matter how often I look at your point 2, I don't see
 what relevance the numbers have if you're not using them to
 boost at index time. Why are they even there?

 Erick

 On Wed, Mar 3, 2010 at 8:54 PM, Jianbin Dai j...@huawei.com wrote:

  Thank you very much Erick!
 
  1. I used boost in search, but I don't know exactly what's the best way
 to
  boost, for such as Sports 0.8, golf 0.5 in my example, would it be
  sports^0.8 AND golf^0.5 ?
 
 
  2. I cannot use boost in indexing. Because the weight of the value
 changes,
  not the field, look at this example again,
 
  C1: fruit 0.8, apple 0.4, banana 0.2
  C2: music 0.9, pop song 0.6, Britney Spears 0.4
 
  There is no good way to boost it during indexing.
 
  Thanks.
 
  JB
 
 
  -Original Message-
  From: Erick Erickson [mailto:erickerick...@gmail.com]
  Sent: Wednesday, March 03, 2010 5:45 PM
  To: solr-user@lucene.apache.org
  Subject: Re: weighted search and index
 
  You have to provide some more details to get meaningful help.
 
  You say I was trying to use boosting. How? At index time?
  Search time? Both? Can you provide some code snippets?
  What does your schema look like for the relevant field(s)?
 
  You say but seems not working right. What does that mean? No hits?
  Hits not ordered as you expect? Have you tried putting debugQuery=on
 on
  your URL and examined the return values?
 
  Have you looked at your index with the admin page and/or Luke to see if
  the data in the index is as you expect?
 
  As far as I know, boosts are multiplicative. So boosting by a value less
  than
  1 will actually decrease the ranking. But see the Lucene scoring, See:
 
 

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
 
  And remember, that boosting will *tend* to move a hit up or down in the
  ranking, not position it absolutely.
 
  HTH
  Erick
 
  On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai j...@huawei.com wrote:
 
   Hi,
  
   I am trying to use solr for a content match application.
  
   A content is described by a set of keywords with weights associated,
 eg.,
  
   C1: fruit 0.8, apple 0.4, banana 0.2
   C2: music 0.9, pop song 0.6, Britney Spears 0.4
  
   Those contents would be indexed in solr.
   In the search, I also have a set of keywords with weights:
  
   Query: Sports 0.8, golf 0.5
  
   I am trying to find the closest matching contents for this query.
  
   My question is how to index the contents with weighted scores, and how
 to
   write search query. I was trying to use boosting, but seems not
working
   right.
  
   Thanks.
  
   Jianbin
  
  
  
 
 





weighted search and index

2010-03-03 Thread Jianbin Dai
Hi,

I am trying to use Solr for a content-matching application.

Each content item is described by a set of keywords with associated weights, e.g.:

C1: fruit 0.8, apple 0.4, banana 0.2
C2: music 0.9, pop song 0.6, Britney Spears 0.4

Those contents would be indexed in solr.
In the search, I also have a set of keywords with weights:

Query: Sports 0.8, golf 0.5

I am trying to find the closest matching contents for this query.

My question is how to index the contents with weighted scores, and how to
write the search query. I tried using boosting, but it doesn't seem to work
right.

Thanks.

Jianbin




RE: weighted search and index

2010-03-03 Thread Jianbin Dai
Thank you very much Erick!

1. I used boosting in search, but I don't know exactly what the best way to
boost is. For, say, "Sports 0.8, golf 0.5" in my example, would it be
sports^0.8 AND golf^0.5?


2. I cannot use boosting at indexing time, because the weight is attached to
the value, not the field. Look at this example again:

C1: fruit 0.8, apple 0.4, banana 0.2
C2: music 0.9, pop song 0.6, Britney Spears 0.4

There is no good way to boost it during indexing.

Thanks.

JB
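For the query-time side asked about in point 1, a boosted query against a hypothetical `keywords` field would look roughly like this (field name and syntax assumed, standard query parser):

```
q=keywords:sports^0.8 OR keywords:golf^0.5
```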


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, March 03, 2010 5:45 PM
To: solr-user@lucene.apache.org
Subject: Re: weighted search and index

You have to provide some more details to get meaningful help.

You say "I was trying to use boosting". How? At index time?
Search time? Both? Can you provide some code snippets?
What does your schema look like for the relevant field(s)?

You say "but seems not working right". What does that mean? No hits?
Hits not ordered as you expect? Have you tried putting debugQuery=on on
your URL and examining the return values?

Have you looked at your index with the admin page and/or Luke to see if
the data in the index is as you expect?

As far as I know, boosts are multiplicative. So boosting by a value less
than
1 will actually decrease the ranking. But see the Lucene scoring documentation:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

And remember, that boosting will *tend* to move a hit up or down in the
ranking, not position it absolutely.

HTH
Erick

On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai j...@huawei.com wrote:

 Hi,

 I am trying to use solr for a content match application.

 A content is described by a set of keywords with weights associated, eg.,

 C1: fruit 0.8, apple 0.4, banana 0.2
 C2: music 0.9, pop song 0.6, Britney Spears 0.4

 Those contents would be indexed in solr.
 In the search, I also have a set of keywords with weights:

 Query: Sports 0.8, golf 0.5

 I am trying to find the closest matching contents for this query.

 My question is how to index the contents with weighted scores, and how to
 write search query. I was trying to use boosting, but seems not working
 right.

 Thanks.

 Jianbin






RE: weighted search and index

2010-03-03 Thread Jianbin Dai
Hi Erick,

Each doc contains some keywords that are indexed. However, each keyword is
associated with a weight representing its importance. In my example,
D1: fruit 0.8, apple 0.4, banana 0.2

the keyword "fruit" is the most important, which means I really, really want
it to be matched in a search result, while "banana" is less important (it
would still be good to match, though).

Hope that explains.

Thanks.

JB



-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, March 03, 2010 6:23 PM
To: solr-user@lucene.apache.org
Subject: Re: weighted search and index

Then I'm totally lost as to what you're trying to accomplish. Perhaps
a higher-level statement of the problem would help.

Because no matter how often I look at your point 2, I don't see
what relevance the numbers have if you're not using them to
boost at index time. Why are they even there?

Erick

On Wed, Mar 3, 2010 at 8:54 PM, Jianbin Dai j...@huawei.com wrote:

 Thank you very much Erick!

 1. I used boost in search, but I don't know exactly what's the best way to
 boost, for such as Sports 0.8, golf 0.5 in my example, would it be
 sports^0.8 AND golf^0.5 ?


 2. I cannot use boost in indexing. Because the weight of the value
changes,
 not the field, look at this example again,

 C1: fruit 0.8, apple 0.4, banana 0.2
 C2: music 0.9, pop song 0.6, Britney Spears 0.4

 There is no good way to boost it during indexing.

 Thanks.

 JB


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Wednesday, March 03, 2010 5:45 PM
 To: solr-user@lucene.apache.org
 Subject: Re: weighted search and index

 You have to provide some more details to get meaningful help.

 You say I was trying to use boosting. How? At index time?
 Search time? Both? Can you provide some code snippets?
 What does your schema look like for the relevant field(s)?

 You say but seems not working right. What does that mean? No hits?
 Hits not ordered as you expect? Have you tried putting debugQuery=on on
 your URL and examined the return values?

 Have you looked at your index with the admin page and/or Luke to see if
 the data in the index is as you expect?

 As far as I know, boosts are multiplicative. So boosting by a value less
 than
 1 will actually decrease the ranking. But see the Lucene scoring, See:


http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

 And remember, that boosting will *tend* to move a hit up or down in the
 ranking, not position it absolutely.

 HTH
 Erick

 On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai j...@huawei.com wrote:

  Hi,
 
  I am trying to use solr for a content match application.
 
  A content is described by a set of keywords with weights associated,
eg.,
 
  C1: fruit 0.8, apple 0.4, banana 0.2
  C2: music 0.9, pop song 0.6, Britney Spears 0.4
 
  Those contents would be indexed in solr.
  In the search, I also have a set of keywords with weights:
 
  Query: Sports 0.8, golf 0.5
 
  I am trying to find the closest matching contents for this query.
 
  My question is how to index the contents with weighted scores, and how
to
  write search query. I was trying to use boosting, but seems not working
  right.
 
  Thanks.
 
  Jianbin
 
 
 





Use DIH with large xml file

2009-06-20 Thread Jianbin Dai

Hi,

I have about 50GB of data to be indexed each day using DIH. Some of the files
are as large as 6GB. I set the JVM Xmx to 3GB, but the DIH crashes on those
big files. Is there any way to handle it?

Thanks.

JB


  



Re: Use DIH with large xml file

2009-06-20 Thread Jianbin Dai

Can DIH read item by item instead of reading the whole file before indexing? My
biggest file is 6GB, larger than the JVM max heap value.
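For what it's worth, the DIH does have an option for this kind of case: XPathEntityProcessor can stream the file instead of loading it whole. A sketch (entity name and attribute values here are assumptions):

```xml
<!-- data-config.xml sketch: stream="true" makes XPathEntityProcessor
     parse the XML incrementally, row by row, rather than in one read -->
<entity name="f"
        dataSource="xmlreader"
        processor="XPathEntityProcessor"
        stream="true"
        url="${f.fileAbsolutePath}"
        forEach="/abc/def/gh"/>
```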


--- On Sat, 6/20/09, Erik Hatcher e...@ehatchersolutions.com wrote:

 From: Erik Hatcher e...@ehatchersolutions.com
 Subject: Re: Use DIH with large xml file
 To: solr-user@lucene.apache.org
 Date: Saturday, June 20, 2009, 6:52 PM
 How are you configuring DIH to read
 those files?  It is likely that you'll need at least as
 much RAM to the JVM as the largest file you're processing,
 though that depends entirely on how the file is being
 processed.
 
     Erik
 
 On Jun 20, 2009, at 9:23 PM, Jianbin Dai wrote:
 
  
  Hi,
  
  I have about 50GB of data to be indexed each day using
 DIH. Some of the files are as large as 6GB. I set the JVM
 Xmx to be 3GB, but the DIH crashes on those big files. Is
 there any way to handle it?
  
  Thanks.
  
  JB
  
  
  
 
 






Re: Index Comma Separated numbers

2009-06-05 Thread Jianbin Dai

Hi,

Yes, I put it in data-config.xml, like the following:

<entity name="x"
        dataSource="xmlreader"
        processor="XPathEntityProcessor"
        url="${f.fileAbsolutePath}"
        forEach="/abc/def/gh"
        transformer="NumberFormatTransformer">
  <field column="name" ... />

But it's not working on comma-separated numbers.
Did I miss something?

Thanks.





--- On Thu, 6/4/09, Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com wrote:

 From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com
 Subject: Re: Index Comma Separated numbers
 To: solr-user@lucene.apache.org
 Date: Thursday, June 4, 2009, 9:24 PM
 did you try the
 NumberFormatTransformer ?
 
 On Fri, Jun 5, 2009 at 12:09 AM, Jianbin Dai djian...@yahoo.com
 wrote:
 
  Hi, One of the fields to be indexed is price which is comma separated,
 e.g., 12,034.00. How can I index it as a number?
  I am using DIH to pull the data. Thanks.
 
 
 
 
 
 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
 






Re: Index Comma Separated numbers

2009-06-05 Thread Jianbin Dai

I forgot to put formatStyle="number" on the field.
It works now. Thanks!!
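For reference, the fixed field line would look something like this (the column and xpath names are placeholders, not taken from the thread):

```xml
<!-- data-config.xml sketch: formatStyle="number" tells the
     NumberFormatTransformer to parse "12,034.00" into a number -->
<field column="price" xpath="/abc/def/gh/price" formatStyle="number"/>
```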


--- On Fri, 6/5/09, Jianbin Dai djian...@yahoo.com wrote:

 From: Jianbin Dai djian...@yahoo.com
 Subject: Re: Index Comma Separated numbers
 To: solr-user@lucene.apache.org, noble.p...@gmail.com
 Date: Friday, June 5, 2009, 12:37 PM
 
 Hi,
 
 Yes, I put it in data-config.xml, like the following:

 <entity name="x"
         dataSource="xmlreader"
         processor="XPathEntityProcessor"
         url="${f.fileAbsolutePath}"
         forEach="/abc/def/gh"
         transformer="NumberFormatTransformer">
   <field column="name" ... />

 But it's not working on comma-separated numbers.
 Did I miss something?
 
 Thanks.
 
 
 
 
 
 --- On Thu, 6/4/09, Noble Paul നോബിള്‍ 
 नोब्ळ् noble.p...@corp.aol.com
 wrote:
 
  From: Noble Paul നോബിള്‍ 
 नोब्ळ् noble.p...@corp.aol.com
  Subject: Re: Index Comma Separated numbers
  To: solr-user@lucene.apache.org
  Date: Thursday, June 4, 2009, 9:24 PM
  did you try the
  NumberFormatTransformer ?
  
  On Fri, Jun 5, 2009 at 12:09 AM, Jianbin Dai djian...@yahoo.com
  wrote:
  
   Hi, One of the fields to be indexed is price which is comma separated,
  e.g., 12,034.00. How can I index it as a number?
   I am using DIH to pull the data. Thanks.
  
  
  
  
  
  
  
  
  -- 
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
  
 
 
 
 
 






Index Comma Separated numbers

2009-06-04 Thread Jianbin Dai

Hi, One of the fields to be indexed is price which is comma separated, e.g., 
12,034.00.  How can I indexed it as a number? 
I am using DIH to pull the data. Thanks.


  



Re: how to do exact search with solrj

2009-06-04 Thread Jianbin Dai

I still have a problem with exact matching.

query.setQuery("title:\"hello the world\"");

This will return all docs whose title contains "hello the world"; i.e., "hello
the world, Jack" will also be matched. What I want is exactly "hello the
world". Setting this field to string instead of text doesn't work well either,
because I want something like "Hello, The World" to be matched as well.
Any idea? Thanks.
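One common way to get "whole-value, but case- and punctuation-insensitive" matching is a TextField analyzed with KeywordTokenizer; this is a sketch with assumed type name and filter settings, not something from the thread:

```xml
<!-- schema.xml sketch: the whole title becomes a single token, lowercased
     and with commas/periods stripped, so "Hello, The World" matches
     "hello the world" but "hello the world, Jack" does not -->
<fieldType name="string_exactish" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[,\.]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```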


 --- On Sat, 5/30/09, Avlesh Singh avl...@gmail.com
 wrote:
 
  From: Avlesh Singh avl...@gmail.com
  Subject: Re: how to do exact serch with solrj
  To: solr-user@lucene.apache.org
  Date: Saturday, May 30, 2009, 11:45 PM
  You need exact match for all the
  three tokens?
  If yes, try query.setQuery(title:\hello the
 world\);
  
  Cheers
  Avlesh
  
  On Sun, May 31, 2009 at 12:12 PM, Jianbin Dai djian...@yahoo.com
  wrote:
  
  
   I tried, but seems it's not working right.
  
   --- On Sat, 5/30/09, Avlesh Singh avl...@gmail.com
  wrote:
  
From: Avlesh Singh avl...@gmail.com
Subject: Re: how to do exact serch with
 solrj
To: solr-user@lucene.apache.org
Date: Saturday, May 30, 2009, 10:56 PM
query.setQuery(title:hello the
world) is what you need.
   
Cheers
Avlesh
   
On Sun, May 31, 2009 at 6:23 AM, Jianbin
 Dai
  djian...@yahoo.com
wrote:
   

  Hi,

  I want to search "hello the world" in the title field using solrj. I set
  the query filter
  query.addFilterQuery("title");
  query.setQuery("hello the world");

  but it returns non-exact-match results as well.

  I know one way to do it is to set the title field to string instead of
  text. But is there any way I can do it? If I do the search through the
  web interface (Solr Admin) by title:"hello the world", it returns
  exact matches.

  Thanks.

 JB





   
  
  
  
  
  
  
 
 
       
 





Re: how to do exact search with solrj

2009-05-31 Thread Jianbin Dai

I tried it, but it doesn't seem to work right.

--- On Sat, 5/30/09, Avlesh Singh avl...@gmail.com wrote:

 From: Avlesh Singh avl...@gmail.com
 Subject: Re: how to do exact serch with solrj
 To: solr-user@lucene.apache.org
 Date: Saturday, May 30, 2009, 10:56 PM
 query.setQuery("title:hello the world") is what you need.
 
 Cheers
 Avlesh
 
 On Sun, May 31, 2009 at 6:23 AM, Jianbin Dai djian...@yahoo.com
 wrote:
 
 
  Hi,
 
  I want to search "hello the world" in the title field using solrj. I set
  the query filter
  query.addFilterQuery("title");
  query.setQuery("hello the world");
 
  but it returns non-exact-match results as well.

  I know one way to do it is to set the title field to string instead of
  text. But is there any way I can do it? If I do the search through the web
  interface (Solr Admin) by title:"hello the world", it returns exact
  matches.
 
  Thanks.
 
  JB
 
 
 
 
 
 


  



Re: how to do exact search with solrj

2009-05-31 Thread Jianbin Dai

That's correct! Thanks Avlesh.

--- On Sat, 5/30/09, Avlesh Singh avl...@gmail.com wrote:

 From: Avlesh Singh avl...@gmail.com
 Subject: Re: how to do exact serch with solrj
 To: solr-user@lucene.apache.org
 Date: Saturday, May 30, 2009, 11:45 PM
 You need an exact match for all three tokens?
 If yes, try query.setQuery("title:\"hello the world\"");
 
 Cheers
 Avlesh
 
 On Sun, May 31, 2009 at 12:12 PM, Jianbin Dai djian...@yahoo.com
 wrote:
 
 
  I tried, but seems it's not working right.
 
  --- On Sat, 5/30/09, Avlesh Singh avl...@gmail.com
 wrote:
 
   From: Avlesh Singh avl...@gmail.com
   Subject: Re: how to do exact serch with solrj
   To: solr-user@lucene.apache.org
   Date: Saturday, May 30, 2009, 10:56 PM
   query.setQuery(title:hello the
   world) is what you need.
  
   Cheers
   Avlesh
  
   On Sun, May 31, 2009 at 6:23 AM, Jianbin Dai
 djian...@yahoo.com
   wrote:
  
   
Hi,
   
    I want to search "hello the world" in the title field using solrj.
    I set the query filter
    query.addFilterQuery("title");
    query.setQuery("hello the world");

    but it returns non-exact-match results as well.

    I know one way to do it is to set the title field to string instead
    of text. But is there any way I can do it? If I do the search through
    the web interface (Solr Admin) by title:"hello the world", it
    returns exact matches.
   
Thanks.
   
JB
   
   
   
   
   
  
 
 
 
 
 
 


  


how to do exact search with solrj

2009-05-30 Thread Jianbin Dai

Hi,

I want to search "hello the world" in the title field using solrj. I set the
query filter
query.addFilterQuery("title");
query.setQuery("hello the world");

but it returns non-exact-match results as well.

I know one way to do it is to set the title field to string instead of text.
But is there any way I can do it? If I do the search through the web interface
(Solr Admin) by title:"hello the world", it returns exact matches.

Thanks.

JB


  



Re: Is it memory leaking in solr?

2009-05-26 Thread Jianbin Dai


Hi Otis,

The slowness was due to the JVM memory limit set by Tomcat. I have solved this
problem. Initially I thought there might be a memory leak because I noticed
the following behavior:
At the peak of indexing, almost all 4GB of memory was used. Once indexing was
done, memory usage was about 3GB. If I deleted all the indexes and shut down
Solr, I still saw about 2GB of memory in use, much more than the initial
memory usage of about 250MB.
I am not sure if my guess is right. Thanks.


--- On Tue, 5/26/09, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 Subject: Re: Is it memory leaking in solr?
 To: solr-user@lucene.apache.org
 Date: Tuesday, May 26, 2009, 10:03 AM
 
 Jianbin,
 
 If you connect to that Java process with jconsole, do you
 see a lot of garbage collection activity?
 
 What makes you think there is a memory leak?  The
 slowness?
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
  From: Jianbin Dai djian...@yahoo.com
  To: solr-user@lucene.apache.org
  Sent: Monday, May 25, 2009 1:05:43 PM
  Subject: Re: Is it memory leaking in solr?
  
  
  Again, indexing becomes extremely slow after indexed
 8m documents (about 25G of 
  original file size). Here is the memory usage info of
 my computer. Does this 
  have anything to do with tomcat setting? Thanks.
  
  
  top - 08:09:53 up  7:22,  1 user,  load average: 1.03, 1.01, 1.00
  Tasks:  78 total,   2 running,  76 sleeping,   0 stopped,   0 zombie
  Cpu(s): 49.9%us,  0.2%sy,  0.0%ni, 49.8%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
  Mem:   4044776k total,  3960740k used,    84036k free,    42196k buffers
  Swap:  2031608k total,       84k used,  2031524k free,  2729892k cached

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   3322 root      21   0 1357m 1.0g  11m S  100 27.0 397:51.74 java
  
  
  
  --- On Mon, 5/25/09, Jianbin Dai wrote:
  
   From: Jianbin Dai 
   Subject: Is it memory leaking in solr?
   To: solr-user@lucene.apache.org,
 noble.p...@gmail.com
   Date: Monday, May 25, 2009, 1:27 AM
   
   I am using DIH to do indexing. After I indexed
 about 8M
   documents (took about 1hr40m), it used up almost
 all memory
   (4GB), and the indexing becomes extremely slow.
 If I delete
   all indexing and shutdown tomcat, it still shows
 over 3gb
   memory was used. Is it memory leaking? if it is,
 then the
   leaking is in solr indexing or DIH? 
 Thanks.
   
   
         
   
   
 
 






Is it memory leaking in solr?

2009-05-25 Thread Jianbin Dai

I am using DIH to do indexing. After I indexed about 8M documents (which took
about 1h40m), it used up almost all memory (4GB), and the indexing became
extremely slow. If I delete all the indexes and shut down Tomcat, it still
shows over 3GB of memory in use. Is it a memory leak? If it is, is the leak in
Solr indexing or in DIH? Thanks.


  



Re: Is it memory leaking in solr?

2009-05-25 Thread Jianbin Dai

Again, indexing becomes extremely slow after indexing 8M documents (about 25GB
of original file size). Here is the memory usage info from my computer. Does
this have anything to do with the Tomcat settings? Thanks.


top - 08:09:53 up  7:22,  1 user,  load average: 1.03, 1.01, 1.00
Tasks:  78 total,   2 running,  76 sleeping,   0 stopped,   0 zombie
Cpu(s): 49.9%us,  0.2%sy,  0.0%ni, 49.8%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4044776k total,  3960740k used,    84036k free,    42196k buffers
Swap:  2031608k total,       84k used,  2031524k free,  2729892k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3322 root      21   0 1357m 1.0g  11m S  100 27.0 397:51.74 java



--- On Mon, 5/25/09, Jianbin Dai djian...@yahoo.com wrote:

 From: Jianbin Dai djian...@yahoo.com
 Subject: Is it memory leaking in solr?
 To: solr-user@lucene.apache.org, noble.p...@gmail.com
 Date: Monday, May 25, 2009, 1:27 AM
 
 I am using DIH to do indexing. After I indexed about 8M
 documents (took about 1hr40m), it used up almost all memory
 (4GB), and the indexing becomes extremely slow. If I delete
 all indexing and shutdown tomcat, it still shows over 3gb
 memory was used. Is it memory leaking? if it is, then the
 leaking is in solr indexing or DIH?  Thanks.
 
 
       
 
 






Re: How to index large set data

2009-05-24 Thread Jianbin Dai

Hi Paul,

Hope you have a great weekend so far.
I still have a couple of questions you might help me out:

1. In your earlier email, you said: "if possible, you can setup multiple DIH
say /dataimport1, /dataimport2 etc and split your files and can achieve
parallelism".
I am not sure if I understand it right. I put two requestHandlers in
solrconfig.xml, like this:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">./data-config.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport2"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">./data-config2.xml</str>
  </lst>
</requestHandler>


and created data-config.xml and data-config2.xml.
Then I ran the command
http://host:8080/solr/dataimport?command=full-import

But only one data set (the first one) was indexed. Did I get something wrong?
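Worth noting: each registered request handler is its own endpoint, so each one has to be triggered with its own full-import command (host and port as in the message):

```
http://host:8080/solr/dataimport?command=full-import
http://host:8080/solr/dataimport2?command=full-import
```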


2. I noticed that after Solr indexed about 8M documents (around two hours), it
gets very, very slow. I used the top command in Linux and noticed that RES is
1g of memory. I did several experiments; every time RES reaches 1g, the
indexing process becomes extremely slow. Is this memory limit set by the JVM?
And how can I set the JVM memory when I use DIH through the web full-import
command?
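On the JVM memory question: the heap for a webapp-hosted Solr comes from the servlet container's JVM, not from the import command. With Tomcat, one common way (file path and values below are examples, not from the thread) is:

```
# e.g. in Tomcat's bin/setenv.sh (or via CATALINA_OPTS) -- values are examples
JAVA_OPTS="$JAVA_OPTS -Xms512m -Xmx2g"
```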

Thanks!


JB




--- On Fri, 5/22/09, Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com wrote:

 From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com
 Subject: Re: How to index large set data
 To: Jianbin Dai djian...@yahoo.com
 Date: Friday, May 22, 2009, 10:04 PM
 On Sat, May 23, 2009 at 10:27 AM,
 Jianbin Dai djian...@yahoo.com
 wrote:
 
  Hi Pual, but in your previous post, you said there is
 already an issue for writing to Solr in multiple threads
  SOLR-1089. Do you think use solrj alone would be better
 than DIH?
 
 nope
 you will have to do indexing in multiple threads
 
 if possible , you can setup multiple DIH say /dataimport1,
 /dataimport2 etc and split your files and can achieve
 parallelism
 
 
  Thanks and have a good weekend!
 
  --- On Fri, 5/22/09, Noble Paul നോബിള്‍
  नोब्ळ् noble.p...@corp.aol.com
 wrote:
 
  no need to use embedded Solrserver..
  you can use SolrJ with streaming
  in multiple threads
 
  On Fri, May 22, 2009 at 8:36 PM, Jianbin Dai
 djian...@yahoo.com
  wrote:
  
   If I do the xml parsing by myself and use
 embedded
  client to do the push, would it be more efficient
 than DIH?
  
  
   --- On Fri, 5/22/09, Grant Ingersoll gsing...@apache.org
  wrote:
  
   From: Grant Ingersoll gsing...@apache.org
   Subject: Re: How to index large set data
   To: solr-user@lucene.apache.org
   Date: Friday, May 22, 2009, 5:38 AM
    Can you parallelize this?  I don't know that the DIH can handle it,
    but having multiple threads sending docs to Solr is the best
    performance wise, so maybe you need to look at alternatives to
    pulling with DIH and instead use a client to push into Solr.
  
  
   On May 22, 2009, at 3:42 AM, Jianbin Dai
 wrote:
  
   
    about 2.8m total docs were created. Only the first run finishes. In
    my 2nd try, it hangs there forever at the end of indexing (I guess
    right before commit), with cpu usage of 100%. Total 5G (2050) index
    files are created. Now I have two problems:
    1. why it hangs there and failed?
    2. how can i speed up the indexing?
   
   
    Here is my solrconfig.xml:

    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>3000</ramBufferSizeMB>
    <mergeFactor>1000</mergeFactor>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <unlockOnStartup>false</unlockOnStartup>
   
   
   
   
--- On Thu, 5/21/09, Noble Paul
   നോബിള്‍  नो
ब्ळ् noble.p...@corp.aol.com
   wrote:
   
From: Noble Paul
 നോബിള്‍
   नोब्ळ्
noble.p...@corp.aol.com
Subject: Re: How to index large
 set data
To: solr-user@lucene.apache.org
Date: Thursday, May 21, 2009,
 10:39 PM
what is the total no:of docs
 created
?  I guess it may not be
 memory
 bound. Indexing is mostly an IO
 bound
  operation.
   You may
be able to
get a better perf if a SSD is
 used (solid
  state
   disk)
   
On Fri, May 22, 2009 at 10:46
 AM, Jianbin
  Dai
   djian...@yahoo.com
wrote:
   
Hi Paul,
   
Thank you so much for
 answering my
  questions.
   It
really helped.
After some adjustment,
 basically
  setting
   mergeFactor
to 1000 from the default value
 of 10, I
  can
   finished the
whole job in 2.5 hours. I
 checked that
  during
   running time,
only around 18% of memory is
 being used,
  and VIRT
   is always
1418m. I am thinking it may be
 restricted
  by JVM
   memory
setting. But I run the data
 import
  command through
   web,
i.e.,
   
   
  
 
 http://host:port/solr/dataimport?command=full-import,
how can I set the memory
 allocation for
  JVM?
Thanks again
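
The multiple-DIH suggestion above would look roughly like this in solrconfig.xml (handler names and config file names are illustrative, not from the thread):

```xml
<!-- Two DIH instances, each with its own config pointing at a subset of the files -->
<requestHandler name="/dataimport1"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-part1.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport2"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-part2.xml</str>
  </lst>
</requestHandler>
```

Each handler can then be kicked off with its own full-import request, so the two imports run in parallel.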

Re: How to index large set data

2009-05-22 Thread Jianbin Dai

About 2.8m total docs were created. Only the first run finished. In my 2nd 
try, it hangs there forever at the end of indexing (I guess right before 
commit), with CPU usage of 100%. Total 5G (2050) index files are created. Now I 
have two problems:
1. why does it hang there and fail?
2. how can I speed up the indexing?


Here is my solrconfig.xml

<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>3000</ramBufferSizeMB>
<mergeFactor>1000</mergeFactor>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>1</maxFieldLength>
<unlockOnStartup>false</unlockOnStartup>




--- On Thu, 5/21/09, Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com wrote:

 From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com
 Subject: Re: How to index large set data
 To: solr-user@lucene.apache.org
 Date: Thursday, May 21, 2009, 10:39 PM
 What is the total no. of docs created?
 I guess it may not be memory
 bound. Indexing is mostly an IO bound operation. You may
 be able to
 get better perf if an SSD is used (solid state disk)
 
 On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai djian...@yahoo.com
 wrote:
 
  Hi Paul,
 
  Thank you so much for answering my questions. It
 really helped.
  After some adjustment, basically setting mergeFactor
 to 1000 from the default value of 10, I could finish the
 whole job in 2.5 hours. I checked that during running time,
 only around 18% of memory is being used, and VIRT is always
 1418m. I am thinking it may be restricted by JVM memory
 setting. But I run the data import command through web,
 i.e.,
 
 http://host:port/solr/dataimport?command=full-import,
 how can I set the memory allocation for JVM?
  Thanks again!
 
  JB
 
  --- On Thu, 5/21/09, Noble Paul നോബിള്‍
  नोब्ळ् noble.p...@corp.aol.com
 wrote:
 
  From: Noble Paul നോബിള്‍
  नोब्ळ् noble.p...@corp.aol.com
  Subject: Re: How to index large set data
  To: solr-user@lucene.apache.org
  Date: Thursday, May 21, 2009, 9:57 PM
  check the status page of DIH and see
  if it is working properly. and
  if, yes what is the rate of indexing
 
  On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai
 djian...@yahoo.com
  wrote:
  
   Hi,
  
   I have about 45GB xml files to be indexed. I
 am using
  DataImportHandler. I started the full import 4
 hours ago,
  and it's still running
   My computer has 4GB memory. Any suggestion on
 the
  solutions?
   Thanks!
  
   JB
  
  
  
  
  
 
 
 
  --
 
 -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 
 
 
 
 
 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
 





Re: How to index large set data

2009-05-22 Thread Jianbin Dai

I don't know exactly how this 3G RAM buffer is used. But what I noticed was 
that both index size and file number kept increasing, then it got stuck in the 
commit. 

--- On Fri, 5/22/09, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 Subject: Re: How to index large set data
 To: solr-user@lucene.apache.org
 Date: Friday, May 22, 2009, 7:26 AM
 
 Hi,
 
 Those settings are a little crazy.  Are you sure you
 want to give Solr/Lucene 3G to buffer documents before
 flushing them to disk?  Are you sure you want to use
 the mergeFactor of 1000?  Check the logs to see if
 there are any errors.  Look at the index directory to
 see if Solr is actually still writing to it (file sizes are
 changing, number of files is changing).  kill -QUIT the
 JVM pid to see where things are stuck if they are
 stuck...
 
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
  From: Jianbin Dai djian...@yahoo.com
  To: solr-user@lucene.apache.org;
 noble.p...@gmail.com
  Sent: Friday, May 22, 2009 3:42:04 AM
  Subject: Re: How to index large set data
  
  
  about 2.8 m total docs were created. only the first
 run finishes. In my 2nd try, 
  it hangs there forever at the end of indexing, (I
 guess right before commit), 
  with cpu usage of 100%. Total 5G (2050) index files
 are created. Now I have two 
  problems:
  1. why it hangs there and failed?
  2. how can i speed up the indexing?
  
  
  Here is my solrconfig.xml
  
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>3000</ramBufferSizeMB>
    <mergeFactor>1000</mergeFactor>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <unlockOnStartup>false</unlockOnStartup>
  
  
  
  
  --- On Thu, 5/21/09, Noble Paul
 നോബിള്‍  नोब्ळ् wrote:
  
   From: Noble Paul നോബിള്‍ 
 नोब्ळ् 
   Subject: Re: How to index large set data
   To: solr-user@lucene.apache.org
   Date: Thursday, May 21, 2009, 10:39 PM
    What is the total no. of docs created?
    I guess it may not be memory
    bound. Indexing is mostly an IO bound operation.
 You may
   be able to
   get a better perf if a SSD is used (solid state
 disk)
   
   On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai 
   wrote:
   
Hi Paul,
   
Thank you so much for answering my
 questions. It
   really helped.
After some adjustment, basically setting
 mergeFactor
    to 1000 from the default value of 10, I could
    finish the
   whole job in 2.5 hours. I checked that during
 running time,
   only around 18% of memory is being used, and VIRT
 is always
   1418m. I am thinking it may be restricted by JVM
 memory
   setting. But I run the data import command
 through web,
   i.e.,
   
   http://:/solr/dataimport?command=full-import,
   how can I set the memory allocation for JVM?
Thanks again!
   
JB
   
--- On Thu, 5/21/09, Noble Paul
 നോബിള്‍
    नोब्ळ् 
   wrote:
   
From: Noble Paul നോബിള്‍
    नोब्ळ् 
Subject: Re: How to index large set
 data
To: solr-user@lucene.apache.org
Date: Thursday, May 21, 2009, 9:57 PM
check the status page of DIH and see
if it is working properly. and
if, yes what is the rate of indexing
   
On Thu, May 21, 2009 at 11:48 AM,
 Jianbin Dai
   
wrote:

 Hi,

 I have about 45GB xml files to be
 indexed. I
   am using
DataImportHandler. I started the full
 import 4
   hours ago,
and it's still running.
 My computer has 4GB memory. Any
 suggestion on
   the
solutions?
 Thanks!

 JB





   
   
   
--
   
  
 -
Noble Paul | Principal Engineer| AOL |
 http://aol.com
   
   
   
   
   
   
   
   
   
   -- 
  
 -
   Noble Paul | Principal Engineer| AOL | http://aol.com
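
For comparison with the settings Otis calls "a little crazy" above, a more conventional starting point for this indexDefaults block would be something like the following (values are illustrative suggestions, not from the thread):

```xml
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>256</ramBufferSizeMB>   <!-- far below 3000; flush to disk sooner -->
<mergeFactor>10</mergeFactor>            <!-- the default; 1000 leaves thousands of live segments -->
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000</maxFieldLength>
<unlockOnStartup>false</unlockOnStartup>
```

A mergeFactor of 1000 defers nearly all merging until commit, which is consistent with the long hang at the end of indexing reported in this thread.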
   
 
 






Re: How to index large set data

2009-05-22 Thread Jianbin Dai

If I do the xml parsing by myself and use embedded client to do the push, would 
it be more efficient than DIH?


--- On Fri, 5/22/09, Grant Ingersoll gsing...@apache.org wrote:

 From: Grant Ingersoll gsing...@apache.org
 Subject: Re: How to index large set data
 To: solr-user@lucene.apache.org
 Date: Friday, May 22, 2009, 5:38 AM
 Can you parallelize this?  I
 don't know that the DIH can handle it,  
 but having multiple threads sending docs to Solr is the
 best
 performance-wise, so maybe you need to look at alternatives
 to pulling  
 with DIH and instead use a client to push into Solr.
 
 
 On May 22, 2009, at 3:42 AM, Jianbin Dai wrote:
 
 
  about 2.8 m total docs were created. only the first
 run finishes. In  
  my 2nd try, it hangs there forever at the end of
 indexing, (I guess  
  right before commit), with cpu usage of 100%. Total 5G
 (2050) index  
  files are created. Now I have two problems:
  1. why it hangs there and failed?
  2. how can i speed up the indexing?
 
 
  Here is my solrconfig.xml
 
    
   <useCompoundFile>false</useCompoundFile>
   <ramBufferSizeMB>3000</ramBufferSizeMB>
   <mergeFactor>1000</mergeFactor>
   <maxMergeDocs>2147483647</maxMergeDocs>
   <maxFieldLength>1</maxFieldLength>
   <unlockOnStartup>false</unlockOnStartup>
 
 
 
 
  --- On Thu, 5/21/09, Noble Paul
 നോബിള്‍  नो 
  ब्ळ् noble.p...@corp.aol.com
 wrote:
 
  From: Noble Paul നോബിള്‍ 
 नोब्ळ्  
  noble.p...@corp.aol.com
  Subject: Re: How to index large set data
  To: solr-user@lucene.apache.org
  Date: Thursday, May 21, 2009, 10:39 PM
  What is the total no. of docs created?
  I guess it may not be memory
  bound. Indexing is mostly an IO bound operation.
 You may
  be able to
  get a better perf if a SSD is used (solid state
 disk)
 
  On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai
 djian...@yahoo.com
  wrote:
 
  Hi Paul,
 
  Thank you so much for answering my questions.
 It
  really helped.
  After some adjustment, basically setting
 mergeFactor
  to 1000 from the default value of 10, I could
  finish the
  whole job in 2.5 hours. I checked that during
 running time,
  only around 18% of memory is being used, and VIRT
 is always
  1418m. I am thinking it may be restricted by JVM
 memory
  setting. But I run the data import command through
 web,
  i.e.,
 
 
 http://host:port/solr/dataimport?command=full-import,
  how can I set the memory allocation for JVM?
  Thanks again!
 
  JB
 
  --- On Thu, 5/21/09, Noble Paul
 നോബിള്‍
   नोब्ळ् noble.p...@corp.aol.com
  wrote:
 
  From: Noble Paul നോബിള്‍
   नोब्ळ् noble.p...@corp.aol.com
  Subject: Re: How to index large set data
  To: solr-user@lucene.apache.org
  Date: Thursday, May 21, 2009, 9:57 PM
  check the status page of DIH and see
  if it is working properly. and
  if, yes what is the rate of indexing
 
  On Thu, May 21, 2009 at 11:48 AM, Jianbin
 Dai
  djian...@yahoo.com
  wrote:
 
  Hi,
 
  I have about 45GB xml files to be
 indexed. I
  am using
  DataImportHandler. I started the full
 import 4
  hours ago,
  and it's still running
  My computer has 4GB memory. Any
 suggestion on
  the
  solutions?
  Thanks!
 
  JB
 
 
 
 
 
 
 
 
  --
 
 
 -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 
 
 
 
 
 
 
 
  -- 
 
 -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 
 
 
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem
 (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
 using Solr/Lucene:
 http://www.lucidimagination.com/search
 
 






How to use DIH to index attributes in xml file

2009-05-22 Thread Jianbin Dai

I have an xml file like this 

<merchantProduct id="814636051" mid="189973">
<in_stock type="stock-4" />
<condition type="cond-0" />
<price>301.46</price>
</merchantProduct>

In the data-config.xml, I use
<field column="price" xpath="/.../merchantProduct/price" />

but how can I index id, mid?

Thanks.


  


Re: How to index large set data

2009-05-22 Thread Jianbin Dai

Hi Paul, but in your previous post, you said there is already an issue for 
writing to Solr in multiple threads, SOLR-1089. Do you think using solrj alone 
would be better than DIH? 
Thanks and have a good weekend!

--- On Fri, 5/22/09, Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com wrote:

 No need to use an embedded SolrServer;
 you can use SolrJ with streaming
 in multiple threads
 
 On Fri, May 22, 2009 at 8:36 PM, Jianbin Dai djian...@yahoo.com
 wrote:
 
  If I do the xml parsing by myself and use embedded
 client to do the push, would it be more efficient than DIH?
 
 
  --- On Fri, 5/22/09, Grant Ingersoll gsing...@apache.org
 wrote:
 
  From: Grant Ingersoll gsing...@apache.org
  Subject: Re: How to index large set data
  To: solr-user@lucene.apache.org
  Date: Friday, May 22, 2009, 5:38 AM
  Can you parallelize this?  I
  don't know that the DIH can handle it,
  but having multiple threads sending docs to Solr
 is the
  best
  performance wise, so maybe you need to look at
 alternatives
  to pulling
  with DIH and instead use a client to push into
 Solr.
 
 
  On May 22, 2009, at 3:42 AM, Jianbin Dai wrote:
 
  
   about 2.8 m total docs were created. only the
 first
  run finishes. In
   my 2nd try, it hangs there forever at the end
 of
  indexing, (I guess
   right before commit), with cpu usage of 100%.
 Total 5G
  (2050) index
   files are created. Now I have two problems:
   1. why it hangs there and failed?
   2. how can i speed up the indexing?
  
  
   Here is my solrconfig.xml
  
  
 
   <useCompoundFile>false</useCompoundFile>
   <ramBufferSizeMB>3000</ramBufferSizeMB>
   <mergeFactor>1000</mergeFactor>
   <maxMergeDocs>2147483647</maxMergeDocs>
   <maxFieldLength>1</maxFieldLength>
   <unlockOnStartup>false</unlockOnStartup>
  
  
  
  
   --- On Thu, 5/21/09, Noble Paul
  നോബിള്‍  नो
   ब्ळ् noble.p...@corp.aol.com
  wrote:
  
   From: Noble Paul നോബിള്‍
  नोब्ळ्
   noble.p...@corp.aol.com
   Subject: Re: How to index large set data
   To: solr-user@lucene.apache.org
   Date: Thursday, May 21, 2009, 10:39 PM
    What is the total no. of docs created?
    I guess it may not be memory
    bound. Indexing is mostly an IO bound
 operation.
  You may
   be able to
   get a better perf if a SSD is used (solid
 state
  disk)
  
   On Fri, May 22, 2009 at 10:46 AM, Jianbin
 Dai
  djian...@yahoo.com
   wrote:
  
   Hi Paul,
  
   Thank you so much for answering my
 questions.
  It
   really helped.
   After some adjustment, basically
 setting
  mergeFactor
   to 1000 from the default value of 10, I
  could
   finish the
   whole job in 2.5 hours. I checked that
 during
  running time,
   only around 18% of memory is being used,
 and VIRT
  is always
   1418m. I am thinking it may be restricted
 by JVM
  memory
   setting. But I run the data import
 command through
  web,
   i.e.,
  
  
 
 http://host:port/solr/dataimport?command=full-import,
   how can I set the memory allocation for
 JVM?
   Thanks again!
  
   JB
  
   --- On Thu, 5/21/09, Noble Paul
  നോബിള്‍
     नोब्ळ् noble.p...@corp.aol.com
   wrote:
  
   From: Noble Paul
 നോബിള്‍
    नोब्ळ् noble.p...@corp.aol.com
   Subject: Re: How to index large
 set data
    To: solr-u...@lucene.apache.org
   Date: Thursday, May 21, 2009,
 9:57 PM
   check the status page of DIH and
 see
   if it is working properly. and
   if, yes what is the rate of
 indexing
  
   On Thu, May 21, 2009 at 11:48 AM,
 Jianbin
  Dai
   djian...@yahoo.com
   wrote:
  
   Hi,
  
   I have about 45GB xml files
 to be
  indexed. I
   am using
   DataImportHandler. I started the
 full
  import 4
   hours ago,
   and it's still running.
   My computer has 4GB memory.
 Any
  suggestion on
   the
   solutions?
   Thanks!
  
   JB
  
  
  
  
  
  
  
  
   --
  
  
 
 -
   Noble Paul | Principal Engineer|
 AOL | http://aol.com
  
  
  
  
  
  
  
  
  
   --
  
 
 -
   Noble Paul | Principal Engineer| AOL | http://aol.com
  
  
  
  
 
  --
  Grant Ingersoll
  http://www.lucidimagination.com/
 
  Search the Lucene ecosystem
  (Lucene/Solr/Nutch/Mahout/Tika/Droids)
  using Solr/Lucene:
   http://www.lucidimagination.com/search
 
 
 
 
 
 
 
 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
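
A rough sketch of the "split your files and push from multiple threads" idea discussed above (file names are hypothetical; the actual push would go through SolrJ's StreamingUpdateSolrServer, which manages an internal queue and a pool of sender threads):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Round-robin split of the input files, one chunk per indexing thread.
// Each chunk would then be parsed and pushed to Solr from its own thread,
// e.g. sharing one StreamingUpdateSolrServer("http://host:port/solr", 100, 4)
// across the threads, since SolrJ handles the parallel HTTP streaming.
public class ParallelIndexSketch {
    public static List<List<String>> partition(List<String> files, int threads) {
        List<List<String>> parts = new ArrayList<List<String>>();
        for (int i = 0; i < threads; i++) {
            parts.add(new ArrayList<String>());
        }
        for (int i = 0; i < files.size(); i++) {
            // distribute file i to thread (i mod threads)
            parts.get(i % threads).add(files.get(i));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.xml", "b.xml", "c.xml", "d.xml", "e.xml");
        List<List<String>> parts = partition(files, 2);
        System.out.println(parts.get(0).size() + " " + parts.get(1).size());
    }
}
```

The split itself is trivial; the point is that each thread gets its own disjoint set of files, so no coordination is needed beyond the shared Solr client.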
 






Re: How to use DIH to index attributes in xml file

2009-05-22 Thread Jianbin Dai

Oh, I guess I didn't say it clearly in my post. 
I didn't use wild cards in xpath. My question was how to index attributes id 
and mid in the following xml file.

<merchantProduct id="814636051" mid="189973">
<in_stock type="stock-4" />
<condition type="cond-0" />
<price>301.46</price>
</merchantProduct>

In the data-config.xml, I use
<field column="price" xpath="/merchantProduct/price" />

but what are the xpath for id and mid?

Thanks again!





--- On Fri, 5/22/09, Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com wrote:

 From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com
 Subject: Re: How to use DIH to index attributes in xml file
 To: solr-user@lucene.apache.org
 Date: Friday, May 22, 2009, 9:03 PM
 Wild cards are not supported. You must
 use the full xpath.
 
 On Sat, May 23, 2009 at 4:55 AM, Jianbin Dai djian...@yahoo.com
 wrote:
 
  I have an xml file like this
 
  <merchantProduct id="814636051" mid="189973">
    <in_stock type="stock-4" />
    <condition type="cond-0" />
    <price>301.46</price>
  </merchantProduct>
 
  In the data-config.xml, I use
  <field column="price" xpath="/.../merchantProduct/price" />
 
  but how can I index id, mid?
 
  Thanks.
 
 
 
 
 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
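
On the attribute question in this thread: XPathEntityProcessor's limited xpath syntax does support attribute steps with @, so (assuming the same document shape as the sample above) the fields could be declared along these lines:

```xml
<field column="id"    xpath="/merchantProduct/@id" />
<field column="mid"   xpath="/merchantProduct/@mid" />
<field column="price" xpath="/merchantProduct/price" />
```

That is, an attribute is addressed the same way as a child element, just with the @-prefixed name as the last path step.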
 






How to index large set data

2009-05-21 Thread Jianbin Dai

Hi,

I have about 45GB xml files to be indexed. I am using DataImportHandler. I 
started the full import 4 hours ago, and it's still running.
My computer has 4GB memory. Any suggestion on the solutions?
Thanks!

JB


  



Re: How to index large set data

2009-05-21 Thread Jianbin Dai

Hi Paul,

Thank you so much for answering my questions. It really helped.
After some adjustment, basically setting mergeFactor to 1000 from the default 
value of 10, I could finish the whole job in 2.5 hours. I checked that during 
running time, only around 18% of memory is being used, and VIRT is always 
1418m. I am thinking it may be restricted by JVM memory setting. But I run the 
data import command through web, i.e.,
http://host:port/solr/dataimport?command=full-import, how can I set the 
memory allocation for JVM? 
Thanks again!

JB
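
On the JVM memory question above: a DIH full-import runs inside Solr's own JVM, so the heap is set on the command that starts Solr, not on the import request. A sketch (flag values are illustrative; start.jar is the Jetty launcher shipped with the Solr example distribution):

```shell
# Give the Solr JVM a larger heap before starting it and running the import
java -Xms512m -Xmx2048m -jar start.jar
```

The full-import URL itself carries no memory settings; it only triggers work in the already-running server process.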

--- On Thu, 5/21/09, Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com wrote:

 From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com
 Subject: Re: How to index large set data
 To: solr-user@lucene.apache.org
 Date: Thursday, May 21, 2009, 9:57 PM
 check the status page of DIH and see
 if it is working properly. and
 if, yes what is the rate of indexing
 
 On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai djian...@yahoo.com
 wrote:
 
  Hi,
 
  I have about 45GB xml files to be indexed. I am using
 DataImportHandler. I started the full import 4 hours ago,
 and it's still running.
  My computer has 4GB memory. Any suggestion on the
 solutions?
  Thanks!
 
  JB
 
 
 
 
 
 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
 






Help needed on DataImportHandler to index xml files

2009-05-20 Thread Jianbin Dai

Hi All,
I am new here. Thanks for reading my question.
I want to use DataImportHandler to index my tons of xml files (7GB total) 
stored on my local disk. My data-config.xml is attached below. It works fine 
with one file (abc.xml), but how can I index all the xml files at one time? Thanks!


<dataConfig>
<dataSource type="FileDataSource" />
<document>
<entity name="example"
        url="/root/abc.xml"
        processor="XPathEntityProcessor"
        forEach="/ShopzillaQueryResponse/product"
        transformer="DateFormatTransformer">

<field column="id" xpath="/ShopzillaQueryResponse/product/id" />
<field column="name" xpath="/ShopzillaQueryResponse/product/name" />
<field column="sku" xpath="/ShopzillaQueryResponse/product/sku" />
<field column="mydescription" xpath="/ShopzillaQueryResponse/product/desc_short" />
<field column="price" xpath="/ShopzillaQueryResponse/product/merchantListing/merchantProduct/price" />

</entity>
</document>
</dataConfig>
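
On the "all xml files at one time" question: DIH can wrap the XPath entity in an outer FileListEntityProcessor entity that scans a directory. A sketch along those lines (directory path and entity names are illustrative):

```xml
<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <!-- Outer entity lists every *.xml file under baseDir -->
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/root/xmlfiles" fileName=".*\.xml"
            rootEntity="false" dataSource="null">
      <!-- Inner entity parses each file the outer entity finds -->
      <entity name="example"
              processor="XPathEntityProcessor"
              url="${f.fileAbsolutePath}"
              forEach="/ShopzillaQueryResponse/product"
              transformer="DateFormatTransformer">
        <field column="id" xpath="/ShopzillaQueryResponse/product/id" />
        <field column="price" xpath="/ShopzillaQueryResponse/product/merchantListing/merchantProduct/price" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

rootEntity="false" makes the inner entity's rows the indexed documents, and ${f.fileAbsolutePath} feeds each discovered file to the XPath processor in turn.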