Re: Lucene index

2008-09-23 Thread Shalin Shekhar Mangar
On Tue, Sep 23, 2008 at 5:33 PM, Dinesh Gupta <[EMAIL PROTECTED]> wrote:

>
> Hi,
> Currently we are using the Lucene API to create our index.
>
> It creates the index in a directory with 3 files:
>
> xxx.cfs, deletable and segments.
>
> If I create Lucene indexes from Solr, will these files be created or
> not?


The Lucene index will be created under solr_home, inside the data/index
directory.


> Please give me an example that uses a MySQL database instead of hsqldb.
>

If you are talking about DataImportHandler, then there is no difference in
the configuration except for using the MySQL driver instead of hsqldb.

-- 
Regards,
Shalin Shekhar Mangar.


RE: Lucene index

2008-09-23 Thread Dinesh Gupta
// ColumnC - Product Catalogues
doc.add(new Field("clg", (String) data.get("Catalogues"), Field.Store.YES, Field.Index.TOKENIZED));
//doc.add(Field.Text("clg", (String) data.get("Catalogues")));

// Product Delivery Cities
doc.add(new Field("dcty", (String) data.get("DelCities"), Field.Store.YES, Field.Index.TOKENIZED));

// Additional Information
// Top Selling Count
String sellerCount = ((Long) data.get("SellCount")).toString();
doc.add(new Field("bsc", sellerCount, Field.Store.YES, Field.Index.TOKENIZED));


I prepare the data by querying the database.
Please tell me how I can migrate my logic to Solr.
I have spent more than a week on this
but have got nowhere.
Please help me.

Can I attach my files here?

Thanks in Advance

Regards
Dinesh Gupta



Re: Lucene index

2008-09-23 Thread Shalin Shekhar Mangar
ion"),
> Field.Store.NO,Field.Index.TOKENIZED));
> //doc.add(Field.UnStored("sdc", (String) data.get("SpecialDescription"), true));
> doc.add(new Field("kdc", (String) data.get("KeywordDescription"), Field.Store.NO, Field.Index.TOKENIZED));
> //doc.add(Field.UnStored("kdc", (String) data.get("KeywordDescription"), true));
>
> // ColumnB - Product Category and parent categories
> doc.add(new Field("cts", (String) data.get("Categories"), Field.Store.YES, Field.Index.TOKENIZED));
> //doc.add(Field.Text("cts", (String) data.get("Categories")));
>
> // ColumnB - Product Category and parent categories //Raman
> doc.add(new Field("dct", (String) data.get("DirectCategories"), Field.Store.YES, Field.Index.TOKENIZED));
> //doc.add(Field.Text("dct", (String) data.get("DirectCategories")));
>



-- 
Regards,
Shalin Shekhar Mangar.


RE: Lucene index

2008-09-24 Thread Dinesh Gupta

Hi Shalin Shekhar,

First of all, thank you for the quick reply.

I have done the things that you explained here.

I am creating the indexes with multiple threads, and it takes 6-10 hours to
create them for approx. 3 lakh (300,000) products.

I am using Hibernate to access the DB, applying custom logic to prepare the data,
putting it in a map
and finally writing it to the index.

Now, how can I achieve this?

I am able to search using the Solr web admin,
but I am not able to add documents.
Please tell me how I can send my files to you.

Thanks

Regards,
Dinesh Gupta

> Date: Tue, 23 Sep 2008 19:36:22 +0530
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene index
> 
> Hi Dinesh,
> 
> This seems straightforward for Solr. You can use the embedded Jetty server
> for a start. Look at the tutorial on how to get started.
> 
> You'll need to modify the schema.xml to define all the fields that you want
> to index. The wiki page at http://wiki.apache.org/solr/SchemaXml is a good
> start on how to do that. Each field in your code will have a counterpart in
> the schema.xml with appropriate flags (indexed/stored/tokenized etc.).
> 
> Once that is complete, try to modify the DataImportHandler's hsqldb example
> for your MySQL database.
> 
> On Tue, Sep 23, 2008 at 7:01 PM, Dinesh Gupta <[EMAIL PROTECTED]> wrote:
> 
> >
> > Hi Shalin Shekhar,
> >
> > Let me explain my issue.
> >
> > I have some tables in my database like
> >
> > Product
> > Category
> > Catalogue
> > Keywords
> > Seller
> > Brand
> > Country_city_group
> > etc.
> > I have a class that represents a product document as follows:
> >
> > Document doc = new Document();
> > // Keywords which can be used directly for search
> > doc.add(new Field("id", (String) data.get("PRN"), Field.Store.YES, Field.Index.UN_TOKENIZED));
> >
> > // Sorting fields
> > String priceString = (String) data.get("Price");
> > if (priceString == null)
> >     priceString = "0";
> > long price = 0;
> > try {
> >     price = (long) Double.parseDouble(priceString);
> > } catch (Exception e) {
> >
> > }
> >
> > doc.add(new Field("prc", NumberUtils.pad(price), Field.Store.YES, Field.Index.UN_TOKENIZED));
> > Date createDate = (Date) data.get("CreateDate");
> > if (createDate == null) createDate = new Date();
> >
> > doc.add(new Field("cdt", String.valueOf(createDate.getTime()), Field.Store.NO, Field.Index.UN_TOKENIZED));
> >
> > Date modiDate = (Date) data.get("ModiDate");
> > if (modiDate == null) modiDate = new Date();
> >
> > doc.add(new Field("mdt", String.valueOf(modiDate.getTime()), Field.Store.NO, Field.Index.UN_TOKENIZED));
> > //doc.add(Field.UnStored("cdt", String.valueOf(createDate.getTime())));
> >
> > // Additional fields for search
> > doc.add(new Field("bnm", (String) data.get("Brand"), Field.Store.YES, Field.Index.TOKENIZED));
> > doc.add(new Field("bnm1", (String) data.get("Brand1"), Field.Store.NO, Field.Index.UN_TOKENIZED));
> > //doc.add(Field.Text("bnm", (String) data.get("Brand"))); //Tokenized and Unstored
> > doc.add(new Field("bid", (String) data.get("BrandId"), Field.Store.YES, Field.Index.UN_TOKENIZED));
> > //doc.add(Field.Keyword("bid", (String) data.get("BrandId"))); // untokenized &
> > doc.add(new Field("grp", (String) data.get("Group"), Field.Store.NO, Field.Index.TOKENIZED));
> > //doc.add(Field.Text("grp", (String) data.get("Group")));
> > doc.add(new Field("gid", (String) data.get("GroupId"), Field.Store.YES, Field.Index.UN_TOKENIZED));
> > //doc.add(Field.Keyword("gid", (String) data.get("GroupId"))); //New
> > doc.add(new Field("snm", (String) data.get("Seller"), Field.Store.YES, Field.Index.UN_TOKENIZED));
> > //doc.add(Field.Text("snm", (String) data.get("Seller")));
> > doc.add(new Field("sid", (String) data.get("SellerId"), Field.Store.YES, Field.Index.UN_TOKENIZED));
> > //doc.add(Field.Keyword("sid", (String) data.get("SellerId"))); // New
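
The quoted advice that each field in the indexing code gets a counterpart in schema.xml can be sketched roughly as below. This is only an illustration, not part of either poster's code: the FieldDef class is a hypothetical helper, and the type names "string" and "text" are assumed to match the field types in Solr's example schema, so check them against your own schema.xml. All fields in the quoted code are indexed, so indexed="true" is emitted for every one.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaFieldSketch {

    /** Hypothetical holder for one field definition copied from the Lucene indexing code. */
    public static final class FieldDef {
        final String name;
        final boolean stored;     // Field.Store.YES -> true, Field.Store.NO -> false
        final boolean tokenized;  // Field.Index.TOKENIZED -> true, UN_TOKENIZED -> false

        public FieldDef(String name, boolean stored, boolean tokenized) {
            this.name = name;
            this.stored = stored;
            this.tokenized = tokenized;
        }
    }

    /** Renders one schema.xml field line; the "text"/"string" type names are assumptions. */
    public static String toSchemaField(FieldDef f) {
        String type = f.tokenized ? "text" : "string";
        return "<field name=\"" + f.name + "\" type=\"" + type
                + "\" indexed=\"true\" stored=\"" + f.stored + "\"/>";
    }

    public static void main(String[] args) {
        List<FieldDef> fields = new ArrayList<>();
        fields.add(new FieldDef("id", true, false));   // Store.YES, UN_TOKENIZED
        fields.add(new FieldDef("bnm", true, true));   // Store.YES, TOKENIZED
        fields.add(new FieldDef("cdt", false, false)); // Store.NO, UN_TOKENIZED
        for (FieldDef f : fields) {
            System.out.println(toSchemaField(f));
        }
    }
}
```

The printed lines would go inside the fields section of schema.xml. Sort fields such as prc may be better served by a sortable numeric type than by padded strings, but that depends on the schema in use.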

Re: Lucene index

2008-09-24 Thread Shalin Shekhar Mangar
Hi Dinesh,

There are two ways in which you can import data from databases.

1. Use your custom code with the Solrj client library to upload documents to
Solr -- http://wiki.apache.org/solr/Solrj
2. Use DataImportHandler and write data-config.xml and custom Transformers
-- http://wiki.apache.org/solr/DataImportHandler

Take a look at both and use the one which suits you best.
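
As a rough illustration of the first option (custom code pushing documents to Solr), the sketch below does not use Solrj at all. It builds the XML add message that Solr's update handler accepts and posts it over plain HTTP, which keeps the example free of extra jars. The class and method names are made up for this example, the sample values are invented, and the update URL (for instance http://localhost:8983/solr/update) is an assumption that depends on how Solr is deployed.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrXmlAddSketch {

    /** Escapes the characters that are special in XML text and attribute values. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds an add message for one prepared data map, skipping null values. */
    public static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (e.getValue() == null) {
                continue;
            }
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    /** POSTs an XML message to the update handler and returns the HTTP status code. */
    public static int post(String updateUrl, String xml) throws Exception {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = con.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return con.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("id", "PRN-1");        // invented sample values
        data.put("bnm", "Acme & Sons");
        String xml = toAddXml(data);
        System.out.println(xml);
        if (args.length > 0) {          // args[0]: update URL (assumed, see above)
            post(args[0], xml);
            post(args[0], "<commit/>"); // a commit makes the new document searchable
        }
    }
}
```

The field names must already be defined in schema.xml, otherwise Solr rejects the document. For a multi-threaded indexer, each worker could build and post its own messages, with a single commit at the end.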

On Wed, Sep 24, 2008 at 6:37 PM, Dinesh Gupta <[EMAIL PROTECTED]> wrote:

>
> Hi Shalin Shekhar,
>
> First of all, thank you for the quick reply.
>
> I have done the things that you explained here.
>
> I am creating the indexes with multiple threads, and it takes 6-10 hours to
> create them for approx. 3 lakh (300,000) products.
>
> I am using Hibernate to access the DB, applying custom logic to prepare the
> data, putting it in a map
> and finally writing it to the index.
>
> Now, how can I achieve this?
>
> I am able to search using the Solr web admin,
> but I am not able to add documents.
> Please tell me how I can send my files to you.
>
> Thanks
>
> Regards,
> Dinesh Gupta
>

Re: Lucene index verifier

2008-02-08 Thread Yonik Seeley
If someone wanted those additional checks, it seems like the right
place to hook it in would be the snapshooter or snapinstaller.

-Yonik

On Feb 8, 2008 8:04 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> I think Mike M. put up a tool called CheckIndex that is a simple
> driver program that checks for corruption.  However, my understanding
> is that he isn't sure it is complete just yet, but it is a start.
> Have a look in the latest release.
>
> Maybe it would be useful to have it run either on startup or
> periodically in Solr (if configured to do so).  I haven't tried it, so
> I don't know what effect it has on performance/search/indexing.
>
> -Grant
>
>
> On Feb 7, 2008, at 11:15 PM, Lance Norskog wrote:
>
> > (Sorry, my Lucene java-user access is wonky.)
> >
> > I would like to verify that my snapshots are not corrupt before I
> > enable
> > them.
> >
> > What is the simplest program to verify that a Lucene index is not
> > corrupt?
> >
> > Or, what is a Solr query that will verify that there is no
> > corruption? With
> > the minimum amount of time?
> >
> > Thanks,
> >
> > Lance Norskog
>
>
>


RE: Lucene index verifier

2008-02-08 Thread Lance Norskog
Given the size of our index, using file checksums is more feasible. 
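
A minimal sketch of the checksum idea, under the assumption that the goal is to confirm that a snapshot copy matches the files it was taken from. The class name and the two directory arguments are hypothetical, and CRC32 is chosen here only because it is cheap; a stronger digest could be swapped in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

public class SnapshotChecksumSketch {

    /** Computes the CRC32 of one file, streaming it in 64 KB chunks. */
    public static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[1 << 16];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    /** Maps the name of each regular file in a directory to its CRC32. */
    public static Map<String, Long> checksums(Path dir) throws IOException {
        Map<String, Long> result = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    result.put(f.getFileName().toString(), crc32(f));
                }
            }
        }
        return result;
    }

    /** True when both directories hold the same file names with the same checksums. */
    public static boolean sameContent(Path source, Path copy) throws IOException {
        return checksums(source).equals(checksums(copy));
    }

    public static void main(String[] args) throws IOException {
        // args[0]: source index directory, args[1]: snapshot copy (hypothetical paths)
        boolean ok = sameContent(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(ok ? "OK" : "MISMATCH");
    }
}
```

A check like this only catches damage introduced while copying or transferring the snapshot. It cannot tell whether the source index was already corrupt, which is what a tool such as CheckIndex looks for.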




Re: Lucene index verifier

2008-02-08 Thread Grant Ingersoll
I think Mike M. put up a tool called CheckIndex that is a simple  
driver program that checks for corruption.  However, my understanding  
is that he isn't sure it is complete just yet, but it is a start.   
Have a look in the latest release.


Maybe it would be useful to have it run either on startup or  
periodically in Solr (if configured to do so).  I haven't tried it, so  
I don't know what effect it has on performance/search/indexing.


-Grant

On Feb 7, 2008, at 11:15 PM, Lance Norskog wrote:

> (Sorry, my Lucene java-user access is wonky.)
>
> I would like to verify that my snapshots are not corrupt before I
> enable them.
>
> What is the simplest program to verify that a Lucene index is not
> corrupt?
>
> Or, what is a Solr query that will verify that there is no
> corruption? With the minimum amount of time?
>
> Thanks,
>
> Lance Norskog