Re: Index size - to determine storage

2014-01-14 Thread Sumit Arora
Hi Amit,

This Excel sheet will help you estimate the index size.

size-estimator-lucene-solr.xls
http://lucene.472066.n3.nabble.com/file/n4111365/size-estimator-lucene-solr.xls
  




-
Sumit Arora
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-size-to-determine-storage-tp4110522p4111365.html
Sent from the Solr - User mailing list archive at Nabble.com.


MoreLikethis and fq not giving exact results ?

2010-09-02 Thread Sumit Arora

Hi All,

When submitting documents to Solr, I prefix each id with a type identifier: jp_ for job postings and cp_ for career profiles. The ids are stored in the form jp_1, jp_2, etc. or cp_1, cp_2, etc.

When I perform a standard query with fq=cp_, it gives me only the results that belong to cp_ (or only jp_, when filtering on that).

But when I enable mlt in the query, it returns results for jp_ as well, because job_title also exists in job postings (even though the jp_ and cp_ prefixes should already differentiate the two?).

E.g.:

http://192.168.1.4:8983/solr/select/?mlt=true&mlt.fl=job_title%2Ccareer_summary%2Cindustry%2Ccompany%2Cexactly_looking&version=1.2&q=id%3Acp_4&start=0&rows=100&fq=cp_

How can I effectively use FilterQuery and MoreLikeThis together?

/Sumit
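
In the request above, fq=cp_ names no field, so Solr would most likely match it against the default search field rather than against the id prefix. A minimal sketch, assuming the prefix should be matched on the id field, builds the same request with a prefix filter (fq=id:cp_*) and properly separated parameters. The host and field list come from the message; build_mlt_url is a hypothetical helper name. With mlt=true on /select the filter applies to the main query, and whether it also restricts the similar documents seems to depend on the Solr version and handler in use, so that part is worth verifying on your own setup.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

def build_mlt_url(base, doc_id, id_prefix, fields, start=0, rows=100):
    """Build a /select URL with MoreLikeThis enabled and a prefix filter on id."""
    params = [
        ("q", f"id:{doc_id}"),
        ("fq", f"id:{id_prefix}*"),   # assumption: the prefix is filtered on the id field
        ("mlt", "true"),
        ("mlt.fl", ",".join(fields)),
        ("start", start),
        ("rows", rows),
    ]
    # urlencode percent-encodes ':' and ',' and joins the pairs with '&'
    return f"{base}?{urlencode(params)}"

url = build_mlt_url(
    "http://192.168.1.4:8983/solr/select/",
    "cp_4",
    "cp_",
    ["job_title", "career_summary", "industry", "company", "exactly_looking"],
)
print(url)
```

If the similar documents still include jp_ entries, filtering them on the client by id prefix is a fallback that does not depend on how the server applies fq.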






MoreLikethis and fq not giving exact results ?

2010-09-01 Thread Sumit Arora
Hi All,

When submitting documents to Solr, I prefix each id with a type identifier: jp_ for job postings and cp_ for career profiles. The ids are stored in the form jp_1, jp_2, etc. or cp_1, cp_2, etc.

When I perform a standard query with fq=cp_, it gives me only the results that belong to cp_ (or only jp_, when filtering on that).

But when I enable mlt in the query, it returns results for jp_ as well, because job_title also exists in job postings (even though the jp_ and cp_ prefixes should already differentiate the two?).

E.g.:

http://192.168.1.4:8983/solr/select/?mlt=true&mlt.fl=job_title%2Ccareer_summary%2Cindustry%2Ccompany%2Cexactly_looking&version=1.2&q=id%3Acp_4&start=0&rows=100&fq=cp_

How can I effectively use FilterQuery and MoreLikeThis together?

/Sumit


How to do ? Articles and Its Associated Comments Indexing , One to Many relationship

2010-08-26 Thread Sumit Arora
I have a set of articles with comments on them, so in the database I have two
major tables, one for articles and one for comments, and each article can
have many comments (one-to-many).


If one article has 20 comments, then on the DB-to-Solr index sync Solr will
index 20 near-identical documents that differ only in the comment.


Use case:

On search, if a keyword matches more than one comment, the query returns
duplicate documents.


One possible solution I thought of applying:

I index the 20 near-identical documents, one per comment.

When retrieving results from a query, I use field collapsing on the article
id (collapse.field = article id).


Am I following the right approach?


Candidate Profile Search which have multiple employers and Educations.

2010-08-26 Thread Sumit Arora
I have to search candidate profiles, for which I have the following tables:

Candidate profile record: CandidateProfile_Table

Candidate education: CandidateEducation_Table  // education at different
institutes or colleges

Employers: Employers_Table  // more than one employer

If I denormalize all three tables:

CandidateProfile_Table - 1 row for Sumit

CandidateEducation_Table - 5 rows for Sumit

Employers_Table - 5 rows for Sumit

If these three tables are joined and indexed in Solr, that creates 25
documents for one profile row (1 x 5 x 5).


In this case, what should my approach be?

Denormalize all three tables and, when querying Solr, use the field collapse
parameter on the CandidateProfile id, so that it returns one record.

Or

Should I use CandidateEducation_Table and Employers_Table as multiValued
fields in Solr?


If that is the case, how do I use multiValued the Solr way? E.g.:

I need to use the following configuration in schema.xml:


  <field name="education" type="textgen" indexed="true" stored="true"/>
  <field name="employer" type="textgen" indexed="true" stored="true"/>


After this:


I should pick all education values (from the MySQL education table) that
belong to one profile, keep them in one variable, EducationValuesForSolr,
and then assign EducationValuesForSolr's value to the education field
defined in schema.xml?


Please let me know if I am using the right approach, and share any comments.

/Sumit
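
The row counts in this message can be checked with a short sketch. It contrasts the denormalized join (one row per education/employer pair) with a single document whose education and employer fields hold lists. The profile id and the placeholder values are invented for illustration only.

```python
from itertools import product

# Hypothetical stand-ins for the three tables described in the message.
profile = {"id": 1, "name": "Sumit"}
educations = [f"Institute {i}" for i in range(1, 6)]   # 5 education rows
employers = [f"Employer {i}" for i in range(1, 6)]     # 5 employer rows

# Denormalized join: one row per (education, employer) pair.
joined_rows = [
    {**profile, "education": edu, "employer": emp}
    for edu, emp in product(educations, employers)
]
print(len(joined_rows))  # 25

# Multi-valued alternative: a single document, each field holding a list.
multi_valued_doc = {**profile, "education": educations, "employer": employers}
print(len(multi_valued_doc["education"]), len(multi_valued_doc["employer"]))  # 5 5
```

The multi-valued form keeps one document per candidate, so no collapsing is needed at query time, at the cost of losing the pairing between a particular education and a particular employer (which the join did not express meaningfully anyway).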


Re: How to do ? Articles and Its Associated Comments Indexing , One to Many relationship

2010-08-26 Thread Sumit Arora
Thanks Ephraim for your response.

If I use multiValued for the comments field, should I use the following
logic when preparing the data for Solr?

/* Sample pseudocode */

Get rows from Article and Article-Comments tables;  // retrieves 1 article and 20 comments

Begin;

Include the article field values in the Solr fields defined in schema.xml;
/* one article in this case, so this generates one document id for Solr */

Comments = 0;

While (Comments != 20)

{
   Include this comment;

   ++Comments;
}

End;

Result: one article with multiple comments indexed in Solr as multiValued.
In the end, will Solr have only one document or multiple documents?
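
The pseudocode above can be sketched in runnable form. This version builds a Solr XML update message in which the article becomes a single <doc> and each comment is added by repeating the same field name, which is how a multiValued field is sent. The field names (id, title, comment) and the sample values are assumptions; the comment field would need multiValued="true" in schema.xml.

```python
import xml.etree.ElementTree as ET

def build_add_xml(article, comments):
    """Return a Solr <add> message holding one document with many comment values."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in article.items():
        ET.SubElement(doc, "field", name=name).text = str(value)
    # A multiValued field is sent by repeating the same field name.
    for comment in comments:
        ET.SubElement(doc, "field", name="comment").text = comment
    return ET.tostring(add, encoding="unicode")

# Hypothetical article with 20 comments, mirroring the case in the message.
article = {"id": "article_1", "title": "Example article"}
comments = [f"Comment {i}" for i in range(1, 21)]

xml_message = build_add_xml(article, comments)
print(xml_message.count("<doc>"), xml_message.count('name="comment"'))  # 1 20
```

Under these assumptions the index ends up with one document per article, carrying 20 values in the comment field.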

If I want to use highlighting in this case and the search keyword exists in
more than one comment, how can I achieve the result below, where the keyword
'web' was found in two comments?

... 1.The *web* portal will connect a lot of people for some specific
domain, and then people can post their interesting story, upload files

 ... 2.1 accessing multiple sites will slow down the user experience - try
not to do it. *web* hosting is not too expensive as compared to the other
components ...
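
One way to ask for more than one highlighted snippet per field is Solr's hl.snippets parameter, which defaults to a single snippet. The sketch below builds such a request and reads the snippets back from the "highlighting" section of a JSON response. The field name comment, the document id, and the sample response are assumptions written by hand for illustration; they are not real Solr output.

```python
from urllib.parse import urlencode

def highlight_params(keyword, field="comment", max_snippets=2):
    """Query parameters asking Solr to highlight `keyword` in a multi-valued field."""
    return {
        "q": f"{field}:{keyword}",
        "hl": "true",
        "hl.fl": field,
        "hl.snippets": max_snippets,   # allow more than one snippet per field
        "hl.simple.pre": "*",          # markers placed around each match
        "hl.simple.post": "*",
        "wt": "json",
    }

def snippets_for(response, doc_id, field="comment"):
    """Pull the highlighted snippets for one document out of a JSON response."""
    return response.get("highlighting", {}).get(doc_id, {}).get(field, [])

print(urlencode(highlight_params("web")))

# Hand-written stand-in for a Solr JSON response, not real output.
sample = {"highlighting": {"article_1": {"comment": [
    "The *web* portal will connect a lot of people",
    "*web* hosting is not too expensive",
]}}}
print(len(snippets_for(sample, "article_1")))
```

Raising max_snippets lets matches from several comments of the same article come back together under that article's id.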




On Thu, Aug 26, 2010 at 4:32 PM, Ephraim Ofir ephra...@icq.com wrote:

 Why not define the comment field as multiValued? That way you only index
 each document once and you don't need to collapse anything...

 Ephraim Ofir





Re: Candidate Profile Search which have multiple employers and Educations.

2010-08-26 Thread Sumit Arora
Thanks Ephraim for your response.

Actually, I am not using DIH to sync the data from the DB. I wrote the DB
sync myself; it retrieves rows directly from the MySQL DB and indexes them
into Solr.

In my earlier cases, I picked rows by column label from the DB, with matching
columns defined in my sync program, which picks up the data and indexes it
one-to-one.

So for the one-to-many case, I have to use an inner loop (based on the number
of candidate educations or candidate employers) to index the candidate
education and candidate employer values.

On Thu, Aug 26, 2010 at 4:59 PM, Ephraim Ofir ephra...@icq.com wrote:

 As far as I can tell you should use multiValued for these fields:

  <field name="education" type="textgen" indexed="true" stored="true"
         multiValued="true"/>
  <field name="employer" type="textgen" indexed="true" stored="true"
         multiValued="true"/>

 In order to get the data from the DB you should either create a sub
 entity with its own query or (the better performance option) use
 something like:

 SELECT cp.name,
GROUP_CONCAT(ce.CandidateEducation SEPARATOR '|') AS
 multiple_educations,
GROUP_CONCAT(e.Employer SEPARATOR '|') AS multiple_employers
 FROM CandidateProfile_Table cp
 LEFT JOIN CandidateEducation_Table ce ON cp.name = ce.name
 LEFT JOIN Employers_Table e ON cp.name = e.name
 GROUP BY cp.name

 This creates one line with the educations and employers concatenated
 into pipe (|) delimited fields.  Then you'd have to break up the
 multiple fields using a RegexTransformer - use something like:

 <entity name="candidates"
         query="...see above..."
         transformer="RegexTransformer">
   <field name="education" column="multiple_educations" splitBy="\|"/>
   <field name="employer" column="multiple_employers" splitBy="\|"/>
 </entity>

 The SQL probably doesn't fit your DB schema, but it's just to clarify
 the idea.  You might have to pick a different field separator if pipe
 (|) might be in your data...
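
For a hand-written sync program rather than DIH, the same split can be done in client code. The sketch below is an assumption-laden illustration, not part of the original reply: split_multi is a hypothetical helper, and the sample row is invented. It also removes repeated values, because joining two one-to-many tables in one query repeats each education once per employer row (and vice versa) before GROUP_CONCAT sees them; in MySQL, GROUP_CONCAT(DISTINCT ...) is another way to handle that.

```python
def split_multi(value, sep="|"):
    """Split a GROUP_CONCAT string into unique values, keeping first-seen order."""
    if not value:            # a LEFT JOIN with no matches yields NULL/None
        return []
    return list(dict.fromkeys(value.split(sep)))

# Invented row shaped like the result of the query above.
row = {
    "name": "Sumit",
    # Two one-to-many joins repeat each value once per row of the other table.
    "multiple_educations": "BSc|BSc|MSc|MSc",
    "multiple_employers": "Acme|Initech|Acme|Initech",
}

doc = {
    "name": row["name"],
    "education": split_multi(row["multiple_educations"]),
    "employer": split_multi(row["multiple_employers"]),
}
print(doc)
# {'name': 'Sumit', 'education': ['BSc', 'MSc'], 'employer': ['Acme', 'Initech']}
```

Each list can then be sent to Solr as repeated values of the corresponding multiValued field.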

 Ephraim Ofir





Re: solr

2010-08-21 Thread Sumit Arora
Please follow the guidelines at:

http://lucene.apache.org/solr/tutorial.html

/Sumit

On Sat, Aug 21, 2010 at 11:25 AM, ankita shinde ankita.shind...@gmail.com wrote:

 Hello,
 I am new to solr.
 Can anyone please guide me how to install and use solr?
 Reply.
 -Ankita Shinde



How Solr Manages Connected Database Updates

2010-06-09 Thread Sumit Arora
Hey All,

I am new to Solr. I have just started exploring it and have done the basic
stuff, but now I am stuck on one piece of logic:

How does Solr manage updates to a connected database?

Scenario:

-- I wrote an indexing program that runs on Tomcat. When run, it reads data
from the connected MySQL database and then performs indexing.

Use case: the database is not static. It is the database of a web
application into which users keep inserting data, so it is updated
frequently, almost every minute.

How should Solr pick up those changes automatically and update the index?


Do I need to write a cron job of some kind, or use the Data Import Handler?
(There may be several ways?)

Can anyone who has gone through a similar situation comment or share their
experience?

Thanks,
-Sumit
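
One common pattern for this kind of sync, whether run from a cron job or built into the indexing program, is to keep a "last synced" timestamp and fetch only rows modified since then. The sketch below illustrates the idea under stated assumptions: it uses an in-memory SQLite database as a stand-in for MySQL, and the articles table, its last_modified column, and fetch_changed_rows are all hypothetical. The Data Import Handler offers a comparable mechanism through its delta-import command.

```python
import sqlite3

def fetch_changed_rows(conn, since):
    """Return rows modified after `since`, oldest first."""
    cur = conn.execute(
        "SELECT id, title, last_modified FROM articles "
        "WHERE last_modified > ? ORDER BY last_modified",
        (since,),
    )
    return cur.fetchall()

# In-memory stand-in for the MySQL database; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, "old", "2010-06-09 10:00:00"), (2, "new", "2010-06-09 10:05:00")],
)

last_sync = "2010-06-09 10:01:00"
changed = fetch_changed_rows(conn, last_sync)
print(changed)  # only the row modified after last_sync

if changed:
    # Advance the watermark only after the rows have been indexed successfully.
    last_sync = changed[-1][2]
```

A timestamp watermark like this picks up inserts and updates but not deleted rows; those would need separate handling, for example a soft-delete flag or a log of deleted ids.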