Re: Index size - to determine storage
Hi Amit,
This Excel sheet will help you estimate the index size: size-estimator-lucene-solr.xls
http://lucene.472066.n3.nabble.com/file/n4111365/size-estimator-lucene-solr.xls
- Sumit Arora
MoreLikethis and fq not giving exact results ?
Hi All,
When submitting documents to Solr I prefix each id with a type identifier: jp_ for a job posting and cp_ for a career profile. The ids are therefore stored as jp_1, jp_2, etc. or cp_1, cp_2, etc.
When I run a standard query with fq=cp_, it returns only results belonging to cp_ (and likewise only jp_ results when filtering on jp_). But when I enable mlt in the query, it returns jp_ results as well, because job_title also exists in job postings (even though the jp_ / cp_ prefix should already separate the two?). For example:
http://192.168.1.4:8983/solr/select/?mlt=true&mlt.fl=job_title%2Ccareer_summary%2Cindustry%2Ccompany%2Cexactly_looking&version=1.2&q=id%3Acp_4&start=0&rows=100&fq=cp_
How can I effectively use FilterQuery and MoreLikeThis together?
/Sumit
On Wed, Sep 1, 2010 at 8:38 PM, Sumit Arora sumit1...@gmail.com wrote:
> Hi All,
> When submitting documents to Solr I prefix each id with a type identifier: jp_ for a job posting and cp_ for a career profile. The ids are therefore stored as jp_1, jp_2, etc. or cp_1, cp_2, etc.
> When I run a standard query with fq=cp_, it returns only results belonging to cp_ (and likewise only jp_ results when filtering on jp_). But when I enable mlt in the query, it returns jp_ results as well, because job_title also exists in job postings (even though the jp_ / cp_ prefix should already separate the two?). For example:
> http://192.168.1.4:8983/solr/select/?mlt=true&mlt.fl=job_title%2Ccareer_summary%2Cindustry%2Ccompany%2Cexactly_looking&version=1.2&q=id%3Acp_4&start=0&rows=100&fq=cp_
> How can I effectively use FilterQuery and MoreLikeThis together?
> /Sumit
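A possible explanation, offered as a sketch rather than a confirmed diagnosis: with mlt=true on the standard /select handler, fq narrows the main result set (here the single document id:cp_4) but, as far as I know, not the list of similar documents, whereas the dedicated MoreLikeThisHandler applies fq to the similar documents themselves. In addition, fq=cp_ names no field, so it is matched against the default search field; a prefix query on the id field (id:cp_*) is probably what was intended. The snippet below only builds such a request URL. The /mlt path and the helper name build_mlt_url are assumptions: the path works only if a MoreLikeThisHandler is registered under that name in solrconfig.xml.

```python
from urllib.parse import urlencode


def build_mlt_url(solr_base, doc_id, id_prefix, mlt_fields, rows=100):
    """Build a MoreLikeThisHandler request whose fq restricts the similar docs."""
    params = [
        ("q", f"id:{doc_id}"),             # source document for the comparison
        ("mlt.fl", ",".join(mlt_fields)),  # fields used to compute similarity
        ("fq", f"id:{id_prefix}*"),        # prefix query on the id field
        ("start", 0),
        ("rows", rows),
    ]
    # "/mlt" is an assumed handler name; it must be registered in solrconfig.xml.
    return f"{solr_base}/mlt?{urlencode(params)}"


url = build_mlt_url(
    "http://192.168.1.4:8983/solr",
    "cp_4",
    "cp_",
    ["job_title", "career_summary", "industry", "company", "exactly_looking"],
)
print(url)
```

A cleaner long-term option may be to index the document type in its own field (for example a hypothetical doc_type field) and filter on that instead of an id prefix.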
How to do ? Articles and Its Associated Comments Indexing , One to Many relationship
I have a set of articles and comments on them. In the database I have two main tables, one for articles and one for comments, and each article can have many comments (one-to-many).
If one article has 20 comments, then on the DB-to-Solr index sync Solr will index 20 near-identical documents that differ only in the comment.
Use case: on search, if the keyword matches more than one comment, the query returns duplicate documents.
One possible solution I have thought of: index the 20 near-identical documents, one per comment, and when retrieving results use collapse.field set to the article id.
Am I following the right approach?
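To make the proposal concrete, here is a small plain-Python illustration (no Solr involved) of what collapsing on the article id amounts to: of all hits that share an article id, only the highest-ranked one is kept. The hit data and the field names article_id and comment are made up for the example; in Solr the collapsing would happen server-side.

```python
def collapse_by(hits, field):
    """Keep the first (highest-ranked) hit for each distinct value of `field`."""
    seen = set()
    collapsed = []
    for hit in hits:
        key = hit[field]
        if key not in seen:
            seen.add(key)
            collapsed.append(hit)
    return collapsed


# Hypothetical ranked hits: article 1 matched through two different comments.
hits = [
    {"article_id": 1, "comment": "The web portal will connect a lot of people"},
    {"article_id": 2, "comment": "Good overview of web servers"},
    {"article_id": 1, "comment": "web hosting is not too expensive"},
]

collapsed = collapse_by(hits, "article_id")
print([h["article_id"] for h in collapsed])
```

The cost of this design is index size: the article fields are stored once per comment, which is what the multiValued alternative avoids.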
Candidate Profile Search which have multiple employers and Educations.
I have to search candidate profiles, for which I have the following tables:
  Candidate profile record: CandidateProfile_Table
  Candidate education: CandidateEducation_Table  // education at different institutes or colleges
  Employers: Employers_Table  // more than one employer
If I denormalize all three tables:
  CandidateProfile_Table - 1 row for Sumit
  CandidateEducation_Table - 5 rows for Sumit
  Employers_Table - 5 rows for Sumit
If these three tables are indexed in Solr this way, it will create 25 documents (1 x 5 x 5) for one candidate.
What should my approach be in this case? Denormalize all three tables and, when querying Solr, use the field collapse parameter on the CandidateProfile id so that it returns one record? Or should I treat CandidateEducation_Table and Employers_Table as multiValued fields in Solr?
If the latter, how do I apply multiValued the Solr way? For example, I would need the following configuration in schema.xml:
  <field name="education" type="textgen" indexed="true" stored="true"/>
  <field name="employer" type="textgen" indexed="true" stored="true"/>
After this, should I pick all education values for one profile (from the MySQL education table), keep them in one variable, EducationValuesForSolr, and then assign that variable's value to the education field defined in schema.xml?
Please let me know whether I am using the right approach, and share any comments.
/Sumit
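A minimal sketch of the multiValued alternative, assuming the three tables are read separately rather than joined: the profile row becomes one document, and the education and employer rows become lists on that document. The values, the id cp_1, and the helper name build_candidate_doc are hypothetical, and how a list is serialized as a multi-valued field depends on the client or update format in use.

```python
def build_candidate_doc(profile, educations, employers):
    """Merge one profile row and its child rows into a single document."""
    doc = dict(profile)
    doc["education"] = list(educations)  # one entry per education row
    doc["employer"] = list(employers)    # one entry per employer row
    return doc


# Hypothetical data for one candidate: 1 profile, 5 educations, 5 employers.
profile = {"id": "cp_1", "name": "Sumit"}
educations = [f"Institute {i}" for i in range(1, 6)]
employers = [f"Employer {i}" for i in range(1, 6)]

# Joining all three tables yields the cross product of the child rows.
joined_rows = 1 * len(educations) * len(employers)

doc = build_candidate_doc(profile, educations, employers)
print(joined_rows, len(doc["education"]), len(doc["employer"]))
```

With this shape there is one document per candidate, so no collapsing is needed at query time.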
Re: How to do ? Articles and Its Associated Comments Indexing , One to Many relationship
Thanks, Ephraim, for your response.
If I use multiValued for the comments field, should I use the following logic when feeding data to Solr?
  /* Sample pseudocode */
  Get rows from the Article and Article-Comments tables;  // retrieves 1 article and 20 comments
  Begin;
    Include the article field values in the Solr fields defined in schema.xml;  /* one article in this case, so it generates one document id for Solr */
    Comments = 0;
    While (Comments != 20) { Include this comment; ++Comments; }
  End;
Result: one article with multiple comments indexed in Solr as multiValued. Will Solr end up with only one document, or with multiple documents?
Also, suppose I use text highlighting in this case and the search keyword exists in more than one comment. How can I achieve the result below, where the keyword 'web' was found in two comments?
  ... 1. The *web* portal will connect a lot of people for some specific domain, and then people can post their interesting story, upload files ...
  ... 2.1 accessing multiple sites will slow down the user experience - try not to do it. *web* hosting is not too expensive as compared to the other components ...
On Thu, Aug 26, 2010 at 4:32 PM, Ephraim Ofir ephra...@icq.com wrote:
> Why not define the comment field as multiValued? That way you only index each document once and you don't need to collapse anything...
> Ephraim Ofir
> -----Original Message-----
> From: Sumit Arora [mailto:sumit1...@gmail.com]
> Sent: Thursday, August 26, 2010 12:54 PM
> To: solr-user@lucene.apache.org
> Subject: How to do ? Articles and Its Associated Comments Indexing , One to Many relationship
> I have a set of articles and comments on them. In the database I have two main tables, one for articles and one for comments, and each article can have many comments (one-to-many).
> If one article has 20 comments, then on the DB-to-Solr index sync Solr will index 20 near-identical documents that differ only in the comment.
> Use case: on search, if the keyword matches more than one comment, the query returns duplicate documents.
> One possible solution I have thought of: index the 20 near-identical documents, one per comment, and when retrieving results use collapse.field set to the article id.
> Am I following the right approach?
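As a rough sketch of both points in this thread: with a multiValued field the article is a single document whose comments field holds a list, and Solr's standard highlighting parameters (hl, hl.fl, hl.snippets) can ask for more than one snippet per field, so matches from two different comments can both come back. The field name comments, the id article_1, and the sample text are assumptions for illustration; the code only builds the document and the query string and does not contact Solr.

```python
from urllib.parse import urlencode

# One document per article; each comment is one value of the multi-valued field.
article_doc = {
    "id": "article_1",  # hypothetical id
    "title": "Planning a community site",
    "comments": [
        "The web portal will connect a lot of people for some specific domain",
        "web hosting is not too expensive as compared to the other components",
        "Looks good to me",
    ],
}

# Highlighting request: up to two snippets from the comments field.
params = {
    "q": "comments:web",
    "hl": "true",
    "hl.fl": "comments",
    "hl.snippets": 2,  # the Solr default is 1 snippet per field
}
query_string = urlencode(params)
print(len(article_doc["comments"]), query_string)
```

Raising hl.snippets is the part that matters for the "keyword in two comments" case; the other values can be adapted freely.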
Re: Candidate Profile Search which have multiple employers and Educations.
Thanks, Ephraim, for your response.
Actually, I am not using DIH to sync the data from the DB. I wrote my own DB sync program, which retrieves rows directly from the MySQL DB and indexes them into Solr.
In my earlier cases I fetched rows by column label from the DB, with matching columns defined in my sync program, which picks up the data and indexes it one-to-one. So for the one-to-many case I would have to use an inner loop (based on the number of candidate educations or candidate employers) to index the education and employer values.
On Thu, Aug 26, 2010 at 4:59 PM, Ephraim Ofir ephra...@icq.com wrote:
> As far as I can tell you should use multiValued for these fields:
>   <field name="education" type="textgen" indexed="true" stored="true" multiValued="true"/>
>   <field name="employer" type="textgen" indexed="true" stored="true" multiValued="true"/>
> In order to get the data from the DB you should either create a sub entity with its own query or (the better performance option) use something like:
>   SELECT cp.name,
>     GROUP_CONCAT(ce.CandidateEducation SEPARATOR '|') AS multiple_educations,
>     GROUP_CONCAT(e.Employer SEPARATOR '|') AS multiple_employers
>   FROM CandidateProfile_Table cp
>   LEFT JOIN CandidateEducation_Table ce ON cp.name = ce.name
>   LEFT JOIN Employers_Table e ON cp.name = e.name
>   GROUP BY cp.name
> This creates one line with the educations and employers concatenated into pipe (|) delimited fields. Then you'd have to break up the multiple fields using a RegexTransformer - use something like:
>   <entity name="candidates" query="...see above..." transformer="RegexTransformer">
>     <field name="education" column="multiple_educations" splitBy="\|"/>
>     <field name="employer" column="multiple_employers" splitBy="\|"/>
>   </entity>
> The SQL probably doesn't fit your DB schema, but it's just to clarify the idea. You might have to pick a different field separator if pipe (|) might be in your data...
> Ephraim Ofir
> -----Original Message-----
> From: Sumit Arora [mailto:sumit1...@gmail.com]
> Sent: Thursday, August 26, 2010 1:36 PM
> To: solr-user@lucene.apache.org
> Subject: Candidate Profile Search which have multiple employers and Educations.
> I have to search candidate profiles, for which I have the following tables:
>   Candidate profile record: CandidateProfile_Table
>   Candidate education: CandidateEducation_Table  // education at different institutes or colleges
>   Employers: Employers_Table  // more than one employer
> If I denormalize all three tables:
>   CandidateProfile_Table - 1 row for Sumit
>   CandidateEducation_Table - 5 rows for Sumit
>   Employers_Table - 5 rows for Sumit
> If these three tables are indexed in Solr this way, it will create 25 documents (1 x 5 x 5) for one candidate.
> What should my approach be in this case? Denormalize all three tables and, when querying Solr, use the field collapse parameter on the CandidateProfile id so that it returns one record? Or should I treat CandidateEducation_Table and Employers_Table as multiValued fields in Solr?
> If the latter, how do I apply multiValued the Solr way? For example, I would need the following configuration in schema.xml:
>   <field name="education" type="textgen" indexed="true" stored="true"/>
>   <field name="employer" type="textgen" indexed="true" stored="true"/>
> After this, should I pick all education values for one profile (from the MySQL education table), keep them in one variable, EducationValuesForSolr, and then assign that variable's value to the education field defined in schema.xml?
> Please let me know whether I am using the right approach, and share any comments.
> /Sumit
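For a hand-written sync program (no DIH, so no RegexTransformer), the same idea can be sketched in a few lines: run the GROUP_CONCAT query and split the pipe-delimited columns in code. One caveat worth checking against real data: because both child tables are joined in the same query, GROUP_CONCAT sees the cross product of educations and employers, so each value may repeat. The sketch below removes duplicates while preserving order; GROUP_CONCAT(DISTINCT ...) in MySQL would be another way to do that. The row values and the helper name split_multivalue are made up for illustration.

```python
def split_multivalue(joined, sep="|"):
    """Split a GROUP_CONCAT result into unique values, keeping first-seen order."""
    if not joined:  # a LEFT JOIN with no child rows yields NULL (None)
        return []
    parts = (part for part in joined.split(sep) if part)
    return list(dict.fromkeys(parts))  # dict keys preserve insertion order


# Hypothetical row as the query above might return it for one candidate
# with 2 educations and 2 employers (2 x 2 = 4 joined rows).
row = {
    "name": "Sumit",
    "multiple_educations": "BSc|BSc|MSc|MSc",
    "multiple_employers": "Acme|Initech|Acme|Initech",
}

doc = {
    "name": row["name"],
    "education": split_multivalue(row["multiple_educations"]),
    "employer": split_multivalue(row["multiple_employers"]),
}
print(doc)
```

MySQL also truncates GROUP_CONCAT output at group_concat_max_len (1024 by default), so candidates with many long values may need that setting raised, or the inner-loop approach instead.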
Re: solr
Please follow the guidelines at: http://lucene.apache.org/solr/tutorial.html
/Sumit
On Sat, Aug 21, 2010 at 11:25 AM, ankita shinde ankita.shind...@gmail.com wrote:
> Hello, I am new to Solr. Can anyone please guide me on how to install and use Solr? Reply.
> -Ankita Shinde
How Solr Manages Connected Database Updates
Hey All,
I am new to Solr. I have just started exploring it and have done the basic things, but now I am stuck on one question: how does Solr handle updates in the connected database?
Scenario: I wrote an indexing program that runs on Tomcat. When run, it reads data from the connected MySQL database and then performs the indexing.
Use case: the database is not static. It is the database of a web application into which users keep inserting data, so it is updated frequently, almost every minute.
How should Solr pick up those changes automatically and update the index? Do I need to write something like a cron job, or use the Data Import Handler? (There may be several ways.)
Can anyone who has been through a similar situation comment or share their experience?
Thanks,
-Sumit
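Whichever mechanism is chosen, the core of incremental indexing is usually the same: each row carries a modification timestamp, and a periodic job selects only rows changed since the last sync. As far as I know, the Data Import Handler's delta-import works along these lines too, via a deltaQuery that refers to ${dataimporter.last_index_time}. The sketch below shows only the selection step, using an in-memory SQLite database as a stand-in for MySQL; the table articles, its columns, and the function name are assumptions, and the actual call that sends documents to Solr is omitted.

```python
import sqlite3


def fetch_changed_rows(conn, since):
    """Return rows modified after `since`, oldest first."""
    cur = conn.execute(
        "SELECT id, title, last_modified FROM articles "
        "WHERE last_modified > ? ORDER BY last_modified",
        (since,),
    )
    return cur.fetchall()


# Stand-in database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        (1, "indexed earlier", "2010-08-20 10:00:00"),
        (2, "added since last sync", "2010-08-21 11:30:00"),
    ],
)

last_sync = "2010-08-21 00:00:00"
changed = fetch_changed_rows(conn, last_sync)

# Here the changed rows would be sent to Solr and committed (omitted).
if changed:
    last_sync = changed[-1][2]  # advance the watermark to the newest row seen

print(changed, last_sync)
```

A cron job (or a scheduled task inside the Tomcat application) could run this every minute. Deleted rows are not covered by a timestamp column alone and would need separate handling, for example a soft-delete flag.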