Re: Multiple Schema File

2008-06-04 Thread climbingrose
Hi Sachit,

I think what you could do is create all the "core fields" of your models,
such as username, role, title, body, images... You can name them with prefixes
like user.username, user.role, article.title, article.body... If you want to
dynamically add more fields to your schema, you can use dynamic fields and
keep a mapping between your model's properties and those fields somewhere.
Have a look at the default schema.xml for examples. I used this approach on
a previous project and it worked fine for me.
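
For example, a sketch of the schema.xml side (field names are illustrative,
and the *_s/*_t dynamic fields follow the example schema's conventions):

<field name="doctype" type="string" indexed="true" stored="true"/>
<field name="user.username" type="string" indexed="true" stored="true"/>
<field name="user.role" type="string" indexed="true" stored="true"/>
<field name="article.title" type="text" indexed="true" stored="true"/>
<field name="article.body" type="text" indexed="true" stored="true"/>
<!-- catch-alls for model properties you add at runtime -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>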

Cheers,
Cuong

On Thu, Jun 5, 2008 at 3:43 PM, Sachit P. Menon <[EMAIL PROTECTED]>
wrote:

> Hi folks,
>
>
>
> I have a scenario as follows:
>
>
>
> I have a CMS where in I'm storing all the contents. I need to index all
> these
> contents and have a search on these indexes. For indexing, I can define a
> schema for all the contents. Some of the properties are like title,
> headline,
> body, keywords, images, etc.
>
> Now I have a user management wherein I store all the user information. I
> need
> to index this also. This may have properties like user name, role, joining
> date, etc.
>
>
>
> I want to use only one Solr instance. That means I can have only one schema
> file.
>
> How can I define all these totally different properties in one schema file?
>
> The unique id storage for content and user management may also be
> different.
> How can I achieve this?
>
>
>
>
>
> Thanks and Regards
>
> Sachit P. Menon | Programmer Analyst | MindTree Ltd. | West Campus, Phase-1,
> Global Village, RVCE Post, Mysore Road, Bangalore-560 059, INDIA | Voice +91
> 80 26264000 | Extn 64872 | Fax +91 80 26264100 | Mob: +91 9986747356 |
> www.mindtree.com
>
>
>
>
>



-- 
Regards,

Cuong Hoang


Re: Multiple Schema File

2008-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi,
use multi-core Solr. Each core can have its own schema.
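
On a recent trunk build that is driven by a small multicore config file; a
sketch (core names are illustrative, and the exact format has changed between
trunk revisions, so check the trunk multicore example):

<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="content" instanceDir="content"/>
    <core name="users" instanceDir="users"/>
  </cores>
</solr>

Each core then gets its own conf/schema.xml and solrconfig.xml under its
instanceDir.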

If possible the DISCLAIMER can be dropped.
--Noble

On Thu, Jun 5, 2008 at 11:13 AM, Sachit P. Menon
<[EMAIL PROTECTED]> wrote:
> Hi folks,
>
>
>
> I have a scenario as follows:
>
>
>
> I have a CMS where in I'm storing all the contents. I need to index all these
> contents and have a search on these indexes. For indexing, I can define a
> schema for all the contents. Some of the properties are like title, headline,
> body, keywords, images, etc.
>
> Now I have a user management wherein I store all the user information. I need
> to index this also. This may have properties like user name, role, joining
> date, etc.
>
>
>
> I want to use only one Solr instance. That means I can have only one schema
> file.
>
> How can I define all these totally different properties in one schema file?
>
> The unique id storage for content and user management may also be different.
> How can I achieve this?
>
>
>
>
>
> Thanks and Regards
>
> Sachit P. Menon | Programmer Analyst | MindTree Ltd. | West Campus, Phase-1,
> Global Village, RVCE Post, Mysore Road, Bangalore-560 059, INDIA | Voice +91
> 80 26264000 | Extn 64872 | Fax +91 80 26264100 | Mob: +91 9986747356 |
> www.mindtree.com
>
>
>
>
>



-- 
--Noble Paul


Multiple Schema File

2008-06-04 Thread Sachit P. Menon
Hi folks,

 

I have a scenario as follows: 

 

I have a CMS where in I'm storing all the contents. I need to index all these
contents and have a search on these indexes. For indexing, I can define a
schema for all the contents. Some of the properties are like title, headline,
body, keywords, images, etc. 

Now I have a user management wherein I store all the user information. I need
to index this also. This may have properties like user name, role, joining
date, etc. 

 

I want to use only one Solr instance. That means I can have only one schema
file. 

How can I define all these totally different properties in one schema file? 

The unique id storage for content and user management may also be different.
How can I achieve this? 

 

 

Thanks and Regards

Sachit P. Menon | Programmer Analyst | MindTree Ltd. | West Campus, Phase-1,
Global Village, RVCE Post, Mysore Road, Bangalore-560 059, INDIA | Voice +91
80 26264000 | Extn 64872 | Fax +91 80 26264100 | Mob: +91 9986747356 |
www.mindtree.com

 





Re: HttpDataSource common fields

2008-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
The attachment did not work; try this:

http://www.nabble.com/Re%3A-How-to-describe-2-entities-in-dataConfig-for-the-DataImporter--p17577610.html
--Noble

On Thu, Jun 5, 2008 at 9:37 AM, Noble Paul നോബിള്‍ नोब्ळ्
<[EMAIL PROTECTED]> wrote:
> commonField="true" can be added on any field when you are using an
> XPathEntityProcessor. But you will never need to do so, because only XML
> has such a requirement. If you wish to add a string literal, use a
> TemplateTransformer and keep the <field column="..." template="my-string-literal"/>.
> The one in the patch has a bug; use the attached one.
>
> http://wiki.apache.org/solr/DataImportHandler#head-071ff018f44ecbdb1cf55afc4c2a857f44ea1ea4
>
> --Noble
>
>
>
> On Thu, Jun 5, 2008 at 3:46 AM, Jon Baer <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I have a question about the HttpDataSource (DataImportHandler) ...
>>
>> Is it possible to add common values *explicitly*, something like:
>>
>> 
>>
>> I'm blanking on whether XPath has a command/option to return back just a
>> string literal (vs. a node).
>>
>> Thanks.
>>
>> - Jon
>>
>>
>>
>>
>
>
>
> --
> --Noble Paul
>



-- 
--Noble Paul


Re: HttpDataSource common fields

2008-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
commonField="true" can be added on any field when you are using an
XPathEntityProcessor. But you will never need to do so, because only XML
has such a requirement. If you wish to add a string literal, use a
TemplateTransformer and keep the <field column="..." template="my-string-literal"/>.
The one in the patch has a bug; use the attached one.

http://wiki.apache.org/solr/DataImportHandler#head-071ff018f44ecbdb1cf55afc4c2a857f44ea1ea4
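
i.e. something like this in data-config.xml (entity and column names are
illustrative):

<entity name="item" transformer="TemplateTransformer" ...>
  <field column="source" template="my-string-literal"/>
</entity>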

--Noble



On Thu, Jun 5, 2008 at 3:46 AM, Jon Baer <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a question about the HttpDataSource (DataImportHandler) ...
>
> Is it possible to add common values *explicitly*, something like:
>
> 
>
> I'm blanking on whether XPath has a command/option to return back just a
> string literal (vs. a node).
>
> Thanks.
>
> - Jon
>
>
>
>



-- 
--Noble Paul


Re: POSTing repeated fields to Solr

2008-06-04 Thread Mike Klaas

On 4-Jun-08, at 2:22 PM, Andrew Nagy wrote:

Hello - I was wondering if there is a workaround for POSTing
repeated fields to Solr.  I am using Jetty as my container with Solr
1.2.


I tried something like:
http://localhost:8080/solr/select/?q=author:(smith)&rows=0&start=0&facet=true&facet.mincount=1&facet.limit=10&facet.field=authorlast&facet.field=authorfirst

I am only getting back facets from the last facet.field.  With a GET  
request I get back all of the facet fields.  I am assuming this is a  
limitation with Jetty?  How are others doing this?


It could be, but I would be surprised.  How are you constructing the
POST request (you listed a URL, which only makes sense as a GET)?
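
(Repeating a parameter in a form-encoded POST body is perfectly legal; a
sketch of what the raw request should look like:)

POST /solr/select HTTP/1.1
Content-Type: application/x-www-form-urlencoded

q=author:(smith)&rows=0&start=0&facet=true&facet.field=authorlast&facet.field=authorfirst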


-Mike


Re: ClassCastException trying to use distributed search

2008-06-04 Thread Chris Hostetter

: Hoss:Your are right. It has a version byte written first. This can be
: used for any changes that come later..So , when we introduce any

Cool .. i just wanted to make sure the issue here was the default format
changing from XML to binary as part of the development cycle and not an
expected "upgrade" concern that we'd have to worry about with every future
tweak to the format.



-Hoss



Re: new user: some questions about parameters and query syntax

2008-06-04 Thread Chris Hostetter

: that for a newbie, there is a certain confusion when it comes to "one
: thing with comma a separated list, the other can have multiple
: values". This could also be solved perfectly in the wiki (i.e. making
: it very clear multiple GET params with the same name can be used, or
: if a comma-separated list should be used).

We've certainly tried to be consistent about this.  Any param that
"parses" the value you give it should say so on the wiki ("sort" goes into
a lot of detail on this, as does "fl" ... things like "qf" for the
DismaxHandler could probably be more explicit about splitting on whitespace
instead of just giving an example).  Any param that can be specified
multiple times should say "This parameter can be specified multiple
times..." and explain how multiple values will be interpreted.
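
(For example, sort=price+asc,score+desc is a single value that gets parsed,
while facet.field=cat&facet.field=inStock is one param specified twice.)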

If you notice any inconsistencies or vagueness in the wiki, feel free to
clean it up (if you don't know what the right thing to write is, just ask)
... I've said it many times in the past: "Experts" tend to be the worst
documentation writers for new users, because they are too aware of how
things work to really understand what type of information is most needed
(or already obvious) to new users.

*Hopefully* I'll get some time tomorrow to make progress on SOLR-555 and we
won't have ambiguous wiki pages for plugins with inconsistently formatted
sections on each param -- we can start having more explicit structured info
about all of the concepts that are really "core" to how a param works (ie:
name, datatype, is it multivalued, can it be overridden per field, what
is/are the value(s) used for).  Of course: the "structured" documentation
built from the code would still link to (or dare i say: iframe include?)
wiki pages where users can add notes, comments, and additional examples
of configuration and use.  (much like the PHP documentation, which really
set the bar for integrating "reference" documentation with user contributed
addendums)




-Hoss



Re: new user: some questions about parameters and query syntax

2008-06-04 Thread Chris Hostetter

: I dunno...  for something like "fl", it still seems a bit verbose to
: list every field separately.
: Some of these things feel like trade offs in ease of readability &
: manually typing of URLs vs ease of programmatic manipulation.

I don't even think the primary issue is programmatic manipulation ... it's
ease of readability and manual typing vs a need for external mapping of
internal field names to external field labels.  even the simplest use
cases of Solr almost always need some external mapping from the field labels
they display to end users and the field names that are used in the schema ...
it would be really nice if that wasn't actually necessary.

just making fl a multivalued field wouldn't get you far enough obviously,
you'd also need some aliasing mechanism to deal with field names
for sorting vs field names for querying, and in many cases you have to
worry about whether your field names contain characters the
standard QueryParser considers special (like whitespace) but it would
certainly be nice to say...
  q=*:*&fl=Product+Id&fl=Product+Name&fl=Sales+Price





-Hoss



Re: what is null value behavior in function queries?

2008-06-04 Thread Chris Hostetter

: I am using function queries to rank the results,
: if some/all the fields (used in the function) are missing from the document
: what will be the ranking behavior for such documents?

Off the top of my head, I believe it's zero, but an easy way to check is
to run a simple linear function query with no offset and a slope of 1
(essentially an identity function) against a doc with no value for that
field and see what you get.
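
Something like this, as a sketch (assuming a numeric field named myfield):

  q=_val_:"linear(myfield,1,0)"&fl=myfield,score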


-Hoss



HttpDataSource common fields

2008-06-04 Thread Jon Baer

Hi,

I have a question about the HttpDataSource (DataImportHandler) ...

Is it possible to add common values *explicitly*, something like:



I'm blanking on whether XPath has a command/option to return back just a
string literal (vs. a node).


Thanks.

- Jon





POSTing repeated fields to Solr

2008-06-04 Thread Andrew Nagy
Hello - I was wondering if there is a workaround for POSTing repeated fields
to Solr.  I am using Jetty as my container with Solr 1.2.

I tried something like:
http://localhost:8080/solr/select/?q=author:(smith)&rows=0&start=0&facet=true&facet.mincount=1&facet.limit=10&facet.field=authorlast&facet.field=authorfirst

I am only getting back facets from the last facet.field.  With a GET request I 
get back all of the facet fields.  I am assuming this is a limitation with 
Jetty?  How are others doing this?

Thanks
Andrew




Re: Luke / Lucli w/ Solr index (trunk)

2008-06-04 Thread Jon Baer

Thanks ... that worked.  Much appreciated.

- Jon

On Jun 4, 2008, at 4:51 PM, Grant Ingersoll wrote:

You will more than likely need to replace the Lucene jars in Luke  
with the ones used in Solr.  We have upgraded Solr's Lucene  
dependencies in the trunk.


I think, this means getting just the Luke jar (instead of Luke  
"all") and then using it with the Lucene lib here.  Double check the  
Luke website for more info.


HTH,
Grant

On Jun 4, 2008, at 4:27 PM, Alexander Ramos Jardim wrote:

I got this error when trying to use an index generated by an old  
version

with trunk Solr.

2008/6/4 Jon Baer <[EMAIL PROTECTED]>:


Hi,

Just recently upgraded w/ trunk version, Solr works fine but  Luke  
& Lucli

are showing this:

lucli> index /www/solr/test/index
Lucene CLI. Using directory '/www/solr/test/index'. Type 'help' for
instructions.
Error:org.apache.lucene.index.CorruptIndexException: Unknown format
version: -6

Did something change?  Or do I need to wait for those tools to  
update to

new index version?

Indexes were created w/ DataImportHandler if that means anything.

Thanks.

- Jon





--
Alexander Ramos Jardim


--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











Re: Luke / Lucli w/ Solr index (trunk)

2008-06-04 Thread Grant Ingersoll
You will more than likely need to replace the Lucene jars in Luke with  
the ones used in Solr.  We have upgraded Solr's Lucene dependencies in  
the trunk.


I think this means getting just the Luke jar (instead of Luke "all")
and then using it with the Lucene lib here.  Double check the Luke
website for more info.
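
(A sketch of what launching it might look like; the jar names are
illustrative and depend on your checkout:)

  java -cp luke.jar:lucene-core-2.4-dev.jar org.getopt.luke.Luke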


HTH,
Grant

On Jun 4, 2008, at 4:27 PM, Alexander Ramos Jardim wrote:

I got this error when trying to use an index generated by an old  
version

with trunk Solr.

2008/6/4 Jon Baer <[EMAIL PROTECTED]>:


Hi,

Just recently upgraded w/ trunk version, Solr works fine but  Luke  
& Lucli

are showing this:

lucli> index /www/solr/test/index
Lucene CLI. Using directory '/www/solr/test/index'. Type 'help' for
instructions.
Error:org.apache.lucene.index.CorruptIndexException: Unknown format
version: -6

Did something change?  Or do I need to wait for those tools to  
update to

new index version?

Indexes were created w/ DataImportHandler if that means anything.

Thanks.

- Jon





--
Alexander Ramos Jardim


--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: Solrj + Multicore

2008-06-04 Thread Ryan McKinley


Are you using a recent version of multi-core?

Do you have the /update RequestHandler mapped in solrconfig.xml?
Since multicore support is new, it does not support the @deprecated
/update servlet.
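
(The mapping in the trunk example solrconfig.xml looks like this; make sure
the config for your core has it:)

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />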


ryan


On Jun 4, 2008, at 10:07 AM, Alexander Ramos Jardim wrote:

2008/6/3 Ryan McKinley <[EMAIL PROTECTED]>:





This way I don't connect:
new CommonsHttpSolrServer("http://localhost:8983/solr/idxItem")



this is how you need to connect... otherwise nothing will work.



When I try this way, I get the following exception, when trying to  
make an

update to my index:

org.apache.solr.common.SolrException: Not Found

Not Found

request: http://localhost:8983/solr/idxItem/update?wt=xml&version=2.2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:308)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:152)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:220)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:51)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:55)
    ...





Perhaps we should throw an exception if you initialize a URL that
contains "?".

ryan





--
Alexander Ramos Jardim




Re: Luke / Lucli w/ Solr index (trunk)

2008-06-04 Thread Otis Gospodnetic
Jon,

Lucli is ancient and, as far as I recall, has its own Lucene jar, which you 
could try replacing.  Luke might be better at this point, but even with Luke 
you may have to use your own Lucene jar (the one you used with Solr+DIH).  The 
error indicates index format incompatibility (likely old jar trying to read an 
index created with a newer Lucene).


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Jon Baer <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, June 4, 2008 4:22:05 PM
> Subject: Luke / Lucli w/ Solr index (trunk)
> 
> Hi,
> 
> Just recently upgraded w/ trunk version, Solr works fine but  Luke &  
> Lucli are showing this:
> 
> lucli> index /www/solr/test/index
> Lucene CLI. Using directory '/www/solr/test/index'. Type 'help' for  
> instructions.
> Error:org.apache.lucene.index.CorruptIndexException: Unknown format  
> version: -6
> 
> Did something change?  Or do I need to wait for those tools to update  
> to new index version?
> 
> Indexes were created w/ DataImportHandler if that means anything.
> 
> Thanks.
> 
> - Jon



Re: Luke / Lucli w/ Solr index (trunk)

2008-06-04 Thread Alexander Ramos Jardim
I got this error when trying to use an index generated by an old version
with trunk Solr.

2008/6/4 Jon Baer <[EMAIL PROTECTED]>:

> Hi,
>
> Just recently upgraded w/ trunk version, Solr works fine but  Luke & Lucli
> are showing this:
>
> lucli> index /www/solr/test/index
> Lucene CLI. Using directory '/www/solr/test/index'. Type 'help' for
> instructions.
> Error:org.apache.lucene.index.CorruptIndexException: Unknown format
> version: -6
>
> Did something change?  Or do I need to wait for those tools to update to
> new index version?
>
> Indexes were created w/ DataImportHandler if that means anything.
>
> Thanks.
>
> - Jon
>
>


-- 
Alexander Ramos Jardim


Luke / Lucli w/ Solr index (trunk)

2008-06-04 Thread Jon Baer

Hi,

Just recently upgraded w/ trunk version, Solr works fine but  Luke &  
Lucli are showing this:


lucli> index /www/solr/test/index
Lucene CLI. Using directory '/www/solr/test/index'. Type 'help' for  
instructions.
Error:org.apache.lucene.index.CorruptIndexException: Unknown format  
version: -6


Did something change?  Or do I need to wait for those tools to update  
to new index version?


Indexes were created w/ DataImportHandler if that means anything.

Thanks.

- Jon



Re: How to describe 2 entities in dataConfig for the DataImporter?

2008-06-04 Thread Shalin Shekhar Mangar
Another thing to note: the parentDeltaQuery is of no use if deltaQuery is
not specified. In my experience, DataImportHandler is pretty fast and delta
queries may not be needed at all if your dataset is small. We use it without
delta queries even with millions of solr documents and it completes very
fast.

On Thu, Jun 5, 2008 at 12:32 AM, Shalin Shekhar Mangar <
[EMAIL PROTECTED]> wrote:

> 1. Yes you can add virtual columns. Any column/name which does not exist in
> schema.xml are used for joins but not added to the document. The "column"
> attribute is the key using which data is read from the entity's Map. The
> "name" attribute is the solr field to which data is written. Also note that
> "name" is optional if both DB and Solr have the same name for the column.
> 2. I omitted it assuming that "id" is not in schema.xml and anyway it is
> not required for joining with anything. Note that if you do want to store
> it, you should create three different Solr fields otherwise values for "id"
> coming from the three entities will overwrite each other.
> 3. Only one uniqueKey should be there. In this case it would be comboId
> which will be unique for each document.
>
> Hope that helps. Btw, if you find some of these concepts not clearly
> explained in the wiki -- please help us out by adding them there.
>
>
> On Thu, Jun 5, 2008 at 12:15 AM, Julio Castillo <[EMAIL PROTECTED]>
> wrote:
>
>> Thanks Shalin,
>> I'll try this asap. Yes, you did understand the sample schema I've been
>> playing with.
>> Just a couple of questions to clarify for my own understanding your
>> proposal.
>> 1) the column "comboId" doesn't exist on the dB (yet it is specified as a
>> separate "column" for both "owners" and "vets" in your specification). I
>> didn't realize you could add "virtual" columns.
>> 2) the "pets" entity did have a field/column id. The fact that you omitted
>> it, was an oversight, or not necessary, I suppose.
>> 3) Your statement about uniqueKey needs some clarification:
>>   - I do have the following in my schema.xml: <uniqueKey>id</uniqueKey>; I
>> also added comboId. Are both necessary?
>>
>> Thanks
>>
>> ** julio
>>
>> -Original Message-
>> From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, June 04, 2008 11:01 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to describe 2 entities in dataConfig for the
>> DataImporter?
>>
>> Hi Julio,
>>
>> The following are my assumptions after studying your given data-config
>> examples
>>
>> 1. The column id is present in all three tables -- vets, owners and pets.
>> 2. Vets and owners are independent of each other, there is no join
>> required
>> between them 3. There is a parent-child relationship between owners and
>> pets
>> (joined on owner_id column)
>>
>> The whole problem relates to the fact that both Vets and Owners have a
>> primary key with the same name -- "id". If "id" is the uniqueKey for the
>> Solr schema, then some records from owners overwrite vets when the value
>> for
>> the "id" is the same.
>>
>> To solve this problem, we must have a uniqueKey in schema.xml which has
>> different values when coming from Vets and Owners even when value of "id"
>> column is the same in both the tables. At the same time, for the join to
>> work, the original value of "id" should be kept as is.
>>
>> I believe the following data-config.xml should solve your use-case:
>>
>> <dataConfig>
>>   <document>
>>     <entity name="vets"
>>             query="select id,first_name,last_name FROM vets"
>>             transformer="TemplateTransformer">
>>       <field column="comboId" template="vets-${vets.id}"/>
>>       <field column="first_name" name="first_name"/>
>>       <field column="last_name" name="last_name"/>
>>     </entity>
>>     <entity name="owners"
>>             query="select id,first_name,last_name FROM owners"
>>             transformer="TemplateTransformer">
>>       <field column="comboId" template="owners-${owners.id}"/>
>>       <field column="first_name" name="first_name"/>
>>       <field column="last_name" name="last_name"/>
>>       <entity name="pets"
>>               query="SELECT id,name,birth_date,type_id FROM pets WHERE
>> owner_id='${owners.id}'">
>>         <field column="name" name="name"/>
>>         <field column="birth_date" name="birth_date"/>
>>         <field column="type_id" name="type_id"/>
>>       </entity>
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>> Here, comboId is the uniqueKey for your Solr documents. The "id" field
>> need
>> not exist in your schema.xml, it will only be used for joining the tables
>> and discarded if it does not exist in the schema.xml.
>>
>> Hope this helps.
>>
>> On Wed, Jun 4, 2008 at 10:54 PM, Julio Castillo <[EMAIL PROTECTED]>
>> wrote:
>>
>> > Noble,
>> > Thanks for continuing to assist me on trying to come up a config that
>> > works.
>> > A couple of questions/clarifications:
>> > 1) I had to introduce the "artificial" comboID and the transformer
>> > because of a conflict with a parallel entity on the "id" ("vets" and
>> "owners").
>> > 2) I don't think there is a conflict with the petID because prior to
>> > the introduction of "vets" I had "owners" with no "id" issues regarding
>> "pets".
>> > 3) The conflict was introduced the moment I tried to add "vets".
>> > Unfortunately by introducing the transformer, for "owners", the "pets"
>> > relationships stopped working.
>> >
>> > Below are 3 specifications. The first 2 work in isolation, when
>> > combined(last one) it doesn't work.
>> >
>> > * CASE 1 (Works -nested entitie

Re: How to describe 2 entities in dataConfig for the DataImporter?

2008-06-04 Thread Shalin Shekhar Mangar
1. Yes you can add virtual columns. Any column/name which does not exist in
schema.xml are used for joins but not added to the document. The "column"
attribute is the key using which data is read from the entity's Map. The
"name" attribute is the solr field to which data is written. Also note that
"name" is optional if both DB and Solr have the same name for the column.
2. I omitted it assuming that "id" is not in schema.xml and anyway it is not
required for joining with anything. Note that if you do want to store it,
you should create three different Solr fields otherwise values for "id"
coming from the three entities will overwrite each other.
3. Only one uniqueKey should be there. In this case it would be comboId
which will be unique for each document.
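
In schema.xml that would look something like this (a sketch; the type name
follows the example schema):

<field name="comboId" type="string" indexed="true" stored="true"/>
<uniqueKey>comboId</uniqueKey>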

Hope that helps. Btw, if you find some of these concepts not clearly
explained in the wiki -- please help us out by adding them there.

On Thu, Jun 5, 2008 at 12:15 AM, Julio Castillo <[EMAIL PROTECTED]>
wrote:

> Thanks Shalin,
> I'll try this asap. Yes, you did understand the sample schema I've been
> playing with.
> Just a couple of questions to clarify for my own understanding your
> proposal.
> 1) the column "comboId" doesn't exist on the dB (yet it is specified as a
> separate "column" for both "owners" and "vets" in your specification). I
> didn't realize you could add "virtual" columns.
> 2) the "pets" entity did have a field/column id. The fact that you omitted
> it, was an oversight, or not necessary, I suppose.
> 3) Your statement about uniqueKey needs some clarification:
>   - I do have the following in my schema.xml: <uniqueKey>id</uniqueKey>; I
> also added comboId. Are both necessary?
>
> Thanks
>
> ** julio
>
> -Original Message-
> From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 04, 2008 11:01 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to describe 2 entities in dataConfig for the DataImporter?
>
> Hi Julio,
>
> The following are my assumptions after studying your given data-config
> examples
>
> 1. The column id is present in all three tables -- vets, owners and pets.
> 2. Vets and owners are independent of each other, there is no join required
> between them 3. There is a parent-child relationship between owners and
> pets
> (joined on owner_id column)
>
> The whole problem relates to the fact that both Vets and Owners have a
> primary key with the same name -- "id". If "id" is the uniqueKey for the
> Solr schema, then some records from owners overwrite vets when the value
> for
> the "id" is the same.
>
> To solve this problem, we must have a uniqueKey in schema.xml which has
> different values when coming from Vets and Owners even when value of "id"
> column is the same in both the tables. At the same time, for the join to
> work, the original value of "id" should be kept as is.
>
> I believe the following data-config.xml should solve your use-case:
>
> <dataConfig>
>   <document>
>     <entity name="vets"
>             query="select id,first_name,last_name FROM vets"
>             transformer="TemplateTransformer">
>       <field column="comboId" template="vets-${vets.id}"/>
>       <field column="first_name" name="first_name"/>
>       <field column="last_name" name="last_name"/>
>     </entity>
>     <entity name="owners"
>             query="select id,first_name,last_name FROM owners"
>             transformer="TemplateTransformer">
>       <field column="comboId" template="owners-${owners.id}"/>
>       <field column="first_name" name="first_name"/>
>       <field column="last_name" name="last_name"/>
>       <entity name="pets"
>               query="SELECT id,name,birth_date,type_id FROM pets WHERE
> owner_id='${owners.id}'">
>         <field column="name" name="name"/>
>         <field column="birth_date" name="birth_date"/>
>         <field column="type_id" name="type_id"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> Here, comboId is the uniqueKey for your Solr documents. The "id" field need
> not exist in your schema.xml, it will only be used for joining the tables
> and discarded if it does not exist in the schema.xml.
>
> Hope this helps.
>
> On Wed, Jun 4, 2008 at 10:54 PM, Julio Castillo <[EMAIL PROTECTED]>
> wrote:
>
> > Noble,
> > Thanks for continuing to assist me on trying to come up a config that
> > works.
> > A couple of questions/clarifications:
> > 1) I had to introduce the "artificial" comboID and the transformer
> > because of a conflict with a parallel entity on the "id" ("vets" and
> "owners").
> > 2) I don't think there is a conflict with the petID because prior to
> > the introduction of "vets" I had "owners" with no "id" issues regarding
> "pets".
> > 3) The conflict was introduced the moment I tried to add "vets".
> > Unfortunately by introducing the transformer, for "owners", the "pets"
> > relationships stopped working.
> >
> > Below are 3 specifications. The first 2 work in isolation, when
> > combined(last one) it doesn't work.
> >
> > * CASE 1 (Works -nested entities -no conflicts on ids -no transformer)
> >
> > 
> >   >query="select id,first_name,last_name FROM owners">
> > 
> > 
> >
> >
> > >query="SELECT id,name,birth_date,type_id FROM pets
> > WHERE owner_id='${owners.id}'"
> >parentDeltaQuery="SELECT id FROM owners WHERE
> > id=${pets.owner_id}">
> >
> >
> >
> >
> >  
> > 
> >
> >
> > * CASE 2 (Works -parallel independent entities -introduced transformer
> > to avoid id conflicts)
> >
> > 
> >   >   

RE: How to describe 2 entities in dataConfig for the DataImporter?

2008-06-04 Thread Julio Castillo
Thanks Shalin,
I'll try this asap. Yes, you did understand the sample schema I've been
playing with.
Just a couple of questions to clarify for my own understanding your
proposal.
1) the column "comboId" doesn't exist on the dB (yet it is specified as a
separate "column" for both "owners" and "vets" in your specification). I
didn't realize you could add "virtual" columns.
2) the "pets" entity did have a field/column id. The fact that you omitted
it, was an oversight, or not necessary, I suppose.
3) Your statement about uniqueKey needs some clarification:
   - I do have the following in my schema.xml: <uniqueKey>id</uniqueKey>; I
also added comboId. Are both necessary?

Thanks

** julio

-Original Message-
From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 04, 2008 11:01 AM
To: solr-user@lucene.apache.org
Subject: Re: How to describe 2 entities in dataConfig for the DataImporter?

Hi Julio,

The following are my assumptions after studying your given data-config
examples

1. The column id is present in all three tables -- vets, owners and pets.
2. Vets and owners are independent of each other, there is no join required
between them 3. There is a parent-child relationship between owners and pets
(joined on owner_id column)

The whole problem relates to the fact that both Vets and Owners have a
primary key with the same name -- "id". If "id" is the uniqueKey for the
Solr schema, then some records from owners overwrite vets when the value for
the "id" is the same.

To solve this problem, we must have a uniqueKey in schema.xml which has
different values when coming from Vets and Owners even when value of "id"
column is the same in both the tables. At the same time, for the join to
work, the original value of "id" should be kept as is.

I believe the following data-config.xml should solve your use-case:

<dataConfig>
  <document>
    <entity name="vets"
            query="select id,first_name,last_name FROM vets"
            transformer="TemplateTransformer">
      <field column="comboId" template="vets-${vets.id}"/>
      <field column="first_name" name="first_name"/>
      <field column="last_name" name="last_name"/>
    </entity>
    <entity name="owners"
            query="select id,first_name,last_name FROM owners"
            transformer="TemplateTransformer">
      <field column="comboId" template="owners-${owners.id}"/>
      <field column="first_name" name="first_name"/>
      <field column="last_name" name="last_name"/>
      <entity name="pets"
              query="SELECT id,name,birth_date,type_id FROM pets WHERE owner_id='${owners.id}'">
        <field column="name" name="name"/>
        <field column="birth_date" name="birth_date"/>
        <field column="type_id" name="type_id"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Here, comboId is the uniqueKey for your Solr documents. The "id" field need
not exist in your schema.xml, it will only be used for joining the tables
and discarded if it does not exist in the schema.xml.

Hope this helps.

On Wed, Jun 4, 2008 at 10:54 PM, Julio Castillo <[EMAIL PROTECTED]>
wrote:

> Noble,
> Thanks for continuing to assist me on trying to come up a config that 
> works.
> A couple of questions/clarifications:
> 1) I had to introduce the "artificial" comboID and the transformer 
> because of a conflict with a parallel entity on the "id" ("vets" and
"owners").
> 2) I don't think there is a conflict with the petID because prior to 
> the introduction of "vets" I had "owners" with no "id" issues regarding
"pets".
> 3) The conflict was introduced the moment I tried to add "vets".
> Unfortunately by introducing the transformer, for "owners", the "pets"
> relationships stopped working.
>
> Below are 3 specifications. The first 2 work in isolation, when 
> combined(last one) it doesn't work.
>
> * CASE 1 (Works -nested entities -no conflicts on ids -no transformer)
>
> 
>  query="select id,first_name,last_name FROM owners">
> 
> 
>
>
>query="SELECT id,name,birth_date,type_id FROM pets 
> WHERE owner_id='${owners.id}'"
>parentDeltaQuery="SELECT id FROM owners WHERE 
> id=${pets.owner_id}">
>
>
>
>
>  
> 
>
>
> * CASE 2 (Works -parallel independent entities -introduced transformer 
> to avoid id conflicts)
>
> 
>   query="select id,first_name,last_name FROM vets"
>transformer="TemplateTransformer">
>
> 
>
>  
>   query="select id,first_name,last_name FROM owners"
>transformer="TemplateTransformer">
> template="owners-${owners.id}"/>
>
>
>  
> 
>
>
> * CASE 3 (Commented out "vets" to simplify case. Nested entities don't
> work:
> "Document [null] missing required field: id")
>
> 
>  
>   query="select id,first_name,last_name FROM owners"
>transformer="TemplateTransformer">
> template="owners-${owners.id}"/>
>
>
>
>query="SELECT id,name,birth_date,type_id FROM pets 
> WHERE owner_id='${owners.id}'"
>parentDeltaQuery="SELECT id FROM owners WHERE 
> id=${pets.owner_id}">
>
>
>
>
>  
> 
>
>
> The debug output for one row from the dataImporter while iterating 
> over pets where the first row owner_id=1 (which gets transformed to 
> 'owners-1' -where owner_id is a fk to owners id column) shows as 
> follows:
> "SELECT id,name,birth_date,type_id FROM pets WHERE owner_id='owners-1'
>
> I believe the issue on somehow having to "untransform" the owners-id 
> prior to comparison with pets foreign key owner_id
>
> Thanks again
>
> ** julio
>
> -Original Message-
> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 03, 2008 10:30 PM
> To: solr-user@lucene.apache.org
> Subject: Re: 

Re: Solrj + Multicore

2008-06-04 Thread Alexander Ramos Jardim
It is mapped correctly.

2008/6/4 Erik Hatcher <[EMAIL PROTECTED]>:

>
> On Jun 4, 2008, at 10:07 AM, Alexander Ramos Jardim wrote:
>
>> 2008/6/3 Ryan McKinley <[EMAIL PROTECTED]>:
>>
>>
>>>
 This way I don't connect:
new CommonsHttpSolrServer("http://localhost:8983/solr/idxItem")


  this is how you need to connect... otherwise nothing will work.
>>>
>>>
>> When I try this way, I get the following exception, when trying to make an
>> update to my index:
>>
>> org.apache.solr.common.SolrException: Not Found
>>
>> Not Found
>>
>> request: http://localhost:8983/solr/idxItem/update?wt=xml&version=2.
>>
>
> Are you using Solr trunk?
>
> Is your solrconfig.xml the same as in trunk? Maybe you don't have this
> mapped?
>
> <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
>
>Erik
>
>


-- 
Alexander Ramos Jardim


Re: How to describe 2 entities in dataConfig for the DataImporter?

2008-06-04 Thread Shalin Shekhar Mangar
Hi Julio,

The following are my assumptions after studying your given data-config
examples

1. The column id is present in all three tables -- vets, owners and pets.
2. Vets and owners are independent of each other, there is no join required
between them
3. There is a parent-child relationship between owners and pets (joined on
owner_id column)

The whole problem relates to the fact that both Vets and Owners have a
primary key with the same name -- "id". If "id" is the uniqueKey for the
Solr schema, then some records from owners overwrite vets when the value for
the "id" is the same.

To solve this problem, we must have a uniqueKey in schema.xml which has
different values when coming from Vets and Owners even when value of "id"
column is the same in both the tables. At the same time, for the join to
work, the original value of "id" should be kept as is.

I believe the following data-config.xml should solve your use-case:

<dataConfig>
  <document>
    <entity name="vets"
            query="select id,first_name,last_name FROM vets"
            transformer="TemplateTransformer">
      <field column="comboId" template="vets-${vets.id}"/>
      <field column="first_name" name="first_name"/>
      <field column="last_name" name="last_name"/>
    </entity>
    <entity name="owners"
            query="select id,first_name,last_name FROM owners"
            transformer="TemplateTransformer">
      <field column="comboId" template="owners-${owners.id}"/>
      <field column="first_name" name="first_name"/>
      <field column="last_name" name="last_name"/>
      <entity name="pets"
              query="SELECT id,name,birth_date,type_id FROM pets WHERE owner_id='${owners.id}'">
        <field column="name" name="name"/>
        <field column="birth_date" name="birth_date"/>
        <field column="type_id" name="type_id"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Here, comboId is the uniqueKey for your Solr documents. The "id" field need
not exist in your schema.xml, it will only be used for joining the tables
and discarded if it does not exist in the schema.xml.

Hope this helps.

On Wed, Jun 4, 2008 at 10:54 PM, Julio Castillo <[EMAIL PROTECTED]>
wrote:

> Noble,
> Thanks for continuing to assist me on trying to come up a config that
> works.
> A couple of questions/clarifications:
> 1) I had to introduce the "artificial" comboID and the transformer because
> of a conflict with a parallel entity on the "id" ("vets" and "owners").
> 2) I don't think there is a conflict with the petID because prior to the
> introduction of "vets" I had "owners" with no "id" issues regarding "pets".
> 3) The conflict was introduced the moment I tried to add "vets".
> Unfortunately by introducing the transformer, for "owners", the "pets"
> relationships stopped working.
>
> Below are 3 specifications. The first 2 work in isolation, when
> combined(last one) it doesn't work.
>
> * CASE 1 (Works -nested entities -no conflicts on ids -no transformer)
>
> 
>  query="select id,first_name,last_name FROM owners">
> 
> 
>
>
>query="SELECT id,name,birth_date,type_id FROM pets WHERE
> owner_id='${owners.id}'"
>parentDeltaQuery="SELECT id FROM owners WHERE
> id=${pets.owner_id}">
>
>
>
>
>  
> 
>
>
> * CASE 2 (Works -parallel independent entities -introduced transformer to
> avoid id conflicts)
>
> 
>   query="select id,first_name,last_name FROM vets"
>transformer="TemplateTransformer">
>
> 
>
>  
>   query="select id,first_name,last_name FROM owners"
>transformer="TemplateTransformer">
> template="owners-${owners.id}"/>
>
>
>  
> 
>
>
> * CASE 3 (Commented out "vets" to simplify case. Nested entities don't
> work:
> "Document [null] missing required field: id")
>
> 
>  
>   query="select id,first_name,last_name FROM owners"
>transformer="TemplateTransformer">
> template="owners-${owners.id}"/>
>
>
>
>query="SELECT id,name,birth_date,type_id FROM pets WHERE
> owner_id='${owners.id}'"
>parentDeltaQuery="SELECT id FROM owners WHERE
> id=${pets.owner_id}">
>
>
>
>
>  
> 
>
>
> The debug output for one row from the dataImporter while iterating over
> pets
> where the first row owner_id=1 (which gets transformed to 'owners-1' -where
> owner_id is a fk to owners id column) shows as follows:
> "SELECT id,name,birth_date,type_id FROM pets WHERE owner_id='owners-1'
>
> I believe the issue on somehow having to "untransform" the owners-id prior
> to comparison with pets foreign key owner_id
>
> Thanks again
>
> ** julio
>
> -Original Message-
> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 03, 2008 10:30 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to describe 2 entities in dataConfig for the DataImporter?
>
> The id in pet should be  aliased to 'petid' , because id is coming from
> both
> entities there is a conflict   query="select id,first_name,last_name FROM owners"
>  transformer="TemplateTransformer">
>  
>  
>  
>  
>
>query="SELECT id,name,birth_date,type_id FROM pets WHERE
> owner_id='${owners.id}'"
>  parentDeltaQuery="SELECT id FROM owners WHERE
> id=${pets.owner_id}">
>  
>  
>  
>  
>  
>
>
> On Wed, Jun 4, 2008 at 10:37 AM, Noble Paul നോബിള്‍ नोब्ळ्
> <[EMAIL PROTECTED]> wrote:
> > hi julio,
> > You must create an extra field for 'comboid' because you really need
> > the 'id' for your sub-entities. Your data-config must look as follows.
> > The pet also has a field called 'id' . It is not a good idea. call it
> > 'petid' or something (both

Boost support for MoreLikeThis fields

2008-06-04 Thread Tom Morton
Hi,
   SOLR-295 mentions boost
support for MoreLikeThis and then seems to have been subsumed by
SOLR-281.
To be clear, I'm talking about boosts for the mlt.fl fields and how they are
ranked rather than for the seeding query.  Has this feature gotten any
attention?

Thanks...Tom


Re: Solrj + Multicore

2008-06-04 Thread Erik Hatcher


On Jun 4, 2008, at 10:07 AM, Alexander Ramos Jardim wrote:

2008/6/3 Ryan McKinley <[EMAIL PROTECTED]>:





This way I don't connect:
new CommonsHttpSolrServer("http://localhost:8983/solr/idxItem")



this is how you need to connect... otherwise nothing will work.



When I try this way, I get the following exception, when trying to  
make an

update to my index:

org.apache.solr.common.SolrException: Not Found

Not Found

request: http://localhost:8983/solr/idxItem/update?wt=xml&version=2.


Are you using Solr trunk?

Is your solrconfig.xml the same as in trunk? Maybe you don't have
this mapped?

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />


Erik



Re: 1.3 DisMax and MoreLikeThis

2008-06-04 Thread Tom Morton
Hi,
  Thanks Yonik.  That fixed that.  It would be useful to change one of the
existing dismax query types in the default solrconfig.xml to use this new
syntax (especially since DisMaxRequestHandler is being deprecated).
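
For anyone else hitting this, the working shape is something like the sketch
below (handler name and params are illustrative, based on my config above):

<requestHandler name="/genre" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">relatedExact^2 genre^0.5</str>
  </lst>
</requestHandler>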

Thanks again...Tom

On Wed, Jun 4, 2008 at 11:19 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Wed, Jun 4, 2008 at 11:11 AM, Tom Morton <[EMAIL PROTECTED]> wrote:
> >   I wanted to use the new dismax support for more like this described in
> > SOLR-295  but can't even
> get
> > the new syntax for dismax to work (described in
> > SOLR-281).
> > Any ideas if this functionality works?
> >
> > Here's the relevant part of my solr config,
> >
> >   > defType="dismax">
>
> defType is just another parameter and should appear in the defaults
> section below.
> -Yonik
>
> >
> > explicit
> > 0.01
> > 
> >relatedExact^2 genre^0.5
> > 
> > 100
> > *:*
> >
> >  
> >
> > Example query:
> >
> http://localhost:13280/solr/genre?indent=on&version=2.2&q=terrence+howard&start=0&rows=10&fl=*%2Cscore&wt=standard&debugQuery=on&explainOther=&hl.fl=
> >
> > Debug output: (I would expect to see dismax scoring)
> >
> > 
> > 11.151003 = (MATCH) sum of:
> >  6.925395 = (MATCH) weight(name:terrence in 63941), product of:
> >0.7880709 = queryWeight(name:terrence), product of:
> >  10.0431795 = idf(docFreq=234, numDocs=1988249)
> >  0.07846827 = queryNorm
> >8.787782 = (MATCH) fieldWeight(name:terrence in 63941), product of:
> >  1.0 = tf(termFreq(name:terrence)=1)
> >  10.0431795 = idf(docFreq=234, numDocs=1988249)
> >  0.875 = fieldNorm(field=name, doc=63941)
> >  4.2256074 = (MATCH) weight(name:howard in 63941), product of:
> >0.6155844 = queryWeight(name:howard), product of:
> >  7.84501 = idf(docFreq=2116, numDocs=1988249)
> >  0.07846827 = queryNorm
> >6.8643837 = (MATCH) fieldWeight(name:howard in 63941), product of:
> >  1.0 = tf(termFreq(name:howard)=1)
> >  7.84501 = idf(docFreq=2116, numDocs=1988249)
> >  0.875 = fieldNorm(field=name, doc=63941)
> >
> >
> > Here's my build info:
> > Solr Specification Version: 1.2.2008.06.02.15.21.48
> > Solr Implementation Version: 1.3-dev 662524M - tsmorton - 2008-06-02
> > 15:21:48
> >
> > Is this feature now broken or does it look like my config is wrong?
> >
> > Thanks...Tom
> >
>


RE: How to describe 2 entities in dataConfig for the DataImporter?

2008-06-04 Thread Julio Castillo
Noble,
Thanks for continuing to assist me on trying to come up a config that works.
A couple of questions/clarifications:
1) I had to introduce the "artificial" comboID and the transformer because
of a conflict with a parallel entity on the "id" ("vets" and "owners").
2) I don't think there is a conflict with the petID because prior to the
introduction of "vets" I had "owners" with no "id" issues regarding "pets".
3) The conflict was introduced the moment I tried to add "vets".
Unfortunately by introducing the transformer, for "owners", the "pets"
relationships stopped working.

Below are 3 specifications. The first 2 work in isolation, when
combined(last one) it doesn't work.

* CASE 1 (Works -nested entities -no conflicts on ids -no transformer)


  









  



* CASE 2 (Works -parallel independent entities -introduced transformer to
avoid id conflicts)


  



  
  



  



* CASE 3 (Commented out "vets" to simplify case. Nested entities don't work:
"Document [null] missing required field: id")


  
  









  



The debug output for one row from the dataImporter while iterating over pets
where the first row owner_id=1 (which gets transformed to 'owners-1' -where
owner_id is a fk to owners id column) shows as follows:
"SELECT id,name,birth_date,type_id FROM pets WHERE owner_id='owners-1'

I believe the issue is in somehow having to "untransform" the owners id prior
to comparison with the pets foreign key owner_id.

Thanks again

** julio

-Original Message-
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 03, 2008 10:30 PM
To: solr-user@lucene.apache.org
Subject: Re: How to describe 2 entities in dataConfig for the DataImporter?

The id in pets should be aliased to 'petid', because id is coming from both
entities and there is a conflict:

<entity name="owners" query="select id,first_name,last_name FROM owners"
        transformer="TemplateTransformer">
  <field column="comboid" template="owners-${owners.id}"/>
  <field column="first_name" name="first_name"/>
  <field column="last_name" name="last_name"/>
  <entity name="pets"
          query="SELECT id,name,birth_date,type_id FROM pets WHERE owner_id='${owners.id}'"
          parentDeltaQuery="SELECT id FROM owners WHERE id=${pets.owner_id}">
    <field column="id" name="petid"/>
    <field column="name" name="name"/>
    <field column="birth_date" name="birth_date"/>
    <field column="type_id" name="type_id"/>
  </entity>
</entity>


On Wed, Jun 4, 2008 at 10:37 AM, Noble Paul നോബിള്‍ नोब्ळ्
<[EMAIL PROTECTED]> wrote:
> hi julio,
> You must create an extra field for 'comboid' because you really need 
> the 'id' for your sub-entities. Your data-config must look as follows.
> The pet also has a field called 'id' . It is not a good idea. call it 
> 'petid' or something (both in dataconfig and schema.xml). Please make 
> sure that the field names are unique .
>
>
> <entity name="owners"
>         query="select id,first_name,last_name FROM owners"
>         transformer="TemplateTransformer">
>   <field column="comboid" template="owners-${owners.id}"/>
>   <field column="first_name" name="first_name"/>
>   <field column="last_name" name="last_name"/>
>   <entity name="pets"
>           query="SELECT id,name,birth_date,type_id FROM pets WHERE
> owner_id='${owners.id}'"
>           parentDeltaQuery="SELECT id FROM owners WHERE
> id=${pets.owner_id}">
>     <field column="id" name="petid"/>
>     <field column="name" name="name"/>
>     <field column="birth_date" name="birth_date"/>
>     <field column="type_id" name="type_id"/>
>   </entity>
> </entity>
>
>
> On Wed, Jun 4, 2008 at 5:50 AM, Julio Castillo <[EMAIL PROTECTED]>
wrote:
>> Hi Noble,
>> I had forgotten to also list comboId as a uniqueKey in the schema.xml
file.
>> But that didn't make a difference.
>> It still complained about the "Document [null] missing required field:
id"
>> for each row it ran into of the outer entity.
>>
>> If you look at the debug output of the entity:pets (see below on 
>> original message).
>> The query looks like this:
>> "SELECT id,name,birth_date,type_id FROM pets WHERE owner_id='owners-1'
>>
>> This is the problem lies, because, the owner_id in the pets table is 
>> currently a number and thus will not match the modified combo id 
>> generated for the owners' id column.
>>
>> So, somehow, I need to be able to either remove the 'owners-' suffix 
>> before comparing, or append the same suffix to the pets.owner_id 
>> value prior to comparing.
>>
>> Thanks
>>
>> ** julio
>>
>> -Original Message-
>> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
>> Sent: Monday, June 02, 2008 9:20 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to describe 2 entities in dataConfig for the
DataImporter?
>>
>> hi Julio,
>> delete my previous response. In your schema , 'id' is the uniqueKey.
>> make  'comboid' the unique key. Because that is the target field name 
>> coming out of the entity 'owners'
>>
>> --Noble
>>
>> On Tue, Jun 3, 2008 at 9:46 AM, Noble Paul നോബിള്‍ नोब्ळ्
>> <[EMAIL PROTECTED]> wrote:
>>> The field 'id' is repeated for pet also rename it to something else 
>>> say  >>   query="SELECT id,name,birth_date,type_id FROM pets 
>>> WHERE owner_id='${owners.id}'"
>>>   parentDeltaQuery="SELECT id FROM owners WHERE 
>>> id=${pets.owner_id}">
>>>   
>>> 
>>>
>>> --Noble
>>>
>>> On Tue, Jun 3, 2008 at 3:28 AM, Julio Castillo 
>>> <[EMAIL PROTECTED]>
>> wrote:
 Shalin,
 I experimented with it, and the null pointer exception has been 
 taken care of. Thank you.

 I have a different problem now. I believe it is a 
 syntax/specification problem.

 When importing data, I got the following exceptions:
 SEVERE: Exc

Re: Ideas on how to implement "sponsored results"

2008-06-04 Thread Alexander Ramos Jardim
Cuong,

I think you will need some manipulation beyond Solr queries. You should
separate the results by your site criteria after retrieving them. After
that, you could cache the results in your application and randomize the
lists every time you render a page.

I don't know if Solr has collapsing capabilities, but if it has any beyond
faceting, it would be a great boost to your work.

2008/6/3 climbingrose <[EMAIL PROTECTED]>:

> Hi Alexander,
>
> Thanks for your suggestion. I think my problem is a bit different from
> yours. We don't have any sponsored words but we have to retrieve sponsored
> results directly from the index. This is because a site can have 60,000
> products which is hard to insert/update keywords. I can live with that by
> issuing a separate query to fetch sponsored results. My problem is to
> equally distribute sponsored results between sites so that each site will
> have an opportunity to show their sponsored results no matter how many
> products they have. For example, if site A has 60,000 products and site B
> has only 2000, then sponsored products from site B will have a very small
> chance of being displayed.
>
>
> On Wed, Jun 4, 2008 at 2:56 AM, Alexander Ramos Jardim <
> [EMAIL PROTECTED]> wrote:
>
> > Cuong,
> >
> > I have implemented sponsored words for a client. I don't know if my
> working
> > can help you but I will expose it and let you decide.
> >
> > I have an index containing products entries that I created a field called
> > sponsored words. What I do is to boost this field , so when these words
> are
> > matched in the query that products appear first on my result.
> >
> > 2008/6/3 climbingrose <[EMAIL PROTECTED]>:
> >
> > > Hi all,
> > >
> > > I'm trying to implement "sponsored results" in Solr search results
> > similar
> > > to that of Google. We index products from various sites and would like
> to
> > > allow certain sites to promote their products. My approach is to query
> a
> > > slave instance to get sponsored results for user queries in addition to
> > the
> > > normal search results. This part is easy. However, since the number of
> > > products indexed for each site can be very different (100, 1000, 10000
> > or
> > > 60000 products), we need a way to fairly distribute the sponsored
> results
> > > among sites.
> > >
> > > My initial thought is utilising field collapsing patch to collapse the
> > > search results on siteId field. You can imagine that this will create a
> > > series of "buckets of results", each bucket representing results from a
> > > site. After that, 2 or 3 buckets will randomly be selected from which I
> > > will
> > > randomly select one or two results from. However, since I want these
> > > sponsored results to be relevant to user queries, I'd like only want to
> > > have
> > > the first 30 results in each buckets.
> > >
> > > Obviously, it's desirable that if the user refreshes the page, new
> > > sponsored
> > > results will be displayed. On the other hand, I also want to have the
> > > advantages of Solr cache.
> > >
> > > What would be the best way to implement this functionality? Thanks.
> > >
> > > Cheers,
> > > Cuong
> > >
> >
> >
> >
> > --
> > Alexander Ramos Jardim
> >
>
>
>
> --
> Regards,
>
> Cuong Hoang
>



-- 
Alexander Ramos Jardim


Re: an error after deleting "index" files

2008-06-04 Thread Jón Helgi Jónsson
Hi,

Not sure, but this might be the same thing that happened to me: you have to
delete the index FOLDER, not only its contents.

On Wed, Jun 4, 2008 at 3:27 PM, Nahuel ANGELINETTI <
[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm doing some tests with Solr to see if it can be useful for us, and after
> deleting the index to restart the indexing from scratch, it returns me this
> error when I want to post data:
>
>
> java.lang.RuntimeException: java.io.FileNotFoundException: no segments*
> file found in org.apache.lucene.store.FSDirectory@/opt/solr/data/index:
> files:
>at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:433)
>at org.apache.solr.core.SolrCore.<init>(SolrCore.java:216)
>at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:177)
>at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
>at
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:223)
>at
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:304)
>at
> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:77)
>at
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3634)
>at
> org.apache.catalina.core.StandardContext.start(StandardContext.java:4217)
>at
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:759)
>at
> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:739)
>at
> org.apache.catalina.core.StandardHost.addChild(StandardHost.java:524)
>at
> org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:608)
>at
> org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:535)
>at
> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:470)
>at
> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1122)
>at
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:310)
>at
> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
>at
> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1021)
>at
> org.apache.catalina.core.StandardHost.start(StandardHost.java:718)
>at
> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1013)
>at
> org.apache.catalina.core.StandardEngine.start(StandardEngine.java:442)
>at
> org.apache.catalina.core.StandardService.start(StandardService.java:450)
>at
> org.apache.catalina.core.StandardServer.start(StandardServer.java:709)
>at org.apache.catalina.startup.Catalina.start(Catalina.java:551)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:585)
>at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:294)
>at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:432)
> Caused by: java.io.FileNotFoundException: no segments* file found in
> org.apache.lucene.store.FSDirectory@/opt/solr/data/index: files:
>at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
>at org.apache.lucene.index.IndexReader.open(IndexReader.java:184)
>at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
>at
> org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:87)
>at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:424)
>... 30 more
> ) which prevented it from fulfilling the request. Apache Tomcat/5.5
>
>
>
> Can you tell me how to regenerate an empty index?
>
> Thank you.
>
>
>
>
> --
>  ANGELINETTI Nahuel
>


an error after deleting "index" files

2008-06-04 Thread Nahuel ANGELINETTI

Hi,

I'm doing some tests with Solr to see if it can be useful for us. After
deleting the index to restart the indexing from scratch, it returns
this error when I try to post data:



java.lang.RuntimeException: java.io.FileNotFoundException: no segments* 
file found in org.apache.lucene.store.FSDirectory@/opt/solr/data/index: 
files:

at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:433)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:216)
at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:177)
	at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
	at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:223)
	at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:304)
	at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:77)
	at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3634)
	at 
org.apache.catalina.core.StandardContext.start(StandardContext.java:4217)
	at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:759)

at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:739)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:524)
	at 
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:608)
	at 
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:535)

at 
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:470)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1122)
	at 
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:310)
	at 
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)

at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1021)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:718)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1013)
at 
org.apache.catalina.core.StandardEngine.start(StandardEngine.java:442)
at 
org.apache.catalina.core.StandardService.start(StandardService.java:450)
at 
org.apache.catalina.core.StandardServer.start(StandardServer.java:709)
at org.apache.catalina.startup.Catalina.start(Catalina.java:551)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:294)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:432)
Caused by: java.io.FileNotFoundException: no segments* file found in 
org.apache.lucene.store.FSDirectory@/opt/solr/data/index: files:
	at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)

at org.apache.lucene.index.IndexReader.open(IndexReader.java:184)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
	at 
org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:87)

at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:424)
... 30 more
) which prevented it from fulfilling the request. Apache Tomcat/5.5




Can you tell me how to regenerate an empty index?

Thank you.




--
  ANGELINETTI Nahuel


Re: 1.3 DisMax and MoreLikeThis

2008-06-04 Thread Yonik Seeley
On Wed, Jun 4, 2008 at 11:11 AM, Tom Morton <[EMAIL PROTECTED]> wrote:
>   I wanted to use the new dismax support for more like this described in
> SOLR-295, but can't even get
> the new syntax for dismax to work (described in
> SOLR-281).
> Any ideas if this functionality works?
>
> Here's the relevant part of my solr config,
>
>   defType="dismax">

defType is just another parameter and should appear in the defaults
section below.
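
For illustration, the corrected handler could look like the sketch below.
(The XML tags in the config quoted here were stripped by the archive, so the
handler name and the element names wrapped around each value are assumptions
based on the stock example solrconfig.xml; the point is just that defType
moves into the defaults list.)

  <requestHandler name="/genre" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">relatedExact^2 genre^0.5</str>
      <int name="ps">100</int>
      <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>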
-Yonik

>
> explicit
> 0.01
> 
>relatedExact^2 genre^0.5
> 
> 100
> *:*
>
>  
>
> Example query:
> http://localhost:13280/solr/genre?indent=on&version=2.2&q=terrence+howard&start=0&rows=10&fl=*%2Cscore&wt=standard&debugQuery=on&explainOther=&hl.fl=
>
> Debug output: (I would expect to see dismax scoring)
>
> 
> 11.151003 = (MATCH) sum of:
>  6.925395 = (MATCH) weight(name:terrence in 63941), product of:
>0.7880709 = queryWeight(name:terrence), product of:
>  10.0431795 = idf(docFreq=234, numDocs=1988249)
>  0.07846827 = queryNorm
>8.787782 = (MATCH) fieldWeight(name:terrence in 63941), product of:
>  1.0 = tf(termFreq(name:terrence)=1)
>  10.0431795 = idf(docFreq=234, numDocs=1988249)
>  0.875 = fieldNorm(field=name, doc=63941)
>  4.2256074 = (MATCH) weight(name:howard in 63941), product of:
>0.6155844 = queryWeight(name:howard), product of:
>  7.84501 = idf(docFreq=2116, numDocs=1988249)
>  0.07846827 = queryNorm
>6.8643837 = (MATCH) fieldWeight(name:howard in 63941), product of:
>  1.0 = tf(termFreq(name:howard)=1)
>  7.84501 = idf(docFreq=2116, numDocs=1988249)
>  0.875 = fieldNorm(field=name, doc=63941)
>
>
> Here's my build info:
> Solr Specification Version: 1.2.2008.06.02.15.21.48
> Solr Implementation Version: 1.3-dev 662524M - tsmorton - 2008-06-02
> 15:21:48
>
> Is this feature now broken or does it look like my config is wrong?
>
> Thanks...Tom
>


1.3 DisMax and MoreLikeThis

2008-06-04 Thread Tom Morton
Hi,
   I wanted to use the new dismax support for more like this described in
SOLR-295, but can't even get
the new syntax for dismax to work (described in
SOLR-281).
Any ideas if this functionality works?

Here's the relevant part of my solr config,

  

 explicit
 0.01
 
relatedExact^2 genre^0.5
 
 100
 *:*

  

Example query:
http://localhost:13280/solr/genre?indent=on&version=2.2&q=terrence+howard&start=0&rows=10&fl=*%2Cscore&wt=standard&debugQuery=on&explainOther=&hl.fl=

Debug output: (I would expect to see dismax scoring)


11.151003 = (MATCH) sum of:
  6.925395 = (MATCH) weight(name:terrence in 63941), product of:
0.7880709 = queryWeight(name:terrence), product of:
  10.0431795 = idf(docFreq=234, numDocs=1988249)
  0.07846827 = queryNorm
8.787782 = (MATCH) fieldWeight(name:terrence in 63941), product of:
  1.0 = tf(termFreq(name:terrence)=1)
  10.0431795 = idf(docFreq=234, numDocs=1988249)
  0.875 = fieldNorm(field=name, doc=63941)
  4.2256074 = (MATCH) weight(name:howard in 63941), product of:
0.6155844 = queryWeight(name:howard), product of:
  7.84501 = idf(docFreq=2116, numDocs=1988249)
  0.07846827 = queryNorm
6.8643837 = (MATCH) fieldWeight(name:howard in 63941), product of:
  1.0 = tf(termFreq(name:howard)=1)
  7.84501 = idf(docFreq=2116, numDocs=1988249)
  0.875 = fieldNorm(field=name, doc=63941)


Here's my build info:
Solr Specification Version: 1.2.2008.06.02.15.21.48
Solr Implementation Version: 1.3-dev 662524M - tsmorton - 2008-06-02
15:21:48

Is this feature now broken or does it look like my config is wrong?

Thanks...Tom


Re: Solrj + Multicore

2008-06-04 Thread Alexander Ramos Jardim
2008/6/3 Ryan McKinley <[EMAIL PROTECTED]>:

>
>>
>> This way I don't connect:
>> new CommonsHttpSolrServer("http://localhost:8983/solr/idxItem")
>>
>>
> this is how you need to connect... otherwise nothing will work.
>

When I try this way, I get the following exception when making an
update to my index:

org.apache.solr.common.SolrException: Not Found

Not Found

request: http://localhost:8983/solr/idxItem/update?wt=xml&version=2.2
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:308)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:152)
at
org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:220)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:51)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:55)
   ...
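
For reference, here is a minimal SolrJ sketch of connecting to a named core
and posting one document (the core URL is the one from this thread; the "id"
field is an assumed uniqueKey):

  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class IdxItemUpdate {
      public static void main(String[] args) throws Exception {
          // Point the client at the core's base URL: no trailing "?"
          // and no "/update" path; SolrJ appends the request path itself.
          CommonsHttpSolrServer server =
                  new CommonsHttpSolrServer("http://localhost:8983/solr/idxItem");

          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "item-1"); // hypothetical uniqueKey value
          server.add(doc);              // POSTs to <core>/update
          server.commit();
      }
  }

If that URL still returns Not Found, it usually means nothing is mapped at
that path, i.e. the core was never registered in solr.xml, so the multicore
setup is worth checking before the client code.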



>
> Perhaps we should throw an exception if you initialize a URL that contains
> "?"
>
> ryan
>
>


-- 
Alexander Ramos Jardim


Re: Index structuring

2008-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
For 16 million docs it may not be necessary. Add shards when you see
that performance is degrading.
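
When that point comes, a distributed query is an ordinary request plus a
shards parameter listing the partitions, along these lines (host names are
placeholders):

http://host1:8983/solr/select?shards=host1:8983/solr,host2:8983/solr&q=solr

See http://wiki.apache.org/solr/DistributedSearch for details.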
--Noble

On Wed, Jun 4, 2008 at 4:17 PM, Ritesh Ambastha <[EMAIL PROTECTED]> wrote:
>
> The number of docs I have indexed so far is 1,633,570.
> I am a bit afraid, as the number of indexed docs will grow at least 5-10
> times in the very near future.
>
> Regards,
> Ritesh Ambastha
>
>
>
> Shalin Shekhar Mangar wrote:
>>
>> A lot of this also depends on the number of documents. But we have
>> successfully used Solr with up to 10-12 million documents.
>>
>> On Wed, Jun 4, 2008 at 4:10 PM, Ritesh Ambastha <[EMAIL PROTECTED]>
>> wrote:
>>
>>>
>>> Thanks Noble,
>>>
>>> That means I can go ahead with a single index for a long time.
>>> :)
>>>
>>> Regards,
>>> Ritesh Ambastha
>>>
>>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>> >
>>> > For the data size you are proposing, a single index should be fine. Just
>>> > give the machine enough RAM.
>>> >
>>> > Distributed search involves multiple requests made between shards,
>>> > which may be an unnecessary overhead.
>>> > --Noble
>>> >
>>> > On Wed, Jun 4, 2008 at 4:02 PM, Ritesh Ambastha
>>> <[EMAIL PROTECTED]>
>>> > wrote:
>>> >>
>>> >> Thanks Noble,
>>> >>
>>> >> I maintain two separate indexes on my disk for two different search
>>> >> services.
>>> >> The index sizes of the two are 91MB and 615MB. I am pretty sure that
>>> >> these index sizes will grow in the future, and may reach 10GB.
>>> >>
>>> >> My doubts:
>>> >>
>>> >> 1. When should I start partitioning my index?
>>> >> 2. Is there any performance issue with partitioning? For example: will
>>> >> a query on 1GB and on 500MB of indexed data take the same time to
>>> >> return results? Or is it that the smaller the index, the shorter the
>>> >> response time?
>>> >>
>>> >>
>>> >> Regards,
>>> >> Ritesh Ambastha
>>> >>
>>> >> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>> >>>
>>> >>> You could have been more specific on the dataset size.
>>> >>>
>>> >>> If your data volumes are growing you can partition your index into
>>> >>> multiple shards.
>>> >>> http://wiki.apache.org/solr/DistributedSearch
>>> >>> --Noble
>>> >>>
>>> >>> On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha
>>> >>> <[EMAIL PROTECTED]>
>>> >>> wrote:
>>> 
>>>  Dear Readers,
>>> 
>>>  I am a newbie in the Solr world. I have successfully deployed Solr on my
>>>  machine, and I am able to index a large DB table. I am pretty sure that
>>>  the internal index structure of Solr is quite capable of handling large
>>>  data sets.
>>>
>>>  But say my data size keeps growing at jet speed; what should the index
>>>  structure be then? Do I need to follow some specific index structuring
>>>  patterns/algorithms for handling such massive data?
>>>
>>>  I am sorry if I sound like a novice in this area. I would appreciate
>>>  your thoughts/suggestions.
>>> 
>>>  Regards,
>>>  Ritesh Ambastha
>>>  --
>>>  View this message in context:
>>>  http://www.nabble.com/Index-structuring-tp17576449p17576449.html
>>>  Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>>> 
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> --Noble Paul
>>> >>>
>>> >>>
>>> >>
>>> >> --
>>> >> View this message in context:
>>> >> http://www.nabble.com/Index-structuring-tp17576449p17643690.html
>>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > --Noble Paul
>>> >
>>> >
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Index-structuring-tp17576449p17643798.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Index-structuring-tp17576449p17643909.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul


Re: Index structuring

2008-06-04 Thread Ritesh Ambastha

The number of docs I have indexed so far is 1,633,570.
I am a bit afraid, as the number of indexed docs will grow at least 5-10
times in the very near future.

Regards,
Ritesh Ambastha 



Shalin Shekhar Mangar wrote:
> 
> A lot of this also depends on the number of documents. But we have
> successfully used Solr with up to 10-12 million documents.
> 
> On Wed, Jun 4, 2008 at 4:10 PM, Ritesh Ambastha <[EMAIL PROTECTED]>
> wrote:
> 
>>
>> Thanks Noble,
>>
>> That means I can go ahead with a single index for a long time.
>> :)
>>
>> Regards,
>> Ritesh Ambastha
>>
>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>> >
>> > For the data size you are proposing, a single index should be fine. Just
>> > give the machine enough RAM.
>> >
>> > Distributed search involves multiple requests made between shards,
>> > which may be an unnecessary overhead.
>> > --Noble
>> >
>> > On Wed, Jun 4, 2008 at 4:02 PM, Ritesh Ambastha
>> <[EMAIL PROTECTED]>
>> > wrote:
>> >>
>> >> Thanks Noble,
>> >>
>> >> I maintain two separate indexes on my disk for two different search
>> >> services.
>> >> The index sizes of the two are 91MB and 615MB. I am pretty sure that
>> >> these index sizes will grow in the future, and may reach 10GB.
>> >>
>> >> My doubts:
>> >>
>> >> 1. When should I start partitioning my index?
>> >> 2. Is there any performance issue with partitioning? For example: will
>> >> a query on 1GB and on 500MB of indexed data take the same time to
>> >> return results? Or is it that the smaller the index, the shorter the
>> >> response time?
>> >>
>> >>
>> >> Regards,
>> >> Ritesh Ambastha
>> >>
>> >> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>> >>>
>> >>> You could have been more specific on the dataset size.
>> >>>
>> >>> If your data volumes are growing you can partition your index into
>> >>> multiple shards.
>> >>> http://wiki.apache.org/solr/DistributedSearch
>> >>> --Noble
>> >>>
>> >>> On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha
>> >>> <[EMAIL PROTECTED]>
>> >>> wrote:
>> 
>>  Dear Readers,
>> 
>>  I am a newbie in the Solr world. I have successfully deployed Solr on my
>>  machine, and I am able to index a large DB table. I am pretty sure that
>>  the internal index structure of Solr is quite capable of handling large
>>  data sets.
>>
>>  But say my data size keeps growing at jet speed; what should the index
>>  structure be then? Do I need to follow some specific index structuring
>>  patterns/algorithms for handling such massive data?
>>
>>  I am sorry if I sound like a novice in this area. I would appreciate
>>  your thoughts/suggestions.
>> 
>>  Regards,
>>  Ritesh Ambastha
>>  --
>>  View this message in context:
>>  http://www.nabble.com/Index-structuring-tp17576449p17576449.html
>>  Sent from the Solr - User mailing list archive at Nabble.com.
>> 
>> 
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> --Noble Paul
>> >>>
>> >>>
>> >>
>> >> --
>> >> View this message in context:
>> >> http://www.nabble.com/Index-structuring-tp17576449p17643690.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > --Noble Paul
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Index-structuring-tp17576449p17643798.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Index-structuring-tp17576449p17643909.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Index structuring

2008-06-04 Thread Shalin Shekhar Mangar
A lot of this also depends on the number of documents. But we have
successfully used Solr with up to 10-12 million documents.

On Wed, Jun 4, 2008 at 4:10 PM, Ritesh Ambastha <[EMAIL PROTECTED]>
wrote:

>
> Thanks Noble,
>
> That means I can go ahead with a single index for a long time.
> :)
>
> Regards,
> Ritesh Ambastha
>
> Noble Paul നോബിള്‍ नोब्ळ् wrote:
> >
> > For the data size you are proposing, a single index should be fine. Just
> > give the machine enough RAM.
> >
> > Distributed search involves multiple requests made between shards,
> > which may be an unnecessary overhead.
> > --Noble
> >
> > On Wed, Jun 4, 2008 at 4:02 PM, Ritesh Ambastha <[EMAIL PROTECTED]>
> > wrote:
> >>
> >> Thanks Noble,
> >>
> >> I maintain two separate indexes on my disk for two different search
> >> services.
> >> The index sizes of the two are 91MB and 615MB. I am pretty sure that
> >> these index sizes will grow in the future, and may reach 10GB.
> >>
> >> My doubts:
> >>
> >> 1. When should I start partitioning my index?
> >> 2. Is there any performance issue with partitioning? For example: will a
> >> query on 1GB and on 500MB of indexed data take the same time to return
> >> results? Or is it that the smaller the index, the shorter the response
> >> time?
> >>
> >>
> >> Regards,
> >> Ritesh Ambastha
> >>
> >> Noble Paul നോബിള്‍ नोब्ळ् wrote:
> >>>
> >>> You could have been more specific on the dataset size.
> >>>
> >>> If your data volumes are growing you can partition your index into
> >>> multiple shards.
> >>> http://wiki.apache.org/solr/DistributedSearch
> >>> --Noble
> >>>
> >>> On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha
> >>> <[EMAIL PROTECTED]>
> >>> wrote:
> 
>  Dear Readers,
> 
>  I am a newbie in the Solr world. I have successfully deployed Solr on my
>  machine, and I am able to index a large DB table. I am pretty sure that
>  the internal index structure of Solr is quite capable of handling large
>  data sets.
>
>  But say my data size keeps growing at jet speed; what should the index
>  structure be then? Do I need to follow some specific index structuring
>  patterns/algorithms for handling such massive data?
>
>  I am sorry if I sound like a novice in this area. I would appreciate
>  your thoughts/suggestions.
> 
>  Regards,
>  Ritesh Ambastha
>  --
>  View this message in context:
>  http://www.nabble.com/Index-structuring-tp17576449p17576449.html
>  Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> >>>
> >>>
> >>>
> >>> --
> >>> --Noble Paul
> >>>
> >>>
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/Index-structuring-tp17576449p17643690.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> >
> > --
> > --Noble Paul
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Index-structuring-tp17576449p17643798.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Index structuring

2008-06-04 Thread Ritesh Ambastha

Thanks Noble, 

That means I can go ahead with a single index for a long time.
:)

Regards,
Ritesh Ambastha

Noble Paul നോബിള്‍ नोब्ळ् wrote:
> 
> For the data size you are proposing, a single index should be fine. Just
> give the machine enough RAM.
>
> Distributed search involves multiple requests made between shards,
> which may be an unnecessary overhead.
> --Noble
> 
> On Wed, Jun 4, 2008 at 4:02 PM, Ritesh Ambastha <[EMAIL PROTECTED]>
> wrote:
>>
>> Thanks Noble,
>>
>> I maintain two separate indexes on my disk for two different search
>> services.
>> The index sizes of the two are 91MB and 615MB. I am pretty sure that
>> these index sizes will grow in the future, and may reach 10GB.
>>
>> My doubts:
>>
>> 1. When should I start partitioning my index?
>> 2. Is there any performance issue with partitioning? For example: will a
>> query on 1GB and on 500MB of indexed data take the same time to return
>> results? Or is it that the smaller the index, the shorter the response
>> time?
>>
>>
>> Regards,
>> Ritesh Ambastha
>>
>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>
>>> You could have been more specific on the dataset size.
>>>
>>> If your data volumes are growing you can partition your index into
>>> multiple shards.
>>> http://wiki.apache.org/solr/DistributedSearch
>>> --Noble
>>>
>>> On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha
>>> <[EMAIL PROTECTED]>
>>> wrote:

 Dear Readers,

>>  I am a newbie in the Solr world. I have successfully deployed Solr on my
>>  machine, and I am able to index a large DB table. I am pretty sure that
>>  the internal index structure of Solr is quite capable of handling large
>>  data sets.
>>
>>  But say my data size keeps growing at jet speed; what should the index
>>  structure be then? Do I need to follow some specific index structuring
>>  patterns/algorithms for handling such massive data?
>>
>>  I am sorry if I sound like a novice in this area. I would appreciate
>>  your thoughts/suggestions.

 Regards,
 Ritesh Ambastha
 --
 View this message in context:
 http://www.nabble.com/Index-structuring-tp17576449p17576449.html
 Sent from the Solr - User mailing list archive at Nabble.com.


>>>
>>>
>>>
>>> --
>>> --Noble Paul
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Index-structuring-tp17576449p17643690.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Index-structuring-tp17576449p17643798.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Index structuring

2008-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
For the data size you are proposing, a single index should be fine. Just
give the machine enough RAM.

Distributed search involves multiple requests made between shards,
which may be an unnecessary overhead.
--Noble

On Wed, Jun 4, 2008 at 4:02 PM, Ritesh Ambastha <[EMAIL PROTECTED]> wrote:
>
> Thanks Noble,
>
> I maintain two separate indexes on my disk for two different search
> services.
> The index sizes of the two are 91MB and 615MB. I am pretty sure that these
> index sizes will grow in the future, and may reach 10GB.
>
> My doubts:
>
> 1. When should I start partitioning my index?
> 2. Is there any performance issue with partitioning? For example: will a
> query on 1GB and on 500MB of indexed data take the same time to return
> results? Or is it that the smaller the index, the shorter the response time?
>
>
> Regards,
> Ritesh Ambastha
>
> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>
>> You could have been more specific on the dataset size.
>>
>> If your data volumes are growing you can partition your index into
>> multiple shards.
>> http://wiki.apache.org/solr/DistributedSearch
>> --Noble
>>
>> On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Dear Readers,
>>>
>>> I am a newbie in the Solr world. I have successfully deployed Solr on my
>>> machine, and I am able to index a large DB table. I am pretty sure that
>>> the internal index structure of Solr is quite capable of handling large
>>> data sets.
>>>
>>> But say my data size keeps growing at jet speed; what should the index
>>> structure be then? Do I need to follow some specific index structuring
>>> patterns/algorithms for handling such massive data?
>>>
>>> I am sorry if I sound like a novice in this area. I would appreciate
>>> your thoughts/suggestions.
>>>
>>> Regards,
>>> Ritesh Ambastha
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Index-structuring-tp17576449p17576449.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Index-structuring-tp17576449p17643690.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul


Re: Index structuring

2008-06-04 Thread Ritesh Ambastha

Thanks Noble, 

I maintain two separate indexes on my disk for two different search
services.
The index sizes of the two are 91MB and 615MB. I am pretty sure that these
index sizes will grow in the future, and may reach 10GB.

My doubts:

1. When should I start partitioning my index?
2. Is there any performance issue with partitioning? For example: will a
query on 1GB and on 500MB of indexed data take the same time to return
results? Or is it that the smaller the index, the shorter the response time?


Regards,
Ritesh Ambastha

Noble Paul നോബിള്‍ नोब्ळ् wrote:
> 
> You could have been more specific on the dataset size.
> 
> If your data volumes are growing you can partition your index into
> multiple shards.
> http://wiki.apache.org/solr/DistributedSearch
> --Noble
> 
> On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha <[EMAIL PROTECTED]>
> wrote:
>>
>> Dear Readers,
>>
>> I am a newbie in the Solr world. I have successfully deployed Solr on my
>> machine, and I am able to index a large DB table. I am pretty sure that
>> the internal index structure of Solr is quite capable of handling large
>> data sets.
>>
>> But say my data size keeps growing at jet speed; what should the index
>> structure be then? Do I need to follow some specific index structuring
>> patterns/algorithms for handling such massive data?
>>
>> I am sorry if I sound like a novice in this area. I would appreciate
>> your thoughts/suggestions.
>>
>> Regards,
>> Ritesh Ambastha
>> --
>> View this message in context:
>> http://www.nabble.com/Index-structuring-tp17576449p17576449.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Index-structuring-tp17576449p17643690.html
Sent from the Solr - User mailing list archive at Nabble.com.