Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-26 Thread Alexander Ramos Jardim

Hey Gunaranjan,

I have the same scenario as you.

A Lucene index is denormalized. It should not contain entity relationships.
When I need to do something like what you are doing, I group the related
values in one field.

Let's say we have 2 credit cards. The first has id 30459673 and taxes at
1.5%/month, and the second has id 56305 and taxes at 2.5%. What I do is
create a multivalued field in which I index the values as "id ^ taxes". On
the client side I put the logic to parse the string in a convenient way to
work with the values. I hope that helps you.
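
A minimal sketch of the client-side parsing described above, assuming the values are stored exactly as "id ^ taxes" with a single "^" separator. The names CardRate, encode_card and decode_card are hypothetical and only illustrate the idea; they are not part of Solr or Lucene.

```python
from decimal import Decimal
from typing import NamedTuple


class CardRate(NamedTuple):
    """One grouped value: a card id and its monthly rate in percent."""
    card_id: str
    monthly_rate_percent: Decimal


def encode_card(card_id: str, rate_percent: str) -> str:
    """Join the related values into one string for a multivalued field."""
    return f"{card_id} ^ {rate_percent}"


def decode_card(value: str) -> CardRate:
    """Split a stored value back into its parts on the client side."""
    card_id, rate = (part.strip() for part in value.split("^", 1))
    return CardRate(card_id, Decimal(rate))


# Values as they might come back from the multivalued field of one document.
stored_values = [encode_card("30459673", "1.5"), encode_card("56305", "2.5")]
cards = [decode_card(v) for v in stored_values]
```

Because each card travels as one string, the id and its rate cannot be separated from each other by the index, which is the point of grouping them.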

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-25 Thread Gunaranjan Chandraraju


Paul
It's not just about merging the fields or resource usage.  If you look
at the scenario below, the issue is that it mixes up my fields
(shipping and billing address, for instance).  I can't merge them and
still keep the 'distinction' for search.  Your case is a
'generalization' field, so the search will work.  I know mine is a
trivial example and can be overcome with just two fields
(shipping_address & billing_address), but I am talking about cases
where we have many such 'groups of fields'.


In general, such a one-to-many relationship for indices in a 'document'
is also really, really common :).  Again, I am not trying to argue a
point - I would be happy to get some idea of how to do it and to be
corrected if I'm wrong.


Lastly (while that's not my main worry right now), I tend to be
careful with resources. When dealing with very large data, I will
avoid any unnecessary overhead as far as possible and take every
optimization I can get :)


Guna

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-25 Thread Gunaranjan Chandraraju


Thanks
Much appreciate the guidance. I think I will go with the single field  
approach for now.  Also will take a look at the URL below and come  
back if I have any ideas.



Guna

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-25 Thread Paul Libbrecht


Guna,

it's really, really normal to duplicate content so it can be merged into a
field.

We do this all the time, for example to have a field
"text-in-any-language" while a field "text-in-english" is also there, and
the queries boost matches in text-in-any-language less than matches in
text-in-english (if the user is in English).


This difference in weighting is the gold of Lucene, I feel (and of
retrieval generally).
Also, depending on the field, you can index differently while still
copying it in Solr (for example, use a different analyzer per language).


paul

PS: don't worry about resources; this is the side of the world
where resources are the least of the problems! (Typically a
"catch-all-field" wouldn't be stored, though, as that would then load the
memory.)
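
A rough sketch of the per-field weighting described above, assuming the dismax handler and its qf parameter are used to carry the boosts. The field names text_all and text_en, the boost values and the helper dismax_params are hypothetical stand-ins for the catch-all and per-language fields.

```python
from urllib.parse import urlencode


def dismax_params(user_query: str, user_language: str) -> dict:
    """Build request parameters that weight the language field above the catch-all."""
    # Hypothetical field names: one catch-all field plus one field per language.
    boosts = {"text_all": 1.0, f"text_{user_language}": 2.0}
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    return {"defType": "dismax", "q": user_query, "qf": qf}


params = dismax_params("solar panel", "en")
query_string = urlencode(params)  # ready to append to a select URL
```

The catch-all field still matches everything, but a hit in the user's own language scores higher.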




Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-25 Thread Shalin Shekhar Mangar

On Sun, Jan 25, 2009 at 2:05 PM, Gunaranjan Chandraraju <
chandrar...@apple.com> wrote:

> Thanks
> This sounds redundant to me - to store the fields separately and then
> concat all of them to one copy field again.
>

Sometimes that may be the only way. For example, if you want to facet on
some of those fields, as well as to search them all.


>
> My XML is like this
> 
>
> I am currently using XPATH or XSL to separate them into individual indexed
> fields like: address_state_1, address_type_1 etc. in SOLR.
>
> From what you say, it looks to me that I might as well just treat the
> entire address as a single 'text field' and search within the text after
> tokenizing.  This way I don't need to have the _1, _2 as the single text
> field will contain the information together (and thus grouped - so I know
> which is shipping/billing etc?).  Will there be any performance difference
> between this and the copy field approach?
>

No, I think one field may even be better, since you are creating fewer
fields. If you never need to do faceting and you don't want to get the
contents of each address field separately, this is your best option.


>
> Is there no other way (programmatic) to search across multiple fields?  I
> did take a quick look at dismax but again it needs the field names to be
> specifically mentioned in the config file or in the query.  I can't do this
> as I am not able to predict the number of fields (e.g. credit cards a person
> can have?).
>
>  I like SOLR, but to me, this seems to be a very common and simple search
> scenario/pattern - however its implementation in SOLR is appearing to be not
> very straightforward.   (My apologies if I am on the wrong track here because
> I don't understand SOLR well.  )


There has been some discussion about having wildcards in field names, but I
guess nobody contributed (or had the need for?) the complete proposal. Copy
fields give a lot of flexibility, which is what most people use.

http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams

-- 
Regards,
Shalin Shekhar Mangar.

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-25 Thread Gunaranjan Chandraraju


Thanks
This sounds redundant to me - to store the fields separately and then  
concat all of them to one copy field again.


My XML is like this


I am currently using XPATH or XSL to separate them into individual  
indexed fields like: address_state_1, address_type_1 etc. in SOLR.


From what you say, it looks to me that I might as well just treat the
entire address as a single 'text field' and search within the text
after tokenizing.  This way I don't need the _1, _2 suffixes, as the
single text field will contain the information together (and thus
grouped - so I know which is shipping/billing etc.).  Will there be
any performance difference between this and the copy field approach?


Is there no other (programmatic) way to search across multiple
fields?  I did take a quick look at dismax, but again it needs the
field names to be specifically mentioned in the config file or in the
query.  I can't do this as I am not able to predict the number of
fields (e.g. how many credit cards can a person have?).


I like SOLR, but to me this seems to be a very common and simple
search scenario/pattern - however, its implementation in SOLR appears
to be not very straightforward.   (My apologies if I am on the
wrong track here because I don't understand SOLR well.)


Regards,
Guna

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-24 Thread Noble Paul നോബിള്‍ नोब्ळ्

For searching you need to put them in a single field. Use <copyField>
in schema.xml to achieve that.
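
The effect of copying several numbered fields into one searchable field can be modelled roughly as below. This is only a Python model of the idea, not Solr configuration: the real rule lives in schema.xml. The destination name credit_card_all, the glob and the helper apply_copy_rule are hypothetical.

```python
from fnmatch import fnmatchcase


def apply_copy_rule(doc: dict, source_glob: str, dest: str) -> dict:
    """Return a copy of doc in which every field matching source_glob
    is also collected into the multivalued field dest."""
    collected = []
    for name, value in doc.items():
        if name != dest and fnmatchcase(name, source_glob):
            collected.extend(value if isinstance(value, list) else [value])
    return {**doc, dest: collected}


doc = {
    "id": "42",
    "credit_card_1": "1234 4567 7890 1234",
    "credit_card_2": "9999 0000 1111 2222",
}
indexed = apply_copy_rule(doc, "credit_card_*", "credit_card_all")
```

A search then only needs to name credit_card_all, however many numbered fields a given document happens to have.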

On Sun, Jan 25, 2009 at 7:39 AM, Gunaranjan Chandraraju
 wrote:
> I can make this approach work with XPATH and XSL.   However, this approach
> creates multiple fields like this
>
> address_state_1
> address_state_2
> ...
> address_state_10
>
> and
>
> credit_card_1
> credit_card_2
> credit_card_3
>
>
> How do I search for a credit_card?  The query syntax does not seem to
> support wildcards in field names.   E.g. I can't seem to do this ->
> credit_card*:1234 4567 7890 1234
>
> On the search side I would not know how many credit card fields  got created
> for a document and so I need that to be dynamic.
>
> -g
>
>
> On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:
>
>> Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.
>>
>> On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <
>> shalinman...@gmail.com> wrote:
>>
>>> On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <
>>> chandrar...@apple.com> wrote:
>>>

 
  
  
  
  
 

 I have setup my DIH to treat these as entities as below

 
  
  
   >>>   baseDir="***"
   fileName=".*xml"
   rootEntity="false"
   dataSource="null" >
  >>> name="record"
 processor="XPathEntityProcessor"
 stream="false"
 forEach="/record"
 url="${f.fileAbsolutePath}">
  

  
   >>>   name="record_adr"
   processor="XPathEntityProcessor"
   stream="false"
   forEach="/record/address"
   url="${f.fileAbsolutePath}">
   >>> xpath="/record/address/@street" />
   >>> xpath="/record/address//@state" />
   >>> xpath="/record/address//@type" />
  
 
   
  
 

>>>
>>> I think the only way is to create a dynamic field for each attribute
>>> (street, state etc.). Write a transformer to copy the fields from your
>>> data config to appropriately named dynamic fields (e.g. street_1,
>>> state_1, etc).
>>> To maintain this counter you will need to get/store it with
>>> Context#getSessionAttribute(name, Context.SCOPE_DOC) and
>>> Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
>>>
>>> I can't think of an easier way.
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>
>



-- 
--Noble Paul

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-24 Thread Gunaranjan Chandraraju

I can make this approach work with XPATH and XSL.   However, this approach
creates multiple fields like this


address_state_1
address_state_2
...
address_state_10

and

credit_card_1
credit_card_2
credit_card_3


How do I search for a credit_card?  The query syntax does not seem
to support wildcards in field names.   E.g. I can't seem to do
this ->   credit_card*:1234 4567 7890 1234


On the search side I would not know how many credit card fields got
created for a document, and so I need that to be dynamic.
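
One possible workaround, sketched under the assumption that the client can obtain the list of field names that exist in the index: expand the field-name wildcard on the client and send an ordinary OR query. The helper expand_field_glob and the known_fields list are hypothetical.

```python
from fnmatch import fnmatchcase


def expand_field_glob(field_glob: str, known_fields: list, value: str) -> str:
    """Build an OR query over every known field whose name matches the glob."""
    matching = sorted(f for f in known_fields if fnmatchcase(f, field_glob))
    # Quote the value so that a number containing spaces is searched as one phrase.
    phrase = '"' + value.replace('"', '\\"') + '"'
    return " OR ".join(f"{field}:{phrase}" for field in matching)


known_fields = ["id", "credit_card_1", "credit_card_2", "credit_card_3",
                "address_state_1"]
query = expand_field_glob("credit_card*", known_fields, "1234 4567 7890 1234")
```

The query grows with the number of matching fields, so this suits a handful of fields better than hundreds.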


-g


On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:


Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.

On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:


On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <
chandrar...@apple.com> wrote:




 
 
 
 


I have setup my DIH to treat these as entities as below


 
 
   
  
  

  
   
   
   
   
  
 
   
 




I think the only way is to create a dynamic field for each attribute
(street, state etc.). Write a transformer to copy the fields from
your data config to appropriately named dynamic fields (e.g. street_1,
state_1, etc).

To maintain this counter you will need to get/store it with
Context#getSessionAttribute(name, Context.SCOPE_DOC) and
Context#setSessionAttribute(name, val, Context.SCOPE_DOC).

I can't think of an easier way.
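
The naming scheme suggested here (street_1, state_1, ...) can be illustrated outside DIH as follows. This is a plain Python sketch of the counter logic only, not a DIH transformer; flatten_addresses and the sample values are hypothetical.

```python
def flatten_addresses(record_id: str, addresses: list) -> dict:
    """Copy each address attribute into a numbered field such as street_1."""
    doc = {"id": record_id}
    # The counter plays the role of the per-document value kept in the session.
    for counter, address in enumerate(addresses, start=1):
        for attribute, value in address.items():
            doc[f"{attribute}_{counter}"] = value
    return doc


doc = flatten_addresses("r1", [
    {"street": "1 Main St", "state": "CA", "type": "Home"},
    {"street": "2 Side Ave", "state": "NY", "type": "Office"},
])
```

Fields that share a suffix belong to the same address, so the grouping survives, at the cost of an unpredictable number of field names.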
--
Regards,
Shalin Shekhar Mangar.





--
Regards,
Shalin Shekhar Mangar.

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-24 Thread Noble Paul നോബിള്‍ नोब्ळ्

Hi Fergus,
XPathEntityprocessor can read multivalued fields easily

eg

   
   
 

 ***change**

 
 

   
 
   



In this case all address_street, address_state and address_type values
will be returned as separate lists while parsing. If you wish to put them
into multiple fields, you can write a transformer, iterate through the
lists and put them into separate fields. If there are 3 <address> tags then
you get a List for each field, where the length of the
list == 3. If an item is missing it will be added as a null.
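
A small sketch of iterating through such parallel lists, assuming all three lists have the same length and that missing items arrive as null (None in Python). The function regroup and the sample values are hypothetical.

```python
def regroup(streets: list, states: list, types: list) -> list:
    """Zip the parallel lists back into one dict per address."""
    if not (len(streets) == len(states) == len(types)):
        raise ValueError("parallel lists must have the same length")
    return [
        {"street": street, "state": state, "type": addr_type}
        for street, state, addr_type in zip(streets, states, types)
    ]


addresses = regroup(
    ["1 Main St", None, "3 Dock Rd"],   # the second address has no street
    ["CA", "NY", "WA"],
    ["Home", "Office", "Shipping"],
)
```

The null padding is what keeps the positions aligned; without it the second and third streets would shift onto the wrong address.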

Ensure that the fields are marked as multiValued="true" in the
schema.xml. Otherwise it does not return a List. If there is
no corresponding mapping in schema.xml you can explicitly put it here
in the dataconfig.xml
eg: 


I saw the syntax '/record/address//@state'. '//' is not supported.
You will have to explicitly give the full path.
--Noble




Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-24 Thread Noble Paul നോബിള്‍ नोब्ळ्

Nesting an XPathEntityProcessor inside another XPathEntityProcessor
is possible only if a field in the XML is a filename/URL.
What is the purpose of nesting like this?
Is it because you have multiple addresses? The possible solutions are
discussed elsewhere in this thread.
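
The false match described in the original question quoted below can be reproduced with a small model. This is a Python sketch with made-up values, assuming the index keeps one multivalued list per field; it is not how Solr evaluates queries internally.

```python
addresses = [
    {"street": "XYZ1", "type": "Home"},
    {"street": "XYZ2", "type": "Office"},
]

# What the index effectively holds: one multivalued list per field.
flattened = {
    "address_street": [a["street"] for a in addresses],
    "address_type": [a["type"] for a in addresses],
}


def matches_flattened(doc: dict, street: str, addr_type: str) -> bool:
    """Each condition is checked against its own list, independently."""
    return street in doc["address_street"] and addr_type in doc["address_type"]


def matches_grouped(addrs: list, street: str, addr_type: str) -> bool:
    """Both conditions must hold for the same address."""
    return any(a["street"] == street and a["type"] == addr_type for a in addrs)
```

With these values, matches_flattened(flattened, "XYZ1", "Office") is true even though no single address has both, while matches_grouped(addresses, "XYZ1", "Office") is false.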

On Sat, Jan 24, 2009 at 2:41 PM, Fergus McMenemie  wrote:
> Hello,
>
> I am also a newbie and was wanting to do almost the exact same thing.
> I was planning on doing the equivalent of:-
>
> 
>
>
>baseDir="***"
>  fileName=".*xml"
>  rootEntity="false"
>  dataSource="null" >
>name="record"
>   processor="XPathEntityProcessor"
>   stream="false"
>   rootEntity="false"***changed***
>   forEach="/record"
>   url="${f.fileAbsolutePath}">
>  
> ***change**
> 
>   name="record_adr"
> processor="XPathEntityProcessor"
> stream="false"
> forEach="/record/address"
> url="${f.fileAbsolutePath}">
>  
>   xpath="/record/address//@state" />
>  
>
>
>  
>
> 
>
> ID is no longer unique within Solr. There would be multiple "documents"
> with a given ID, one for each address. You can then search on ID and get
> the three addresses; you can also search on an address more sensibly.
>
> I have not been able to try this yet as other issues are still to be
> dealt with.
>
> Comments?
>
>>Hi
>>I may be completely off on this, being new to SOLR, but I am not sure
>>how to index related groups of fields in a document and preserve
>>their 'grouping'.   I would appreciate any help on this.  Detailed
>>description of the problem below.
>>
>>I am trying to index an entity that can have multiple occurrences in
>>the same document - e.g. Address.  The address could be Shipping,
>>Home, Office etc.   Each address element has multiple values in it
>>like street, state etc.  Thus each address element is a group, with
>>the state and street in one address element being related to each other.
>>
>>It looks like this in my source xml
>>
>>
>>
>>
>>
>>
>>
>>
>>I have setup my DIH to treat these as entities as below
>>
>>
>>
>>
>>  >  baseDir="***"
>>  fileName=".*xml"
>>  rootEntity="false"
>>  dataSource="null" >
>> >name="record"
>>  processor="XPathEntityProcessor"
>>  stream="false"
>>  forEach="/record"
>>url="${f.fileAbsolutePath}">
>> 
>>
>> 
>>  >  name="record_adr"
>>processor="XPathEntityProcessor"
>>stream="false"
>>forEach="/record/address"
>>url="${f.fileAbsolutePath}">
>>  
>>> xpath="/record/address//@state" />
>>  
>>   
>>
>>  
>>
>>
>>
>>
>>The problem is as follows.  DIH seems to treat these as entities but
>>solr seems to flatten them out on indexing to fields in a document
>>(losing the entity part).
>>
>>So when I search for an ID, all the street fields in the response
>>are bunched together, followed by all the state fields, type fields etc.
>>Thus I can't associate which street address corresponds to which
>>address type in the response.
>>
>>What seems harder is this - say I need to query on 'Street' = XYZ1 and
>>type="Office".  This should NOT return a document since the street for
>>the office address is "XY2" and not "XYZ1".  However when I query for
>>address_street:"XYZ1" and address_type:"Office" I get back this document.
>>
>>The problem seems to be that while DIH allows 'entities' within a
>>document  the SOLR schema does not preserve them - it 'flattens' all
>>of them out as indices for the document.
>>
>>I could work around the problem by creating SOLR fields like
>>"home_address_street" and "office_address_street" and do some xpath
>>mapping.  However I don't want to do it as we can have multiple
>>'other' addresses.  Also I have other fields whose type is not easily
>>distinguished like address.
>>
>>As I mentioned being new to SOLR I might have completely goofed on a
>>way to set it up - much appreciate any direction on it. I am using
>>SOLR 1.3
>>
>>Regards,
>>Guna
>
> --
>
> ===
> Fergus McMenemie   Email:fer...@twig.me.uk
> Techmore Ltd   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets Analyst Programmer
> ===
>



-- 
--Noble Paul

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-24 Thread Fergus McMenemie

Hello,

I am also a newbie and was wanting to do almost the exact same thing.
I was planning on doing the equivalent of:-




  
 
  
***change**
 
  
  
  
  


  



ID is no longer unique within Solr. There would be multiple "documents"
with a given ID, one for each address. You can then search on ID and get
the three addresses; you can also search on an address more sensibly.
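
A small sketch of why this helps, in plain Python rather than Solr: it
models the flattened layout from the original question next to the
one-document-per-address layout proposed above. The record, its ID, the
state values and the helper names (flatten, one_doc_per_address, matches)
are made up for illustration; only the address_* field naming and the
street/type query come from the thread.

```python
# Plain-Python model of the two indexing layouts; this is not Solr code.

record = {
    "id": "1",  # hypothetical record ID
    "addresses": [  # hypothetical sample values
        {"type": "Home", "street": "XYZ1", "state": "CA"},
        {"type": "Office", "street": "XYZ2", "state": "NY"},
    ],
}


def flatten(rec):
    """One document per record: each attribute becomes a multi-valued field."""
    doc = {"id": rec["id"]}
    for addr in rec["addresses"]:
        for key, value in addr.items():
            doc.setdefault("address_" + key, []).append(value)
    return doc


def one_doc_per_address(rec):
    """One document per address; every document repeats the record ID."""
    return [
        {"id": rec["id"], **{"address_" + k: [v] for k, v in addr.items()}}
        for addr in rec["addresses"]
    ]


def matches(doc, **wanted):
    """AND query: every requested value must occur in the named field."""
    return all(value in doc.get(field, []) for field, value in wanted.items())


query = {"address_street": "XYZ1", "address_type": "Office"}

# Flattened layout: street and type come from different addresses, yet match.
flat_hit = matches(flatten(record), **query)

# Per-address layout: no single address has both values, so nothing matches.
split_hits = [d for d in one_doc_per_address(record) if matches(d, **query)]

# Searching on the ID alone still returns every address of the record.
by_id = [d for d in one_doc_per_address(record) if d["id"] == "1"]
```

With the flattened document the street/type query matches even though no
single address has both values. With one document per address it returns
nothing, while a lookup by ID still returns both address documents. The
cost is that ID can no longer serve as the unique key.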

I have not been able to try this yet as other issues are still to be
dealt with.

Comments?

>Hi
>I may be completely off on this, being new to SOLR, but I am not sure
>how to index related groups of fields in a document and preserve
>their 'grouping'.  I would appreciate any help on this.  Detailed
>description of the problem below.
>
>I am trying to index an entity that can have multiple occurrences in
>the same document - e.g. Address.  The address could be Shipping,
>Home, Office etc.  Each address element has multiple values in it
>like street, state etc.  Thus each address element is a group with
>the state and street in one address element being related to each other.
>
>It looks like this in my source xml
>
>
>
>
>
>
>
>
>I have setup my DIH to treat these as entities as below
>
>
>
>
>baseDir="***"
>  fileName=".*xml"
>  rootEntity="false"
>  dataSource="null" >
> name="record"
>  processor="XPathEntityProcessor"
>  stream="false"
>  forEach="/record"
>url="${f.fileAbsolutePath}">
> 
>
> 
>name="record_adr"
>processor="XPathEntityProcessor"
>stream="false"
>forEach="/record/address"
>url="${f.fileAbsolutePath}">
>  
> xpath="/record/address//@state" />
>  
>   
>
>  
>
>
>
>
>The problem is as follows.  DIH seems to treat these as entities but  
>solr seems to flatten them out on indexing to fields in a document  
>(losing the entity part).
>
>So when I search for an ID, all the street fields in the response
>are bunched together, followed by all the state fields, type fields etc.
>Thus I can't associate which street address corresponds to which
>address type in the response.
>
>What seems harder is this - say I need to query on 'Street' = XYZ1 and
>type="Office".  This should NOT return a document since the street for
>the office address is "XY2" and not "XYZ1".  However when I query for
>address_street:"XYZ1" and address_type:"Office" I get back this document.
>
>The problem seems to be that while DIH allows 'entities' within a  
>document  the SOLR schema does not preserve them - it 'flattens' all  
>of them out as indices for the document.
>
>I could work around the problem by creating SOLR fields like  
>"home_address_street" and "office_address_street" and do some xpath  
>mapping.  However I don't want to do it as we can have multiple  
>'other' addresses.  Also I have other fields whose type is not easily  
>distinguished like address.
>
>As I mentioned being new to SOLR I might have completely goofed on a  
>way to set it up - much appreciate any direction on it. I am using  
>SOLR 1.3
>
>Regards,
>Guna

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-23 Thread Shalin Shekhar Mangar

Yes, Solr 1.3 does support dynamic fields in schema.xml. But DataImportHandler
in the 1.3 release does not support them.

However, you can use the DataImportHandler jar from trunk with Solr 1.3 if you
do not feel comfortable using Solr 1.4 trunk.

On Fri, Jan 23, 2009 at 1:36 PM, Gunaranjan Chandraraju <
chandrar...@apple.com> wrote:

>
> I thought 1.3 supported dynamic fields in schema.xml?
>
> Guna
>
>
> On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:
>
>  Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.
>>
>> On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <
>> shalinman...@gmail.com> wrote:
>>
>>  On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <
>>> chandrar...@apple.com> wrote:
>>>
>>>
 
  
  
  
  
 

 I have setup my DIH to treat these as entities as below

 
  
  
   >>>   baseDir="***"
   fileName=".*xml"
   rootEntity="false"
   dataSource="null" >
  >>> name="record"
 processor="XPathEntityProcessor"
 stream="false"
 forEach="/record"
 url="${f.fileAbsolutePath}">
  

  
   >>>   name="record_adr"
   processor="XPathEntityProcessor"
   stream="false"
   forEach="/record/address"
   url="${f.fileAbsolutePath}">
   >>> xpath="/record/address/@street" />
   >>> xpath="/record/address//@state" />
   >>> xpath="/record/address//@type" />
  
 
   
  
 


>>> I think the only way is to create a dynamic field for each attribute
>>> (street, state etc.). Write a transformer to copy the fields from your
>>> data config to an appropriately named dynamic field (e.g. street_1,
>>> state_1, etc.).
>>> To maintain the counter behind that numeric suffix you will need to
>>> read and store it with
>>> Context#getSessionAttribute(name, Context.SCOPE_DOC) and
>>> Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
>>>
>>> I can't think of an easier way.
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>


-- 
Regards,
Shalin Shekhar Mangar.

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-23 Thread Gunaranjan Chandraraju



I thought 1.3 supported dynamic fields in schema.xml?

Guna

On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:


Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.

On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:


On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <
chandrar...@apple.com> wrote:










I have setup my DIH to treat these as entities as below




  
 
 

 
  
  
  
  
 

  





I think the only way is to create a dynamic field for each attribute
(street, state etc.). Write a transformer to copy the fields from your data
config to an appropriately named dynamic field (e.g. street_1, state_1, etc.).

To maintain the counter behind that numeric suffix you will need to read and
store it with Context#getSessionAttribute(name, Context.SCOPE_DOC) and
Context#setSessionAttribute(name, val, Context.SCOPE_DOC).

I can't think of an easier way.
--
Regards,
Shalin Shekhar Mangar.





--
Regards,
Shalin Shekhar Mangar.

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-22 Thread Shalin Shekhar Mangar

Oops, one more gotcha. Dynamic field support in DataImportHandler is only in
1.4 trunk.

On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <
> chandrar...@apple.com> wrote:
>
>>
>> 
>>   
>>   
>>   
>>   
>> 
>>
>> I have setup my DIH to treat these as entities as below
>>
>> 
>>   
>>   
>> > baseDir="***"
>> fileName=".*xml"
>> rootEntity="false"
>> dataSource="null" >
>>>   name="record"
>>   processor="XPathEntityProcessor"
>>   stream="false"
>>   forEach="/record"
>>   url="${f.fileAbsolutePath}">
>>
>>
>>
>> > name="record_adr"
>> processor="XPathEntityProcessor"
>> stream="false"
>> forEach="/record/address"
>> url="${f.fileAbsolutePath}">
>> >  xpath="/record/address/@street" />
>> > xpath="/record/address//@state" />
>> >  xpath="/record/address//@type" />
>>
>>   
>> 
>>   
>> 
>>
>
> I think the only way is to create a dynamic field for each attribute
> (street, state etc.). Write a transformer to copy the fields from your data
> config to an appropriately named dynamic field (e.g. street_1, state_1, etc.).
> To maintain the counter behind that numeric suffix you will need to read and
> store it with Context#getSessionAttribute(name, Context.SCOPE_DOC) and
> Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
>
> I can't think of an easier way.
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,
Shalin Shekhar Mangar.

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-22 Thread Shalin Shekhar Mangar

On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <
chandrar...@apple.com> wrote:

>
> 
>   
>   
>   
>   
> 
>
> I have setup my DIH to treat these as entities as below
>
> 
>   
>   
>  baseDir="***"
> fileName=".*xml"
> rootEntity="false"
> dataSource="null" >
>   name="record"
>   processor="XPathEntityProcessor"
>   stream="false"
>   forEach="/record"
>   url="${f.fileAbsolutePath}">
>
>
>
>  name="record_adr"
> processor="XPathEntityProcessor"
> stream="false"
> forEach="/record/address"
> url="${f.fileAbsolutePath}">
>   xpath="/record/address/@street" />
>  xpath="/record/address//@state" />
>   xpath="/record/address//@type" />
>
>   
> 
>   
> 
>

I think the only way is to create a dynamic field for each attribute
(street, state etc.). Write a transformer to copy the fields from your data
config to an appropriately named dynamic field (e.g. street_1, state_1, etc.).
To maintain the counter behind that numeric suffix you will need to read and
store it with Context#getSessionAttribute(name, Context.SCOPE_DOC) and
Context#setSessionAttribute(name, val, Context.SCOPE_DOC).

I can't think of an easier way.
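
To make the counter idea concrete, here is a rough model of the renaming in
plain Python. It is not a DIH transformer (that would be Java, with the
counter kept through the Context session calls mentioned above); the dict
named session, the key address_counter, the sample rows and both function
names are assumptions for illustration.

```python
# Plain-Python model of the renaming logic only; a real DIH transformer would
# be written in Java and keep the counter in the Context session attributes.

def number_address_fields(rows, session):
    """Copy each address row into fields suffixed with a per-document counter."""
    doc = {}
    for row in rows:
        # Read the counter, bump it, and store it back (hypothetical key name).
        index = session.get("address_counter", 0) + 1
        session["address_counter"] = index
        for column, value in row.items():
            doc[f"{column}_{index}"] = value
    return doc


def address_matches(doc, count, **wanted):
    """True if a single numbered address group satisfies every condition."""
    return any(
        all(doc.get(f"{col}_{i}") == val for col, val in wanted.items())
        for i in range(1, count + 1)
    )


rows = [  # hypothetical sample values
    {"street": "XYZ1", "state": "CA", "type": "Home"},
    {"street": "XYZ2", "state": "NY", "type": "Office"},
]
session = {}  # stands in for the document-scoped session store
doc = number_address_fields(rows, session)

# Only fields that share a suffix belong to the same address.
office_on_xyz2 = address_matches(doc, 2, street="XYZ2", type="Office")
office_on_xyz1 = address_matches(doc, 2, street="XYZ1", type="Office")
```

Each address keeps its own suffix, so street_2 and type_2 can be paired
again on the client, and a query has to name fields with the same suffix.
The schema would need matching dynamic fields (e.g. street_*, state_*,
type_*).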
-- 
Regards,
Shalin Shekhar Mangar.
