Re: Storing Json field in Lucene

2020-04-22 Thread Erick Erickson
"Is it good idea to store complete Json as string to Lucene DB. If we store as 
separate fields then we have around 30 fields. There will be 30 seeks to get 
complete stored fields"

The "30 seeks" part is not true. Under the covers, all of a document's stored 
fields are compressed and stored together as a blob, and Lucene does the magic 
of uncompressing that blob and extracting the stored fields when you ask for 
them.
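
To make the "one blob" idea concrete, here is a toy sketch that uses only the JDK. It is not Lucene's actual on-disk format (Lucene uses its own codec and, as far as I know, groups several documents into each compressed block); the class name, the name=value encoding and the sample field names are all made up for illustration. The point is only that 30 fields packed together cost one read and one decompression, not 30.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy model only: packs all fields of one document into a single compressed blob.
class StoredBlobToy {

    // Serialize every field as a "name=value" line and compress them together.
    static byte[] pack(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        Deflater deflater = new Deflater();
        deflater.setInput(sb.toString().getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // One decompression pass recovers all fields of the document.
    static Map<String, String> unpack(byte[] blob) {
        Inflater inflater = new Inflater();
        inflater.setInput(blob);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    break; // guard against truncated input
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt blob", e);
        } finally {
            inflater.end();
        }
        Map<String, String> fields = new LinkedHashMap<>();
        String text = new String(out.toByteArray(), StandardCharsets.UTF_8);
        for (String line : text.split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                fields.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        return fields;
    }

    // Hypothetical sample document with n metadata fields.
    static Map<String, String> sampleFields(int n) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            fields.put("field" + i, "value" + i);
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] blob = pack(sampleFields(30));
        System.out.println(unpack(blob).size() + " fields from one blob of " + blob.length + " bytes");
    }
}
```

The toy encoding breaks if a value contains a newline; a real format would length-prefix each value.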

Further, while you’re right that storing lots of things will bloat the index, 
that’s not very important. Stored data is kept in separate files in each 
segment (*.fdt for the data, *.fdx for the index into it) and has little to no 
impact on search performance. That data is not accessed unless you ask for the 
field to be returned, i.e. it’s not part of the data used to get the top N 
documents. Say you have a search that has 10,000,000 hits and you return the 
top 10. _Only_ the stored data for those top 10 hits is accessed, and that only 
after all the scoring is done.
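
The two phases described above can be mimicked with a small JDK-only toy. This is a sketch, not Lucene code: the class, the loadStored helper and the synthetic scores are assumptions for illustration. Phase 1 picks the top N doc ids from scores alone; phase 2 loads "stored" data for just those N, and a counter shows how many loads happened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Toy model only: collect top N by score first, fetch stored data afterwards.
class TopNFetchToy {

    static int storedLoads = 0; // counts how often "stored data" is touched

    // Stand-in for reading a document's stored fields (hypothetical helper).
    static String loadStored(int docId) {
        storedLoads++;
        return "{\"id\":" + docId + "}";
    }

    static List<String> searchTopN(float[] scores, int n) {
        // Phase 1: keep the n best doc ids in a min-heap; no stored data is read here.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>((a, b) -> Float.compare(scores[a], scores[b]));
        for (int doc = 0; doc < scores.length; doc++) {
            heap.offer(doc);
            if (heap.size() > n) {
                heap.poll(); // drop the current lowest-scoring candidate
            }
        }
        // Phase 2: only now load stored data, and only for the survivors.
        List<Integer> top = new ArrayList<>(heap);
        top.sort((a, b) -> Float.compare(scores[b], scores[a]));
        List<String> results = new ArrayList<>();
        for (int doc : top) {
            results.add(loadStored(doc));
        }
        return results;
    }

    // Runs a search over one million synthetic hits and reports the number of loads.
    static int demoLoads() {
        storedLoads = 0;
        float[] scores = new float[1_000_000];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = (i * 31) % 1000;
        }
        searchTopN(scores, 10);
        return storedLoads;
    }

    public static void main(String[] args) {
        System.out.println("stored loads: " + demoLoads());
    }
}
```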

I think this is premature optimization. Try the least complex way of 
organizing your data, and measure.

Best,
Erick

> On Apr 22, 2020, at 1:00 AM, ganesh m  wrote:
> 
> Is it good idea to store complete Json as string to Lucene DB. If we store as 
> separate fields then we have around 30 fields. There will be 30 seeks to get 
> complete stored fields





Re: Storing Json field in Lucene

2020-04-21 Thread Aditya Varun Chadha
during indexing, you can add the json string to each document as a
stored-only field (not indexed, no doc-values).

at query time you can then retrieve the json field's value only for the top
K results. this field should not be used for matching or scoring.

the point is that if you do ever want to use Lucene for its strengths
(text/multidimensional indexing and search), you should extract the relevant
values from your json document (like you extract Type and id, i guess) and
_also_ add them as separate Fields with indexing/doc-values enabled,
depending on the use-cases for each field.
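
a minimal sketch of that layout, assuming Lucene 8.x on the classpath. the
field names ("id", "type", "json"), the class name and the sample data are
made up for illustration. IndexSearcher.doc(int) is the 8.x way to read stored
fields; newer releases changed how stored fields are read, so adjust for your
version.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Sketch only: JSON kept as a stored-only field, id/type indexed for matching.
class JsonStoredFieldSketch {

    static void addDoc(IndexWriter writer, String id, String type, String json) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.NO));     // indexed, not stored
        doc.add(new StringField("type", type, Field.Store.NO)); // indexed, not stored
        doc.add(new StoredField("json", json));                 // stored only, never searched
        writer.addDocument(doc);
    }

    // Match on the indexed "type" field, then read the JSON for the top k hits only.
    static List<String> topJson(Directory dir, String type, int k) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs top = searcher.search(new TermQuery(new Term("type", type)), k);
            List<String> results = new ArrayList<>();
            for (ScoreDoc hit : top.scoreDocs) {
                results.add(searcher.doc(hit.doc).get("json"));
            }
            return results;
        }
    }

    // Builds a tiny in-memory index and returns the JSON of the "audit" documents.
    static List<String> demo() {
        try {
            Directory dir = new ByteBuffersDirectory();
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
                addDoc(writer, "1", "audit", "{\"user\":\"a\"}");
                addDoc(writer, "2", "audit", "{\"user\":\"b\"}");
                addDoc(writer, "3", "billing", "{\"amount\":10}");
            }
            return topJson(dir, "audit", 10);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

any value you later want to filter, sort or facet on would get its own
indexed or doc-values field next to the stored json, as described above.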

On Wed, Apr 22, 2020 at 7:01 AM ganesh m wrote:

> Hi
> I am currently storing indexed field and stored field in separate
> database. In stored field database, Document Id, Type and Json string of
> metadata will be stored. Basically i am using it as key-value pair
> database. For every document to be indexed, we have three different
> metadata structure to be stored. That is the reason, we have Document Id
> and Type, so that we can query and retrieve stored field based on type. We
> have to depend on Lucene as we don't have any other database to store data.
>
> Is it good idea to store complete Json as string to Lucene DB. If we store
> as separate fields then we have around 30 fields. There will be 30 seeks to
> get complete stored fields. If we store it as Json then it is a one seek to
> retrieve the data. Since it is Json, field name and its value will be
> stored for every record and it may bloat index size.
>
> Could you guide me what is the better approach. To store as Json or as
> individual fields.
>
> Regards
> Ganesh
>


-- 
Aditya Varun Chadha | http://www.adichad.net | +49 (0) 152 25914008 (M)


Storing Json field in Lucene

2020-04-21 Thread ganesh m
Hi,
I am currently storing indexed fields and stored fields in separate databases. 
In the stored-field database, the Document Id, Type and a Json string of 
metadata are stored. Basically I am using it as a key-value database. For every 
document to be indexed, we have three different metadata structures to store. 
That is the reason we have Document Id and Type, so that we can query and 
retrieve stored fields based on type. We have to depend on Lucene as we don't 
have any other database to store data.

Is it a good idea to store the complete Json as a string in the Lucene DB? If we 
store separate fields then we have around 30 fields, and there will be 30 seeks 
to get the complete stored fields. If we store it as Json then it is one seek 
to retrieve the data. But since it is Json, each field name and its value will 
be stored for every record, and that may bloat the index size.

Could you guide me on which is the better approach: storing as Json or as 
individual fields?

Regards
Ganesh