Re: SOLR - Documents with large number of fields ~ 450

2013-03-28 Thread Marcin Rzewucki
Hi John,

Mark is right. DocValues can be enabled in two ways: RAM resident (default)
or on-disk. You can read more here:
http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues
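
As a rough illustration of the two modes, the sketch below generates schema.xml elements for DocValues-backed string fields. It is only a sketch under assumptions: the helper name, the type names and the sample field name are made up for illustration. The `docValues="true"` and `docValuesFormat="Disk"` attributes are how Solr 4.2 is understood to expose this, and the per-field format is believed to also need the schema-aware codec factory (`solr.SchemaCodecFactory`) in solrconfig.xml, so check both against the documentation for your version.

```python
import xml.etree.ElementTree as ET


def docvalues_schema_fragment(field_names, on_disk=True):
    """Return schema.xml element strings for DocValues-backed string fields.

    Hypothetical helper: the type names below are illustrative, not Solr defaults.
    """
    type_name = "string_dv_disk" if on_disk else "string_dv"
    type_attrs = {"name": type_name, "class": "solr.StrField"}
    if on_disk:
        # Assumed Solr 4.2 syntax for keeping DocValues on disk instead of on the heap.
        type_attrs["docValuesFormat"] = "Disk"
    elements = [ET.Element("fieldType", type_attrs)]
    for name in field_names:
        elements.append(ET.Element("field", {
            "name": name,
            "type": type_name,
            "indexed": "true",
            "stored": "false",
            # Assumption: DocValues fields need to be required or have a default.
            "required": "true",
            "docValues": "true",
        }))
    return [ET.tostring(element, encoding="unicode") for element in elements]


if __name__ == "__main__":
    for line in docvalues_schema_fragment(["price_category"]):
        print(line)
```

Calling it with `on_disk=False` leaves out the `docValuesFormat` attribute, which corresponds to the RAM-resident default.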

Regards.

On 22 March 2013 16:55, John Nielsen j...@mcb.dk wrote:

 with the on disk option.

 Could you elaborate on that?
 On 22/03/2013 05.25, Mark Miller markrmil...@gmail.com wrote:

  You might try using docvalues with the on disk option and try and let the
  OS manage all the memory needed for all the faceting/sorting. This would
  require Solr 4.2.
 
  - Mark
 



Re: SOLR - Documents with large number of fields ~ 450

2013-03-22 Thread Marcin Rzewucki
Hi,

I have a collection with more than 4K fields, mostly of Trie*Field types.
It is used for faceting, sorting, searching and the StatsComponent. It works
pretty well on 4 x Amazon m1.large (7.5GB RAM) EC2 boxes. I'm using
SolrCloud, a multi-AZ setup and ephemeral storage. The index is managed by
mmap, with 4GB for the Java heap and CMS for GC. Currently there are 800K
records, but there will be about 2M. Query responses are much slower (a few
to a dozen seconds) during bulk loading, but I think this is fairly typical.
Indexing takes much, much longer than for records with fewer fields. I'm
sending updates in 5MB batches. No OOM issues.
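
For illustration, size-capped batching like this can be sketched in a few lines of Python. This is not the code used for the setup described above: the function name and the sample documents are hypothetical, and the size accounting is an approximation of the JSON payload rather than an exact byte count.

```python
import json


def batch_by_size(docs, max_bytes=5 * 1024 * 1024):
    """Yield lists of documents whose approximate JSON size stays under max_bytes.

    A single document larger than max_bytes is still emitted, alone in its batch.
    """
    batch, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for doc in docs:
        # +1 approximates the separator between array elements.
        doc_size = len(json.dumps(doc).encode("utf-8")) + 1
        if batch and size + doc_size > max_bytes:
            yield batch
            batch, size = [], 2
        batch.append(doc)
        size += doc_size
    if batch:
        yield batch


if __name__ == "__main__":
    sample = [{"id": str(i), "value": "x" * 100} for i in range(10)]
    for number, batch in enumerate(batch_by_size(sample, max_bytes=300)):
        print(number, len(batch))
```

Each yielded batch could then be serialised and sent as one update request. Capping by bytes rather than by document count keeps request sizes predictable when documents have hundreds or thousands of fields.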

Regarding DocValues: I believe they are a great improvement for faceting, but
they are annoying because of their limitations: as far as I checked, a field
has to be required or have a default value, which is not possible in my
case (I can't set some figures to 0 by default, as it may affect other
results displayed to the end user, which is not good). I wish this would
change.

Regards.




Re: SOLR - Documents with large number of fields ~ 450

2013-03-22 Thread John Nielsen
with the on disk option.

Could you elaborate on that?
On 22/03/2013 05.25, Mark Miller markrmil...@gmail.com wrote:

 You might try using docvalues with the on disk option and try and let the
 OS manage all the memory needed for all the faceting/sorting. This would
 require Solr 4.2.

 - Mark





SOLR - Documents with large number of fields ~ 450

2013-03-21 Thread kobe.free.wo...@gmail.com
Hello All,

Scenario:

My data model consists of approx. 450 fields with different types of data. We
want to index every field, so each SOLR document will have *450 fields*. The
total number of records in the data set is *755K*. We will be using features
like faceting and sorting on approx. 50 fields.

We are planning to use SOLR 4.1. The following is the hardware configuration
of the web server that we plan to install SOLR on:

CPU: 2 x Dual Core (4 cores) | RAM: 12GB | Storage: 212 GB

Questions:

1) What's the best approach when dealing with documents with a large number
of fields? What's the drawback of having a single document with a very large
number of fields? Does SOLR support documents with a large number of fields,
as in my case?

2) Will there be any performance issue if I define all 450 fields for
indexing? And what if faceting is done on 50 fields, with documents having a
large number of fields and a huge number of records?

3) The names of the fields in the data set are quite lengthy, around 60
characters. Will it be a problem to define fields with such long names in
the schema file? Is there any best practice to follow for naming
conventions? Will long field names create problems during querying?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Documents-with-large-number-of-fields-450-tp4049633.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR - Documents with large number of fields ~ 450

2013-03-21 Thread Jack Krupansky
You will definitely be pushing the limits for reasonable performance. Maybe 
4-5 years from now you will be able to get decent performance with hundreds 
of fields and dozens of faceted fields, but I'd be surprised if you could 
get decent performance with more than about 100 fields and a dozen facets.


The length of a field name should not be a problem for queries other than 
readability. Just be sure to stick with Java-style names (alpha, digit, 
underscore).
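
A minimal sketch of what checking for such names could look like, in Python. The function names and sample field names are hypothetical, and the rule that a name must not start with a digit is an assumption borrowed from Java identifiers rather than something stated above.

```python
import re

# Letters, digits and underscores only; not starting with a digit (assumption).
_JAVA_STYLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_java_style(name):
    """Return True if the whole name uses only letters, digits and underscores."""
    return _JAVA_STYLE.fullmatch(name) is not None


def to_java_style(name):
    """Turn an arbitrary source column name into a Java-style field name."""
    # Collapse every run of disallowed characters into a single underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")
    if not cleaned or cleaned[0].isdigit():
        # Hypothetical prefix so the result never starts with a digit or is empty.
        cleaned = "f_" + cleaned
    return cleaned


if __name__ == "__main__":
    for raw in ["Total Sales (2012) - Net", "2nd quarter", "already_fine"]:
        print(raw, "->", to_java_style(raw))
```

Note that two different source names can collapse to the same cleaned name, so a real mapping would also need a uniqueness check.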


The bottom line: Do a proof of concept (POC) first - and tell us how it 
performs.


-- Jack Krupansky




Re: SOLR - Documents with large number of fields ~ 450

2013-03-21 Thread Otis Gospodnetic
Hi,

In short, I suspect you'll OOM if you sort and facet on all these fields.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/








Re: SOLR - Documents with large number of fields ~ 450

2013-03-21 Thread Mark Miller
You might try using docvalues with the on disk option and try and let the OS 
manage all the memory needed for all the faceting/sorting. This would require 
Solr 4.2.

- Mark
