Re: Multiple Words in String

2011-04-03 Thread lboutros
I managed to find both documents with both of your input queries.

Add this filter to the query part of your analyzer:

<filter class="solr.PositionFilterFactory"/>

The main problem is that your query "microsoft" is transformed into a
single PhraseQuery, which cannot match the document containing "micro soft".
The PositionFilterFactory will transform the query into multiple queries instead.
You can activate debug mode to see the difference.

You can find more information here:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
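
To see why one word can end up as several query tokens, here is a minimal sketch in plain Java (not Solr code) of what a front edge n-gram filter with minGramSize=2 and maxGramSize=15 yields for a single token. The class and method names are made up for illustration, and the sketch assumes each gram is emitted as its own token, which is what leads to the single PhraseQuery described above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the output of a front edge n-gram filter.
// This is not the Solr implementation.
final class EdgeNGramSketch {

    // Returns the prefixes of `token` whose length lies between minGram and maxGram.
    static List<String> frontGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // One input word becomes a whole list of tokens.
        System.out.println(frontGrams("microsoft", 2, 15));
    }
}
```

With positions flattened by the PositionFilterFactory, each of these grams can match on its own instead of having to appear in sequence.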

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Words-in-String-tp2767964p2770713.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Difference between Solr and Lucidworks distribution

2011-04-03 Thread yehosef
How can they require payment for something that was developed under the
Apache License?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Difference-between-Solr-and-Lucidworks-distribution-tp2474792p2771191.html
Sent from the Solr - User mailing list archive at Nabble.com.


does overwrite=false work with json

2011-04-03 Thread David Murphy
I'm doing some performance benchmarking of Solr and I started with a single big 
JSON file containing all the docs that I'm sending via curl. The results are 
fantastic - I'm achieving an indexing rate of about 44,000 docs/sec using this 
method (these are really small test docs). In the past I have used CSV, and
adding overwrite=false to the URL increased performance when doing a fresh
reindex where I know all the document ids are unique. I tried this with the JSON
upload, and nothing seemed to change. Is this supposed to work with the JSON
update handler?

Anyway, Solr is doing spectacularly against the competition so far. Keep up the
great work!

--Dave

AW: Difference between Solr and Lucidworks distribution

2011-04-03 Thread Wolfram Bartussek
Take Lucidworks for Solr, it's free.

Regards, Wolfram

-----Original Message-----
From: yehosef [mailto:yeho...@gmail.com]
Sent: Sunday, April 3, 2011 15:57
To: solr-user@lucene.apache.org
Subject: Re: Difference between Solr and Lucidworks distribution

How can they require payment for something that was developed under the
apache license?




Re: Difference between Solr and Lucidworks distribution

2011-04-03 Thread Ken Krugler

On Apr 3, 2011, at 6:56am, yehosef wrote:

> How can they require payment for something that was developed under the
> apache license?

It's the difference between free speech and free beer :)

See http://en.wikipedia.org/wiki/Gratis_versus_libre

-- Ken

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g







Faceting on multivalued field

2011-04-03 Thread Kaushik Chakraborty
Hi,

My index contains a root entity Post and a child entity Comments. Each
post can have multiple comments. data-config.xml:

<document>
    <entity name="posts" transformer="TemplateTransformer"
            dataSource="jdbc" query="">

        <field column="post_id" />
        <field column="post_text"/>
        <field column="person_id"/>
        <entity name="comments" dataSource="jdbc" query="select *
            from comments where post_id = ${posts.post_id}" >
            <field column="comment_id" />
            <field column="comment_text" />
            <field column="comment_person_id" />
            <field column="comment_post_id" />
        </entity>
    </entity>
</document>

The schema has all columns of the comment entity as MultiValued fields, and
all fields are indexed & stored. My requirement is to count the number of
comments for each post. The approach I'm taking is to query on *:* and
facet the result on comment_post_id so that it gives the count of
comments for that post.

But I'm getting an incorrect result: e.g. if a post has 2 comments, the
multivalued fields are populated all right, but the facet count comes back as 1
(for that post_id). What else do I need to do?


Thanks,
Kaushik


Re: Using EmbeddedSolrServer with static documents

2011-04-03 Thread Erick Erickson
Well, what is a document on the filesystem? Solr deals
with well-formed XML documents of a specific format. You
can't just stream a random file to Solr. Specifically
documents look like:
<doc>
  <field name="blah">value for field</field>
.
.
.
</doc>

perhaps wrapped in an <add>...</add>.
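
As a rough sketch of that format, the following plain Java (no Solr libraries) builds an add message from field name/value pairs and escapes the characters that are special in XML. The class name, the helper names and the "id" field are assumptions made for illustration; "docText" is the field name used in the quoted question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: build a Solr <add><doc>...</doc></add> message from field name/value pairs.
final class SolrXmlSketch {

    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Emits one <field> element per map entry, in insertion order.
    static String toAddXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(f.getKey())).append("\">")
               .append(escape(f.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "myDoc");                      // hypothetical field name
        fields.put("docText", "This is a sample file"); // field name from the quoted question
        System.out.println(toAddXml(fields));
    }
}
```

The field names have to exist in your schema; the sketch only shows the shape of the message.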

There are ways for structured documents to be added using the
Tika libraries etc.

But before we go there, what is it you want to do? What is the
nature of your document?

Best
Erick

On Sat, Apr 2, 2011 at 12:35 PM, michael.i michael.i...@gmail.com wrote:

 Hi,
 I am new to solr so please excuse me if my question sounds basic.

 I would like to use the EmbeddedSolrServer.
 It happens that all examples I've found on the web use documents that have
 been generated dynamically such as:


 SolrServer solrServer = new EmbeddedSolrServer(container, core);
 SolrInputDocument doc = new SolrInputDocument();
 doc.addField("docText", "This is a sample file");
 solrServer.add(doc);
 solrServer.commit();


 I would like to be able to load a document that is stored on the
 filesystem.
 Ideally, I would have liked to do something such as:
 SolrInputDocument doc = new SolrInputDocument("path/myDoc.txt");
 solrServer.add(doc);
 solrServer.commit();

 It does not seem possible to do such a thing. Am I missing something? Are
 there some best practices with regards to referring to a document on the
 filesystem?

 Thanx!
 Michael.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Using-EmbeddedSolrServer-with-static-documents-tp2767614p2767614.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using EmbeddedSolrServer with static documents

2011-04-03 Thread michael.i
Hi Erick,
thanx for getting back to me.

> Well, what is a document on the filesystem? Solr deals
> with well-formed XML documents of a specific format.

I would like to index all kinds of documents. For a start, I'll be happy to
be able to work with XML and HTML documents.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-EmbeddedSolrServer-with-static-documents-tp2767614p2773012.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple Words in String

2011-04-03 Thread Erick Erickson
Is this a general question or specific? You can handle specific ones by
using synonyms.

But the general case, that is, treating any pair of adjacent tokens as
a single token, seems fraught with unintended consequences. Then again,
you know your problem space better than I do.

Best
Erick

On Sat, Apr 2, 2011 at 2:21 PM, Chris Fauerbach chrisfauerb...@gmail.com wrote:

 Good afternoon everyone!
 I am stumped, and I would love some help. I'm new to Solr/Lucene,
 but I have thrown myself into it, so I think I have a solid
 understanding. Using the analysis tool in the admin interface, I see
 these words stemmed and processed as I assume they would be, so I'm
 stuck.

 In my index, I have two documents, each with a text field, and here
 are example values:

 1) microsoft.com
 2) micro soft

 I want to do a search using "microsoft" or "micro soft" and find both.
 I'm using the dismax interface, the fields are properly listed in the
 config, and I can find both records, but never at the same time.
 Here's my schema.xml for my text field. Any thoughts on what I can do
 to find these together?


 <fieldType name="text" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="1"
             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
             preserveOriginal="1"/>
     <filter class="solr.SynonymFilterFactory"
             synonyms="syn/index_synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
             maxGramSize="15" side="front"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
             maxGramSize="15" side="back"/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="English" protected="protwords.txt"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
             maxGramSize="15" side="front"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
             maxGramSize="15" side="back"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="1"
             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
             preserveOriginal="1"/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="English" protected="protwords.txt"/>
   </analyzer>
 </fieldType>



Re: Faceting on multivalued field

2011-04-03 Thread Erick Erickson
Hmmm, I think you're misunderstanding faceting. It counts the
number of documents that have a particular value. So if you're
faceting on comment_post_id, there is one and only one document
with that value (assuming that the comment_post_ids are unique),
which is what's being reported. This will be quite expensive on a
large corpus, BTW.

Is your task to show the totals for *every* document in your corpus or
just the ones in a display page? Because if the latter, your app could
just count up the number of elements in the XML returned for the
multiValued comments field.
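
A minimal sketch of that client-side count, using only the JDK's DOM parser. It assumes each document in the XML response carries an <arr name="comment_id"> element holding one <str> per comment; the class and method names are made up for illustration, and non-string fields would come back under other element names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: count the values of a multiValued field in one returned <doc>.
final class CommentCountSketch {

    // Counts the <str> values inside the first <arr name="fieldName"> element.
    // Returns 0 if the field is absent.
    static int countValues(String docXml, String fieldName) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(docXml)));
            NodeList arrays = dom.getElementsByTagName("arr");
            for (int i = 0; i < arrays.getLength(); i++) {
                Element arr = (Element) arrays.item(i);
                if (fieldName.equals(arr.getAttribute("name"))) {
                    return arr.getElementsByTagName("str").getLength();
                }
            }
            return 0;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse the document XML", e);
        }
    }

    public static void main(String[] args) {
        String doc = "<doc>"
                + "<str name=\"post_id\">46</str>"
                + "<arr name=\"comment_id\"><str>9</str><str>10</str></arr>"
                + "</doc>";
        System.out.println(countValues(doc, "comment_id"));
    }
}
```

This only covers the documents on the page being displayed, which is the case where the approach makes sense.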

If that's not relevant, could you explain a bit more why you need this
count?

Best
Erick




Re: Using EmbeddedSolrServer with static documents

2011-04-03 Thread Erick Erickson
OK, you're still not quite on the right track. You can't just
index XML documents without transforming them into
valid Solr XML documents. Ditto for HTML.

Take a look at the ExtractingRequestHandler documentation at:
http://wiki.apache.org/solr/ExtractingRequestHandler

Here's some more documentation that might help.
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika

But at root, you have to extract the relevant info from the file in question
and form your own valid Solr document and send *that* to Solr if you want to
do it by hand.

Or you can use the ExtractingRequestHandler to do it for you, but then you
need to be aware that it'll do the best it can at putting meta-data information
into the appropriate fields in your schema, and you don't have total control
over that.

Oh, and why are you using embedded Solr? The normal HTTP request process
is recommended, which you can connect to easily with SolrJ.

FWIW
Erick




Re: admin/index.jsp double submit on IE

2011-04-03 Thread Erick Erickson
Jeffrey:

It's perfectly appropriate to raise a JIRA for something like this.

If you could add the steps to make this happen, that'd be great.

see: http://wiki.apache.org/solr/HowToContribute#Contributing_your_work.

If you can add a patch, that'd be even better (instructions on that page
too). You'll find the Solr committers are quite willing to work with you
on the patch.

Thanks for finding this and digging into the underlying reason!

Best
Erick

On Sat, Apr 2, 2011 at 12:39 PM, Jeffrey Chang jclal...@gmail.com wrote:

 Hi,

 I noticed /admin/index.jsp could issue a double submit on IE causing Jetty
 to error out.

 Fixed by modifying index.jsp's javascript submit to return false.

 ... queryForm.submit(); return false; ...

 Not sure if I should log a defect for this or not.

 - Jeff



Re: Multiple Words in String

2011-04-03 Thread Chris Fauerbach
It's not only a specific case (e.g. microsoft.com); it's really a
multi-word issue.

carwash, bookkeeper, etc.

I'm ultimately looking for a schema for search and retrieval that's heavily
focused on 'names'. These are people's names, business names, etc.: not
content like large text fields, web sites or anything like that, but
business data that I'm very successfully receiving using dataimport
handlers. It's these special cases that are really tripping me up; my
business folks keep coming up with them!


Chris Fauerbach
chrisfauerb...@gmail.com





Re: Multiple Words in String

2011-04-03 Thread Erick Erickson
Short form:
I think you're going down a rabbit-hole and should just
use synonyms and forget about it.

I'm particularly thinking that a general-purpose solution
that somehow breaks up or combines adjacent tokens
will have consequences that pop out other places that
you don't want and you'll have to fix *that*. I can't think
of a way to do this that wouldn't run that danger.

Long form, think of it as a sermon, it's Sunday after all.

This is the point, in my experience, where you have to ask your
business people "what's it worth to you?" You can handle
any case that comes up similar to the examples you've shown
by adding it to your synonyms file, compressing any pair
into its joined form (as a synonym), and be done with it. This is
a very straightforward approach that has predictable consequences.
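
As a sketch of how small that job is, the following plain Java turns a list of pairs into synonyms-file lines that list the split form and the joined form as comma-separated equivalents. The class and method names are made up for illustration, and the assumption is that the lines go into the index-side synonyms file already configured for the field; check the result with the analysis tool before relying on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch: generate synonyms-file lines such as "micro soft, microsoft".
final class SynonymLinesSketch {

    // Normalizes whitespace and case, then pairs the split form with its joined form.
    static String toSynonymLine(String pair) {
        String split = pair.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        String joined = split.replace(" ", "");
        return split + ", " + joined;
    }

    static List<String> toSynonymLines(List<String> pairs) {
        List<String> lines = new ArrayList<>();
        for (String pair : pairs) {
            lines.add(toSynonymLine(pair));
        }
        return lines;
    }

    public static void main(String[] args) {
        // The pairs would come from the list the business people supply.
        for (String line : toSynonymLines(Arrays.asList("micro soft", "car wash", "book keeper"))) {
            System.out.println(line);
        }
    }
}
```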

Or you can mess around, possibly for quite some time, trying
to find a general purpose solution that will almost inevitably
lead to unanticipated behavior that you'll then spend lots of time
trying to chase down, time you could have spent putting in
features that your users will actually notice.

Here's a test. Ask your business people to create a list of all the
pairs they want to see treated like this. If their response is any
variant of "we don't have time to do that", then even *they* must
not think it's very important <nasty grin>. And if they do, put
it in your synonyms file and be a hero.

Evil thoughts aside, I'm dead serious. This is the kind of rabbit-hole
that development efforts go down that, in all probability, add almost
zero *value* to the product. There's a way to handle 95% of the cases
that's very easy to implement. It's already there in Solr.

Historically, we in the programming field have done a very poor job
of making it clear to the business folks that every such request has
not only an implementation cost (and we all too often don't include
debugging/maintenance in that cost) but an opportunity cost. We owe it
to the business folks *and ourselves* to clearly explain to them the
cost and let them make the decision whether it's worth it. A decision
based on information. And understand that I'm not knocking the
business folks here. We haven't given them the consequences to weigh,
so how can we fault their decisions?

OK, sermon over <G>. I've just too often said "yes, we can do that"
without thinking to add "and it'll cost 3 weeks of development effort".
Eventually I figured out that adding the estimate, and letting the business
folks know what I wouldn't be able to get to because of that time
spent, led to "Oh, never mind."

Best
Erick

P.S. Ok, it's late Sunday night and I feel like writing long, involved
responses that aren't entirely on-topic.


Re: Faceting on multivalued field

2011-04-03 Thread Chris Fauerbach
Wouldn't you want to extract your original data format from the index and then
'count' the comments for each post?
I don't think facets are appropriate.

On Apr 3, 2011, at 22:10, Kaushik Chakraborty kaych...@gmail.com wrote:

 Ok. My expectation was that, since comment_post_id is a MultiValued field,
 it would appear multiple times (i.e. once for each comment), and hence
 faceting on that field would give me the count of every occurrence of
 comment_post_id.
 
 My requirement is a total for every document, i.e. the number of
 comments per post across the whole corpus. To explain it more clearly, I'm
 getting a result XML something like this:
 
 <str name="post_id">46</str>
 <str name="post_text">Hello World</str>
 <str name="person_id">20</str>
 <arr name="comment_id">
   <str>9</str>
   <str>10</str>
 </arr>
 <arr name="comment_person_id">
   <str>19</str>
   <str>2</str>
 </arr>
 <arr name="comment_post_id">
   <str>46</str>
   <str>46</str>
 </arr>
 <arr name="comment_text">
   <str>Hello - from World</str>
   <str>Hi</str>
 </arr>
 
 <lst name="facet_fields">
   <lst name="comment_post_id">
     <int name="46">1</int>
   </lst>
 </lst>
 
 I need the count to be 2, as post 46 has 2 comments.
 
 What other approach can I take?
 
 Thanks,
 Kaushik
 
 
 


Re: Faceting on multivalued field

2011-04-03 Thread Erick Erickson
Why not count them on the way in and just store that number along
with the original post?
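
A minimal sketch of counting on the way in, in plain Java. It assumes you can see the post id of every comment row before indexing; the class and method names are made up, and the resulting number would go into a single-valued field of your choosing (something like a hypothetical comment_count) on each post document.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: tally comments per post before the documents are sent to Solr.
final class CommentTallySketch {

    // Given the post id of every comment row, returns post id -> number of comments.
    static Map<String, Integer> countByPost(List<String> commentPostIds) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String postId : commentPostIds) {
            counts.merge(postId, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Post 46 has two comments, post 47 has one.
        System.out.println(countByPost(Arrays.asList("46", "46", "47")));
    }
}
```

A stored count can then be returned, sorted on, or filtered on like any other field, without faceting.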

Best
Erick
