Re: Faceting on multivalued field

2011-04-04 Thread Kaushik Chakraborty
Are you suggesting that I change the DB query of the nested entity that fetches
the comments (the query is in my post), or can something be done at index time,
e.g. using Transformers?

Thanks,
Kaushik


On Mon, Apr 4, 2011 at 8:07 AM, Erick Erickson wrote:

> Why not count them on the way in and just store that number along
> with the original e-mail?
>
> Best
> Erick
>
> On Sun, Apr 3, 2011 at 10:10 PM, Kaushik Chakraborty wrote:
>
> > Ok. My expectation was that, since "comment_post_id" is a MultiValued field,
> > it would appear multiple times (i.e. once for each comment), and so faceting
> > on that field would give me a count of every occurrence of comment_post_id.
> >
> > My requirement is to get the total for every document, i.e. the number of
> > comments per post across the whole corpus. To explain it more clearly, I'm
> > getting a result XML something like this:
> >
> > 46
> > Hello World
> > 20
> > 
> >9
> >10
> > 
> > 
> >   19
> >   2
> > 
> > 
> >  46
> >  46
> > 
> > 
> >   Hello - from World
> >   Hi
> > 
> >
> > 
> >  
> > *1*
> >
> > I need the count to be 2 as the post 46 has 2 comments.
> >
> >  What other way can I approach?
> >
> > Thanks,
> > Kaushik
> >
> >
> > On Mon, Apr 4, 2011 at 4:29 AM, Erick Erickson wrote:
> >
> > > Hmmm, I think you're misunderstanding faceting. It's counting the
> > > number of documents that have a particular value. So if you're
> > > faceting on "comment_post_id", there is one and only one document
> > > with that value (assuming that the comment_post_ids are unique).
> > > Which is what's being reported. This will be quite expensive on a
> > > large corpus, BTW.
> > >
> > > Is your task to show the totals for *every* document in your corpus or
> > > just the ones in a display page? Because if the latter, your app could
> > > just count up the number of elements in the XML returned for the
> > > multiValued comments field.
> > >
> > > If that's not relevant, could you explain a bit more why you need this
> > > count?
> > >
> > > Best
> > > Erick
> > >
> > > On Sun, Apr 3, 2011 at 2:31 PM, Kaushik Chakraborty <kaych...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > My index contains a root entity "Post" and a child entity "Comments".
> > > > Each post can have multiple comments. data-config.xml:
> > > >
> > > > 
> > > > > > > dataSource="jdbc" query="">
> > > >
> > > >
> > > >
> > > >
> > > > query="select * from comments where post_id = ${posts.post_id}" >
> > > >
> > > >
> > > >
> > > >
> > > >   
> > > >
> > > > 
> > > >
> > > > The schema defines all columns of the "comment" entity as "MultiValued"
> > > > fields, and all fields are indexed & stored. My requirement is to count
> > > > the number of comments for each post. The approach I'm taking is to query
> > > > on "*:*" and facet the result on "comment_post_id" so that it gives the
> > > > number of comments for that post.
> > > >
> > > > But I'm getting an incorrect result: e.g. if a post has 2 comments, the
> > > > multivalued fields are populated all right, but the facet count comes
> > > > back as 1 (for that post_id). What else do I need to do?
> > > >
> > > >
> > > > Thanks,
> > > > Kaushik
> > > >
> > >
> >
>
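
One way to read Erick's suggestion to "count them on the way in" is to let the root entity's SQL query compute the count, so that each post is indexed with a plain single-valued number. Below is a minimal sketch of such a query, run against an in-memory SQLite database purely so that it can be tried out. Only the `comments` table and its `post_id` column come from the thread; the `posts` table layout, the other column names, the `comment_count` alias and the second sample post are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema and sample rows; only "comments" and "post_id" appear in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, post_title TEXT);
    CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
    INSERT INTO posts VALUES (46, 'Hello World'), (47, 'No comments yet');
    INSERT INTO comments VALUES (9, 46, 'Hello - from World'), (10, 46, 'Hi');
""")

# Candidate query for the root entity: one row per post plus a precomputed count.
ROOT_QUERY = """
    SELECT p.post_id,
           p.post_title,
           (SELECT COUNT(*) FROM comments c WHERE c.post_id = p.post_id) AS comment_count
    FROM posts p
    ORDER BY p.post_id
"""

rows = conn.execute(ROOT_QUERY).fetchall()
for post_id, title, comment_count in rows:
    print(post_id, title, comment_count)
```

If a query along these lines were used for the root entity, `comment_count` could be mapped to an ordinary single-valued integer field, and no faceting would be needed to read the number of comments for a post.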


Re: Faceting on multivalued field

2011-04-03 Thread Kaushik Chakraborty
Ok. My expectation was that, since "comment_post_id" is a MultiValued field,
it would appear multiple times (i.e. once for each comment), and so faceting
on that field would give me a count of every occurrence of comment_post_id.

My requirement is to get the total for every document, i.e. the number of
comments per post across the whole corpus. To explain it more clearly, I'm
getting a result XML something like this:

46
Hello World
20

9
10


   19
   2


  46
  46


   Hello - from World
   Hi



  
 *1*

I need the count to be 2 as the post 46 has 2 comments.

 What other way can I approach?

Thanks,
Kaushik


On Mon, Apr 4, 2011 at 4:29 AM, Erick Erickson wrote:

> Hmmm, I think you're misunderstanding faceting. It's counting the
> number of documents that have a particular value. So if you're
> faceting on "comment_post_id", there is one and only one document
> with that value (assuming that the comment_post_ids are unique).
> Which is what's being reported. This will be quite expensive on a
> large corpus, BTW.
>
> Is your task to show the totals for *every* document in your corpus or
> just the ones in a display page? Because if the latter, your app could
> just count up the number of elements in the XML returned for the
> multiValued comments field.
>
> If that's not relevant, could you explain a bit more why you need this
> count?
>
> Best
> Erick
>
> On Sun, Apr 3, 2011 at 2:31 PM, Kaushik Chakraborty wrote:
>
> > Hi,
> >
> > My index contains a root entity "Post" and a child entity "Comments".
> > Each post can have multiple comments. data-config.xml:
> >
> > 
> > > dataSource="jdbc" query="">
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >   
> >
> > 
> >
> > The schema defines all columns of the "comment" entity as "MultiValued"
> > fields, and all fields are indexed & stored. My requirement is to count
> > the number of comments for each post. The approach I'm taking is to query
> > on "*:*" and facet the result on "comment_post_id" so that it gives the
> > number of comments for that post.
> >
> > But I'm getting an incorrect result: e.g. if a post has 2 comments, the
> > multivalued fields are populated all right, but the facet count comes
> > back as 1 (for that post_id). What else do I need to do?
> >
> >
> > Thanks,
> > Kaushik
> >
>
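
For the display-page case Erick mentions, the count can be taken on the client by counting the elements of the multivalued field in each returned document. The sketch below assumes the usual Solr XML response layout (`<doc>` elements holding `<arr name="...">` for multivalued fields); the sample response is hand-written for illustration, and the `post_id` field name is an assumption, since the tags of the real response were lost in the archive.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the usual Solr XML response shape (not the real response).
SAMPLE_RESPONSE = """
<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <int name="post_id">46</int>
      <arr name="comment_post_id">
        <int>46</int>
        <int>46</int>
      </arr>
    </doc>
  </result>
</response>
"""

def comments_per_post(xml_text, id_field="post_id", multi_field="comment_post_id"):
    """Map each document's id to the number of values in its multivalued field."""
    counts = {}
    root = ET.fromstring(xml_text)
    for doc in root.iter("doc"):
        id_node = doc.find(f"./*[@name='{id_field}']")
        if id_node is None:
            continue  # skip documents without the id field
        arr = doc.find(f"./arr[@name='{multi_field}']")
        # len() of an element is its number of child elements, i.e. stored values
        counts[id_node.text] = len(arr) if arr is not None else 0
    return counts

counts = comments_per_post(SAMPLE_RESPONSE)
print(counts)
```

This only covers the documents on the current result page, which is why it does not help when totals are needed for the whole corpus.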


Faceting on multivalued field

2011-04-03 Thread Kaushik Chakraborty
Hi,

My index contains a root entity "Post" and a child entity "Comments". Each
post can have multiple comments. data-config.xml:












   



The schema defines all columns of the "comment" entity as "MultiValued" fields,
and all fields are indexed & stored. My requirement is to count the number of
comments for each post. The approach I'm taking is to query on "*:*" and
facet the result on "comment_post_id" so that it gives the number of
comments for that post.

But I'm getting an incorrect result: e.g. if a post has 2 comments, the
multivalued fields are populated all right, but the facet count comes back as 1
(for that post_id). What else do I need to do?


Thanks,
Kaushik
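
The behaviour described above follows from what a field facet counts: the number of documents containing a value, not the number of times the value occurs. The sketch below is a plain-Python simulation of that difference, not Solr code; the document layout, the function names and the second sample post are assumptions made for illustration.

```python
from collections import Counter

# Two simulated index documents; post 46 carries two comments.
docs = [
    {"post_id": 46, "comment_post_id": [46, 46]},
    {"post_id": 47, "comment_post_id": [47]},
]

def facet_counts(docs, field):
    """Count documents per value: a value repeated inside one document counts once."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.get(field, [])))
    return counts

def value_counts(docs, field):
    """Count every occurrence of a value, including repeats inside one document."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.get(field, []))
    return counts

print(facet_counts(docs, "comment_post_id"))
print(value_counts(docs, "comment_post_id"))
```

In this simulation the facet-style count for 46 is 1, matching the result reported in the post, while the per-occurrence count is the 2 that was expected.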


Re: SOLR DIH importing MySQL "text" column as a BLOB

2011-03-16 Thread Kaushik Chakraborty
The query is there in the data-config.xml, and it fetches the expected data
from the database.

Thanks,
Kaushik


On Wed, Mar 16, 2011 at 9:21 PM, Gora Mohanty wrote:

> On Wed, Mar 16, 2011 at 2:29 PM, Stefan Matheis wrote:
> > Kaushik,
> >
> > i just remembered an ML-Post a few weeks ago .. same problem while
> > importing geo-data
> > (http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-How-to-tp2245592p2254395.html)
> > - the solution was:
> >
> >> CAST( CONCAT( lat, ',', lng ) AS CHAR )
> >
> > at that time i searched a little bit for the reason and afaik there was
> > a bug in mysql/jdbc which produces that binary output under certain
> > conditions
> [...]
>
> As Stefan mentions, there might be a way to solve this.
>
> Could you show us the query in DIH that you are using
> when you get this BLOB, i.e., the SELECT statement
> that goes to the database?
>
> It might also be instructive for you to try that same
> SELECT directly in a mysql interface.
>
> Regards,
> Gora
>
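
The idea behind Stefan's `CAST( ... AS CHAR )` suggestion is to have the database return a character string instead of raw bytes. The sketch below illustrates that general idea only, using an in-memory SQLite database as a stand-in: SQLite spells the cast `CAST(... AS TEXT)`, whereas the MySQL form quoted above is `CAST(... AS CHAR)`. The table and column names are hypothetical, and this does not reproduce the MySQL/JDBC behaviour itself.

```python
import sqlite3

# Stand-in table with a binary column; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY, message BLOB)")
conn.execute("INSERT INTO posts VALUES (?, ?)", (1, b"Test post"))

# Without a cast the driver hands back raw bytes.
raw = conn.execute("SELECT message FROM posts").fetchone()[0]

# With an explicit cast in the SELECT, the same column arrives as a string.
as_text = conn.execute("SELECT CAST(message AS TEXT) FROM posts").fetchone()[0]

print(type(raw).__name__, type(as_text).__name__)
```

Applied to the DIH case, the analogous change would be to wrap the `text` column in a cast inside the SELECT used in `data-config.xml`; whether that resolves the BLOB output here would need to be checked against the actual query.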


SOLR DIH importing MySQL "text" column as a BLOB

2011-03-15 Thread Kaushik Chakraborty
I have a column for posts in MySQL of type `text`, and I've tried the
corresponding `field-type` values for it in Solr's `schema.xml`, e.g.
`string, text, text-ws`. But whenever I import it using the DIH, it gets
imported as a BLOB object. I checked: this happens only for columns of type
`text` and not for `varchar` (those get indexed as strings). Hence, the posts
field is not searchable.

I found out about this issue, after repeated search failures, when I ran a
`*:*` query on Solr. A sample response:



1.0
[B@10a33ce2
2011-02-21T07:02:55Z
test.acco...@gmail.com
Test
Account
[B@2c93c4f1
1


The `data-config.xml` :


 
 
 
 
 
 
 
 
 
   
  

The `schema.xml` :



 
 
 
 
 
 

solr_post_status_message_id
solr_post_message


Thanks,
Kaushik