[jira] Commented: (SOLR-1814) select count(distinct fieldname) in SOLR

2010-03-12 Thread Marcus Herou (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844437#action_12844437
 ] 

Marcus Herou commented on SOLR-1814:


Instead of having the file attached... 
http://svn.tailsweep.com/opensource/solr-contrib/trunk/src/main/java/org/apache/solr/handler/component/

Erik:
The facet counts are something else; they group the counts based on the field 
supplied, do they not? Perhaps facet.query (as you pointed out) can be used; I 
overlooked that. I never got an answer on the mailing list, so I implemented it 
instead :)

What I have accomplished is this:

select count(distinct blog) from BlogEntries where ...somexpression...

In this case one doc is a BlogEntry, and each belongs to a Blog (many-to-one). 
If this can already be accomplished in SOLR, my bad. Please tell me how.

Ted: 
Trove has two licenses, GPL and ASL. I can use the ASL version if it helps. I 
only use Trove for its efficiency; plain hashmaps can of course be used if 
it is a showstopper.
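
For readers following along, here is a minimal sketch of the count described above, using plain java.util collections rather than Trove. It is not the attached CountComponent; the class name, method name, and sample ids are made up for illustration, and the list simply stands in for the blog id of each matching document.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch only: emulates "select count(distinct blog)" over an in-memory list of hits. */
final class DistinctCountSketch {

    /** Returns how many different values occur in the per-document field values. */
    static int countDistinct(List<Integer> fieldValuesOfHits) {
        // A set keeps one copy of each value, so its size is the distinct count.
        Set<Integer> seen = new HashSet<>(fieldValuesOfHits);
        return seen.size();
    }

    public static void main(String[] args) {
        // Hypothetical blog ids, one per matching BlogEntry.
        List<Integer> blogIdsOfHits = Arrays.asList(7, 7, 12, 7, 31, 12);
        System.out.println("You had " + blogIdsOfHits.size() + " hits on "
                + countDistinct(blogIdsOfHits) + " blogs");
    }
}
```

The memory cost grows with the number of distinct values, which is presumably why a primitive-int set such as Trove's is attractive here.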



 select count(distinct fieldname) in SOLR
 

 Key: SOLR-1814
 URL: https://issues.apache.org/jira/browse/SOLR-1814
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 1.4, 1.5, 1.6, 2.0
Reporter: Marcus Herou
 Fix For: 1.4, 1.5, 1.6, 2.0

 Attachments: CountComponent.java


 I have seen questions on the mailing list about functionality for 
 counting distinct values of a field. We at Tailsweep want that as well, for 
 example in our blog search.
 Example:
 You had 1345 hits on 244 blogs
 The 244 part is not possible in SOLR today (correct me if I am wrong). So 
 I've written a component which does this. Attaching it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Abstractify FacetComponent and SimpleFacets

2010-03-12 Thread Marcus Herou
Hi, thanks. I wondered whether it had already been incorporated or such.

Yes, it is a little related to StatsComponent (sum, avg, etc.), but I think
that this solves another problem (correct me if I'm wrong), since it
transforms the resulting field in a function query instead of counting as per
default (today). I think that the StatsComponent does something similar but
operates on the resulting facet. I hook in earlier.

I used the StatsComponent as a template for another component, which I call
CountComponent (
http://svn.tailsweep.com/opensource/solr-contrib/trunk/src/main/java/org/apache/solr/handler/component/CountComponent.java),
which emulates the SQL equivalent: select count(distinct field). I added the
patch to JIRA (https://issues.apache.org/jira/browse/SOLR-1814). That one
works with sharding as well. The problem is that one needs to send the damn
entire unique hash set of field values across the shards... (it can get big).
I see that Ted and Erik have commented now... Perhaps I have created something
which already exists... damn
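
To illustrate the sharding problem mentioned above, here is a small sketch assuming each shard reports the set of distinct values it saw. It is not the code in CountComponent; the class and method names are invented. It shows why the per-shard counts cannot simply be added up: a value present on two shards would be counted twice, so the sets themselves have to be merged.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch only: merges per-shard distinct-value sets into one global distinct count. */
final class ShardMergeSketch {

    /** Unions the per-shard sets and returns the size of the union. */
    static int mergeDistinct(List<Set<Integer>> perShardValues) {
        Set<Integer> union = new HashSet<>();
        for (Set<Integer> shardValues : perShardValues) {
            union.addAll(shardValues);
        }
        return union.size();
    }

    public static void main(String[] args) {
        // Hypothetical field values seen on two shards; the value 3 occurs on both.
        Set<Integer> shardA = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> shardB = new HashSet<>(Arrays.asList(3, 4, 5));

        int naiveSum = shardA.size() + shardB.size();               // counts 3 twice
        int merged = mergeDistinct(Arrays.asList(shardA, shardB)); // counts 3 once

        System.out.println("sum of per-shard counts: " + naiveSum);
        System.out.println("distinct after merging:  " + merged);
    }
}
```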

Both these Components probably need to be refined for a release/merge into
Solr.

How do I move forward with these?






On Fri, Mar 12, 2010 at 2:02 AM, Grant Ingersoll gsing...@apache.org wrote:


 On Mar 11, 2010, at 6:30 PM, Yonik Seeley wrote:

  Interesting looking stuff Marcus!
  Seems sort of related to stats.facet (calc stats on unique facet values)
  http://wiki.apache.org/solr/StatsComponent

 And https://issues.apache.org/jira/browse/SOLR-1622

 
 
  On Thu, Mar 11, 2010 at 5:49 PM, Marcus Herou
  marcus.he...@tailsweep.com wrote:
  I have now implemented Facet with FunctionQueries it is really cool! Sorry
  but even though the author of SimpleFacets (Yonik) says in the javadoc that
  one should subclass it to leverage more functionality I did not really find
  that very true in this case.
 
  Hoss was actually the first author of SimpleFacets - SOLR-44 (Solr
  didn't even have built-in faceting when it came into the incubator!)
 
  -Yonik
  http://www.lucidimagination.com





-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/


[jira] Issue Comment Edited: (SOLR-1814) select count(distinct fieldname) in SOLR

2010-03-12 Thread Marcus Herou (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844437#action_12844437
 ] 

Marcus Herou edited comment on SOLR-1814 at 3/12/10 9:55 AM:
-

Instead of having the file attached... 
http://svn.tailsweep.com/opensource/solr-contrib/trunk/src/main/java/org/apache/solr/handler/component/

Erik:
The facet counts are something else; they group the counts based on the field 
supplied, do they not? Perhaps facet.query (as you pointed out) can be used; I 
overlooked that. I never got an answer on the mailing list, so I implemented it 
instead :)

What I have accomplished is this:

select count(distinct blog) from BlogEntries where ...somexpression...

In this case one doc is a BlogEntry, and each belongs to a Blog (many-to-one). 
If this can already be accomplished in SOLR, my bad. Please tell me how.

Ted: 
Trove has two licenses, GPL and ASL. I can use the ASL version if it helps. I 
only use Trove for its efficiency; plain hashmaps can of course be used if 
it is a showstopper.






[jira] Issue Comment Edited: (SOLR-1814) select count(distinct fieldname) in SOLR

2010-03-12 Thread Marcus Herou (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844437#action_12844437
 ] 

Marcus Herou edited comment on SOLR-1814 at 3/12/10 10:03 AM:
--

Instead of having the file attached... 
http://svn.tailsweep.com/opensource/solr-contrib/trunk/src/main/java/org/apache/solr/handler/component/

Erik:
The facet counts are something else; they group the counts based on the field 
supplied, do they not? Perhaps facet.query (as you pointed out) can be used; I 
overlooked that. I never got an answer on the mailing list, so I implemented it 
instead :)

Well, the blog is not a value; it is a field of its own.
We call it feedId, and it is a pointer to a row in the DB.
...
<field name="feedId" type="integer" indexed="true" stored="true" 
required="true" omitNorms="true" />
...

What I have accomplished is this:

select count(distinct feedId) from FeedItem where ...somexpression...

In this case one doc is a FeedItem, and each belongs to a Feed (many-to-one). 
If this can already be accomplished in SOLR, my bad. Please tell me how.

Ted: 
Trove has two licenses, GPL and ASL. I can use the ASL version if it helps. I 
only use Trove for its efficiency; plain hashmaps can of course be used if 
it is a showstopper.






[jira] Commented: (SOLR-1814) select count(distinct fieldname) in SOLR

2010-03-12 Thread Marcus Herou (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1286#action_1286
 ] 

Marcus Herou commented on SOLR-1814:


Ted: I am an idiot about ASL. It is GNU Trove (I mixed it up with something else).

I can add code which uses Trove if it is available on the classpath, or plain 
HashMaps if not. I think there are some good collection utils in Commons; I 
will look it up. Trove, however, is super.
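
A rough sketch of the "use Trove if it is on the classpath, otherwise plain collections" idea above. Everything here is illustrative: the class and method names are invented, and the Trove class name gnu.trove.TIntHashSet is an assumption about the Trove version in use (later releases moved their classes to other packages). The sketch only detects which backend is available; it does not call any Trove API.

```java
/** Sketch only: decides at runtime whether an optional library is present. */
final class SetBackendSketch {

    /** Returns true if the named class can be loaded, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, SetBackendSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not available, or present but unusable: fall back to java.util.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed Trove class name; adjust to the Trove version actually deployed.
        String backend = isOnClasspath("gnu.trove.TIntHashSet")
                ? "Trove TIntHashSet"
                : "java.util.HashSet";
        System.out.println("Distinct values will be collected in: " + backend);
    }
}
```

Checking once at startup and hiding the choice behind a small interface would keep Trove an optional dependency rather than a required one.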




XMLWriter

2010-03-12 Thread Frank Wesemann

Hello,
I don't want to roll up all the XMLWriter issues, but I stumbled upon this:
http://lucene.apache.org/solr/api/org/apache/solr/response/SolrQueryResponse.html#returnable_data
says that a Map containing any of the items in this list may be 
contained in a SolrQueryResponse and will be handled by 
QueryResponseWriters.


This is not true for (at least) Keys in Maps.
XMLWriter tries to cast any key to a String. ( There is even a comment 
on this in the source !?).

Is there a reason not to use String.valueOf( entry.getKey() ) or such?
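
To make the difference concrete, here is a small stand-alone sketch. It is not the XMLWriter source; the class and method names are invented for illustration. It contrasts casting a map key to String with converting it via String.valueOf when the key is, say, an Integer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: cast versus String.valueOf for map keys of unknown type. */
final class MapKeySketch {

    /** Mimics the cast approach: fails for any key that is not a String. */
    static String keyByCast(Map.Entry<?, ?> entry) {
        return (String) entry.getKey();
    }

    /** Mimics the proposed approach: works for any key type, including null. */
    static String keyByValueOf(Map.Entry<?, ?> entry) {
        return String.valueOf(entry.getKey());
    }

    public static void main(String[] args) {
        Map<Object, Object> data = new LinkedHashMap<>();
        data.put(42, "an Integer key");

        for (Map.Entry<Object, Object> entry : data.entrySet()) {
            System.out.println("valueOf: " + keyByValueOf(entry));
            try {
                System.out.println("cast: " + keyByCast(entry));
            } catch (ClassCastException e) {
                System.out.println("cast failed with ClassCastException");
            }
        }
    }
}
```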

--
Kind regards,

Frank Wesemann
Fotofinder GmbH         VAT ID: DE812854514
Software Development    Web: http://www.fotofinder.com/
Potsdamer Str. 96       Tel: +49 30 25 79 28 90
10785 Berlin            Fax: +49 30 25 79 28 999

Registered office: Berlin
District Court Berlin Charlottenburg (HRB 73099)
Managing Director: Ali Paczensky





[jira] Commented: (SOLR-1815) SolrJ doesn't preserve the order of facet queries returned from solr

2010-03-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844668#action_12844668
 ] 

Yonik Seeley commented on SOLR-1815:


I'll go ahead and make this change soon if there are no objections.

As it relates to SolrJ, HashMap vs LinkedHashMap for facet queries will be 
completely inconsequential.
The only potential burden here lies with the server side - is there some reason 
solr might not want to return them in order in the future?  I really can't 
think of a realistic reason why not.

 SolrJ doesn't preserve the order of facet queries returned from solr
 

 Key: SOLR-1815
 URL: https://issues.apache.org/jira/browse/SOLR-1815
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.4
Reporter: Steve Radhouani
   Original Estimate: 24h
  Remaining Estimate: 24h

 Using SolrJ, I wanted to sort the response of a range query based on some 
 specific labels. For instance, using the query:
 {noformat}
 facet=true
 facet.query={!key= Less than 100}[* TO 99]
 facet.query={!key=100 - 200}[100 TO 200]
 facet.query={!key=200 +}[201 TO *]
 {noformat}
 I wanted to display the response in the following order:
 {noformat}
 Less than 100 (x)
 100 - 200 (y)
 201 + (z)
 {noformat}
 independently of the values of x, y, z, which are the numbers of retrieved 
 documents for each range.
 While Solr itself correctly produces the desired order (as specified in my 
 query), SolrJ doesn't preserve it. 
 RE: Yonik, a solution could be just to change
 {code}
 _facetQuery = new HashMap<String, Integer>();
 ...to...
 _facetQuery = new LinkedHashMap<String, Integer>();
 {code}
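
 The effect of that one-line change can be seen in a stand-alone sketch. This is not the SolrJ source; the class name, method name, and counts are made up. It stores the labels from the example above in the order the server would return them and then reads the keys back.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: the map type decides whether facet query order survives. */
final class FacetOrderSketch {

    /** Stores label/count pairs in arrival order and returns the keys as the map iterates them. */
    static List<String> labelsAfterStoring(Map<String, Integer> target, String[] labels, int[] counts) {
        for (int i = 0; i < labels.length; i++) {
            target.put(labels[i], counts[i]);
        }
        return new ArrayList<>(target.keySet());
    }

    public static void main(String[] args) {
        String[] labels = {"Less than 100", "100 - 200", "200 +"};
        int[] counts = {5, 9, 2}; // made-up document counts

        // LinkedHashMap iterates in insertion order, so the labels come back as sent.
        System.out.println(labelsAfterStoring(new LinkedHashMap<>(), labels, counts));

        // HashMap makes no ordering guarantee; the labels may come back in any order.
        System.out.println(labelsAfterStoring(new HashMap<>(), labels, counts));
    }
}
```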
  




Re: XMLWriter

2010-03-12 Thread Yonik Seeley
On Fri, Mar 12, 2010 at 3:37 PM, Frank Wesemann
f.wesem...@fotofinder.net wrote:
 Hello,
 I don't want to roll up all the XMLWriter issues, but I stumbled upon this:
 http://lucene.apache.org/solr/api/org/apache/solr/response/SolrQueryResponse.html#returnable_data
 says that a Map containing any of the items in this list may be contained
 in a SolrQueryResponse and will be handled by QueryResponseWriters.

 This is not true for (at least) Keys in Maps.
 XMLWriter tries to cast any key to a String. ( There is even a comment on
 this in the source !?).
 Is there a reason not to use String.valueOf( entry.getKey() ) or such?

Yeah, seems like the right approach.
Any other places we missed this?

-Yonik
http://www.lucidimagination.com


Re: XMLWriter

2010-03-12 Thread Frank Wesemann



Yeah, seems like the right approach.

Good, I feared I had missed something obvious. :-)


Any other places we missed this?
  

I'll have a look at it.
I'll also open a JIRA issue and add patches etc., if you don't mind.
