[jira] [Commented] (LUCENE-2228) AES Encrypted Directory

2014-01-03 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861923#comment-13861923
 ] 

Peter Karich commented on LUCENE-2228:
--

What is the state here?

 AES Encrypted Directory
 ---

 Key: LUCENE-2228
 URL: https://issues.apache.org/jira/browse/LUCENE-2228
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/other
Affects Versions: 3.1
Reporter: Jay Mundrawala
 Attachments: LUCENE-2228.patch, lucene-encryption.tar.gz


 Provides an encryption solution for Lucene indexes, using the AES encryption 
 algorithm. You must have the JCE Unlimited Strength Jurisdiction Policy Files 
 6 Release Candidate, which you can get from java.sun.com.
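The JCE prerequisite above can be checked at runtime. A minimal sketch (not the attached patch's code; the class name and the 128-bit key choice are illustrative): query the maximum allowed AES key length, then do an AES encrypt/decrypt round trip with javax.crypto. On older JREs without the unlimited-strength policy files, 256-bit keys would fail while 128-bit keys still work.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

public class AesRoundTrip {
    public static void main(String[] args) throws Exception {
        // With restricted policy files this prints 128; unlimited policy
        // (or any modern JRE) prints a much larger value.
        System.out.println("max AES key bits: " + Cipher.getMaxAllowedKeyLength("AES"));

        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128); // 128-bit AES works even without the policy files
        SecretKey key = kg.generateKey();

        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        Cipher enc = Cipher.getInstance("AES/CBC/PKCS5Padding");
        enc.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] plain = "lucene index block".getBytes(StandardCharsets.UTF_8);
        byte[] ciphertext = enc.doFinal(plain);

        Cipher dec = Cipher.getInstance("AES/CBC/PKCS5Padding");
        dec.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] back = dec.doFinal(ciphertext);

        System.out.println(Arrays.equals(plain, back)); // round trip succeeded
    }
}
```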



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Let's drop Maven Artifacts !

2011-01-18 Thread Peter Karich
 Why not vote for or against 'maven artifacts'?

http://www.doodle.com/2qp35b42vstivhvx

I'm using lucene+solr a lot via maven.
Elasticsearch uses lucene via gradle.
Solandra uses lucene via ivy and so on ;)
So maven artifacts are not only very handy for maven folks.
But I think no artifacts would be better than broken ones.

Why not try to 'switch' to the ivy build system? It's still ant, but it handles
dependencies better IMO.
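For readers unfamiliar with ivy: dependencies are declared in an ivy.xml that ant resolves at build time. A minimal sketch (the organisation/module names here are hypothetical; the lucene/solr artifact coordinates are the Maven Central ones of that era):

```xml
<!-- minimal ivy.xml sketch -->
<ivy-module version="2.0">
  <info organisation="com.example" module="myapp"/>
  <dependencies>
    <dependency org="org.apache.lucene" name="lucene-core" rev="3.0.3"/>
    <dependency org="org.apache.solr" name="solr-core" rev="1.4.1"/>
  </dependencies>
</ivy-module>
```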

Regards,
Peter.

 On Tue, Jan 18, 2011 at 3:49 PM, Grant Ingersoll gsing...@apache.org wrote:

 On Jan 18, 2011, at 12:13 PM, Robert Muir wrote:

 I can't help but remind myself that this is the same argument Oracle
 offered up during the whole hudson debacle
 (http://hudson-labs.org/content/whos-driving-thing)

 Declaring that I have a secret pocket of users that want XYZ isn't
 open source consensus.


 You were very quick to cite your own secret pocket of users when you called 
 those who support it the vocal minority.  So, if you want to continue 
 baiting the discussion we can, but as I see it, we have committers willing 
 to support it, so what's the big deal?
 I don't think they are that secret, you can look at the last maven
 discussion and see several other committers who spoke up against it.
 they are just sick of the discussion i gather and have given up
 fighting it.

 The problem again, is the magical special artifacts.

 I don't see consensus here for maven... when you have it, get back to me.








Re: Report of the most searched terms

2011-01-06 Thread Peter Karich
 IMO there is no built-in method for this.
So either use an app like piwik in your frontend, or grep the logs:
http://karussell.wordpress.com/2010/10/27/feeding-solr-with-its-own-logs/
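The log-grepping approach can be sketched in Java: extract the q= parameter and the hit count from request-log lines, then tally total searches and zero-result searches per term. The log format below is a simplified, hypothetical stand-in for Solr's actual request log; a real report would stream the log file instead of a hard-coded list.

```java
import java.util.*;
import java.util.regex.*;

public class TopSearchTerms {
    public static void main(String[] args) {
        // Hypothetical, simplified Solr request-log lines.
        List<String> lines = Arrays.asList(
            "path=/select params={q=apache&rows=10} hits=3",
            "path=/select params={q=lucene&rows=10} hits=0",
            "path=/select params={q=apache&rows=20} hits=5");

        // group 1: the q= term; group 2: the hit count
        Pattern p = Pattern.compile("\\{q=([^&}]+).*hits=(\\d+)");
        Map<String, Integer> total = new HashMap<>();
        Map<String, Integer> noResult = new HashMap<>();
        for (String line : lines) {
            Matcher m = p.matcher(line);
            if (!m.find()) continue;
            String term = m.group(1);
            total.merge(term, 1, Integer::sum);
            if ("0".equals(m.group(2))) noResult.merge(term, 1, Integer::sum);
        }
        // TreeMap just gives deterministic (sorted) printing
        System.out.println("total=" + new TreeMap<>(total));
        System.out.println("noResult=" + new TreeMap<>(noResult));
    }
}
```

Searches with results are then simply the difference of the two counts per term.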

Regards,
Peter.

 Hi,

 I would like to know if there is a way to get data from SOLR and display
 them as a report.
 It would be a Report of the most searched terms and should include total
 searches, total searches with result
 and total searches without results.

 How can I accomplish this?

 Thanks in advance,

 Jonilson



-- 
http://jetwick.com open twitter search





[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-12-12 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970612#action_12970612
 ] 

Peter Karich commented on SOLR-1729:


Hi Yonik,

so, sorry for another misposting: yes, you were right, it was the wrong Solr 
version. It was too late yesterday :-/

All is fine now with this patch. But the 
org.apache.solr.request.SolrRequestInfo class is missing, or am I completely 
crazy now? (I checked out solr twice and applied the patch again, but it didn't 
compile.)

Regards,
Peter.

 Date Facet now override time parameter
 --

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetParams.java, SimpleFacets.java, 
 solr-1.4.0-solr-1729.patch, SOLR-1729.patch, SOLR-1729.patch, 
 UnInvertedField.java


 This PATCH introduces a new query parameter that tells a (typically, but not 
 necessarily) remote server what time to use as 'NOW' when calculating date 
 facets for a query (and, for the moment, date facets *only*) - overriding the 
 default behaviour of using the local server's current time.
 This gets 'round a problem whereby an explicit time range is specified in a 
 query (e.g. timestamp:[then0 TO then1]), and date facets are required for the 
 given time range (in fact, any explicit time range). 
 Because DateMathParser performs all its calculations from 'NOW', remote 
 callers have to work out how long ago 'then0' and 'then1' are from 'now', and 
 use the relative-to-now values in the facet.date.xxx parameters. If a remote 
 server has a different opinion of NOW compared to the caller, the results 
 will be skewed (e.g. they are in a different time-zone, not time-synced etc.).
 This becomes particularly salient when performing distributed date faceting 
 (see SOLR-1709), where multiple shards may all be running with different 
 times, and the faceting needs to be aligned.
 The new parameter is called 'facet.date.now', and takes as a parameter a 
 (stringified) long that is the number of milliseconds from the epoch (1 Jan 
 1970 00:00) - i.e. the returned value from a System.currentTimeMillis() call. 
 This was chosen over a formatted date to delineate it from a 'searchable' 
 time and to avoid superfluous date parsing. This makes the value generally a 
 programmatically-set value, but as that is where the use-case is for this type 
 of parameter, this should be ok.
 NOTE: This parameter affects date facet timing only. If there are other areas 
 of a query that rely on 'NOW', these will not interpret this value. This is a 
 broader issue about setting a 'query-global' NOW that all parts of query 
 analysis can share.
 Source files affected:
 FacetParams.java   (holds the new constant FACET_DATE_NOW)
 SimpleFacets.java  getFacetDateCounts() NOW parameter modified
 This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as 
 it's a general change for date faceting, it was deemed deserving of its own 
 patch. I will be updating SOLR-1709 in due course to include the use of this 
 new parameter, after some rfc acceptance.
 A possible enhancement to this is to detect facet.date fields, look for and 
 match these fields in queries (if they exist), and potentially determine 
 automatically the required time skew, if any. There are a whole host of 
 reasons why this could be problematic to implement, so an explicit 
 facet.date.now parameter is the safest route.
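As the description says, facet.date.now takes epoch milliseconds from System.currentTimeMillis(). A hedged sketch of how a caller might build such a request string (facet.date.start/end/gap are standard Solr 1.4 date-facet parameters; the field name timestamp is illustrative):

```java
public class FacetDateNowExample {
    public static void main(String[] args) {
        // Epoch millis per the patch description, so every shard
        // facets relative to the caller's 'NOW', not its own clock.
        long now = System.currentTimeMillis();
        String q = "q=*:*&facet=true"
                 + "&facet.date=timestamp"          // illustrative field name
                 + "&facet.date.start=NOW/DAY-7DAYS"
                 + "&facet.date.end=NOW/DAY"
                 + "&facet.date.gap=%2B1DAY"        // '+' must be URL-encoded
                 + "&facet.date.now=" + now;
        System.out.println(q);
    }
}
```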

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-12-12 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970671#action_12970671
 ] 

Peter Karich commented on SOLR-1729:


Nice, now this patch 1729 applies + compiles + runs tests successfully (I'm 
using rev 1044942 of trunk).

One further question: would facet queries (with dates) work in the distributed 
setup without the date patches? That would give a quick(er) workaround, because 
I would need the patch for 1.4.1 (-solandra)

 Date Facet now override time parameter
 --

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetParams.java, SimpleFacets.java, 
 solr-1.4.0-solr-1729.patch, SOLR-1729.patch, SOLR-1729.patch, 
 SOLR-1729.patch, UnInvertedField.java








[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-12-11 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970544#action_12970544
 ] 

Peter Karich commented on SOLR-1729:


Yonik,

thanks for the update. I refreshed my sources (now trunk) to rev 1044745, but 
the patch does not apply cleanly* for SearchHandler.
Am I doing something stupid here?

Regards,
Peter.

*
pathxy/solr_branch_3x$ patch -p0 < SOLR-1729.patch 
patching file solr/src/test/test-files/solr/conf/schema12.xml
patching file 
solr/src/test/org/apache/solr/search/function/TestFunctionQuery.java
Hunk #1 succeeded at 301 (offset -17 lines).
patching file 
solr/src/test/org/apache/solr/handler/component/SpellCheckComponentTest.java
patching file 
solr/src/test/org/apache/solr/handler/component/TermVectorComponentTest.java
patching file solr/src/java/org/apache/solr/core/QuerySenderListener.java
patching file solr/src/java/org/apache/solr/request/SimpleFacets.java
Hunk #1 succeeded at 64 (offset -9 lines).
Hunk #2 succeeded at 620 (offset -200 lines).
Hunk #3 succeeded at 630 (offset -200 lines).
Hunk #4 succeeded at 645 (offset -200 lines).
Hunk #5 succeeded at 803 (offset -200 lines).
patching file solr/src/java/org/apache/solr/handler/component/SearchHandler.java
Hunk #1 FAILED at 192.
Hunk #2 succeeded at 255 (offset -36 lines).
1 out of 2 hunks FAILED -- saving rejects to file 
solr/src/java/org/apache/solr/handler/component/SearchHandler.java.rej
patching file 
solr/src/java/org/apache/solr/handler/component/ResponseBuilder.java
Hunk #2 succeeded at 67 (offset -1 lines).
patching file solr/src/java/org/apache/solr/spelling/SpellCheckCollator.java
patching file solr/src/java/org/apache/solr/util/TestHarness.java
Hunk #2 succeeded at 320 (offset -9 lines).
Hunk #3 succeeded at 335 (offset -9 lines).
patching file solr/src/java/org/apache/solr/util/DateMathParser.java
patching file solr/src/webapp/src/org/apache/solr/servlet/SolrServlet.java
patching file 
solr/src/webapp/src/org/apache/solr/servlet/SolrDispatchFilter.java
Hunk #1 succeeded at 241 (offset 4 lines).
Hunk #2 succeeded at 255 (offset 4 lines).
Hunk #3 succeeded at 283 (offset 4 lines).
patching file 
solr/src/webapp/src/org/apache/solr/servlet/DirectSolrConnection.java
Hunk #2 succeeded at 170 (offset -16 lines).
Hunk #3 succeeded at 185 with fuzz 1 (offset -16 lines).
patching file 
solr/src/webapp/src/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.java
Hunk #1 succeeded at 32 with fuzz 1 (offset -9 lines).
Hunk #2 succeeded at 138 (offset -11 lines).
Hunk #3 succeeded at 156 (offset -77 lines).


 Date Facet now override time parameter
 --

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetParams.java, SimpleFacets.java, 
 solr-1.4.0-solr-1729.patch, SOLR-1729.patch, SOLR-1729.patch, 
 UnInvertedField.java



[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-12-03 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966488#action_12966488
 ] 

Peter Karich commented on SOLR-1729:


*regarding: 1.4.1*
Hmmh, today download.carrot2.org is down, so I had to delete contrib/clustering 
to do the build after the patch, which does not apply cleanly (strange that it 
applied yesterday):

solr1.4.1$ patch -p0 < solr-1.4.0-solr-1729.patch 
patching file src/java/org/apache/solr/handler/component/FacetComponent.java
patching file src/java/org/apache/solr/handler/component/ResponseBuilder.java

solr1.4.1$ patch -p0 < solr-1.4.0-solr-1709.patch 
patching file src/java/org/apache/solr/handler/component/FacetComponent.java
Reversed (or previously applied) patch detected!  Assume -R? [n] y
patching file src/java/org/apache/solr/handler/component/ResponseBuilder.java
Reversed (or previously applied) patch detected!  Assume -R? [n] y
Hunk #3 succeeded at 251 (offset -1 lines).

Or is this ok? Because then all tests would pass ...



*regarding branch3x*
Both patches do not apply cleanly; SOLR-1709 also fails without SOLR-1729:

solr_branch_3x/solr$ patch -p0 < solr-1.4.0-solr-1709.patch 
patching file src/java/org/apache/solr/handler/component/FacetComponent.java
Hunk #1 succeeded at 240 (offset 2 lines).
Hunk #2 succeeded at 267 with fuzz 2 (offset 7 lines).
Hunk #3 FAILED at 436.
1 out of 3 hunks FAILED -- saving rejects to file 
src/java/org/apache/solr/handler/component/FacetComponent.java.rej
patching file src/java/org/apache/solr/handler/component/ResponseBuilder.java
Reversed (or previously applied) patch detected!  Assume -R? [n] y
Hunk #2 FAILED at 61.
Hunk #3 FAILED at 252.
2 out of 3 hunks FAILED -- saving rejects to file 
src/java/org/apache/solr/handler/component/ResponseBuilder.java.rej



 Date Facet now override time parameter
 --

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetParams.java, SimpleFacets.java, 
 solr-1.4.0-solr-1729.patch, UnInvertedField.java



[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-12-03 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966520#action_12966520
 ] 

Peter Karich commented on SOLR-1729:


Hi Peter,

1.4.1 would be fine (I asked Jake from Solandra; before, I thought he uses the 
trunk).

Now, in my last comment I made a stupid mistake: the patches didn't apply 
cleanly for 1.4.1 because I accidentally overwrote solr-1729.patch with 
solr-1709 when copying from branch3x, and got two identical 1709 patches :-/

So: for 1.4.1 the patches apply cleanly. But the question remains why the 
following tests are failing:

Test org.apache.solr.TestTrie FAILED

Test org.apache.solr.request.SimpleFacetsTest FAILED


 Date Facet now override time parameter
 --

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetParams.java, SimpleFacets.java, 
 solr-1.4.0-solr-1729.patch, UnInvertedField.java








[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-12-03 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966562#action_12966562
 ] 

Peter Karich commented on SOLR-1729:


Hi Peter,

sorry for the confusion :-/

I was speaking of 1.4.1: the two patches apply. 2 tests fail.

Regards,
Peter.

 Date Facet now override time parameter
 --

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetParams.java, SimpleFacets.java, 
 solr-1.4.0-solr-1729.patch, UnInvertedField.java








[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-12-02 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966181#action_12966181
 ] 

Peter Karich commented on SOLR-1729:


Peter Sturge,

in SOLR-1709 you said that you are working with branch3x. I checked it out from 
here:
https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x

but this 1729 patch didn't apply cleanly*. 

When I tried the 1.4.1 release it was ok, but the tests fail due to**

What could be wrong?

Regards,
Peter.



*
solr_branch_3x/solr$ patch -p0 < solr-1.4.0-solr-1729.patch 
patching file src/java/org/apache/solr/request/SimpleFacets.java
Hunk #1 succeeded at 245 (offset 28 lines).
Hunk #2 succeeded at 280 (offset 28 lines).
Hunk #3 FAILED at 582.
Hunk #4 FAILED at 652.
2 out of 4 hunks FAILED -- saving rejects to file 
src/java/org/apache/solr/request/SimpleFacets.java.rej
patching file src/java/org/apache/solr/request/UnInvertedField.java
Hunk #2 succeeded at 40 with fuzz 1 (offset 1 line).
Hunk #3 succeeded at 440 (offset 5 lines).
Hunk #4 succeeded at 557 (offset 5 lines).
patching file src/common/org/apache/solr/common/params/FacetParams.java
Hunk #1 FAILED at 175.
1 out of 1 hunk FAILED -- saving rejects to file 
src/common/org/apache/solr/common/params/FacetParams.java.rej




**
[junit] Running org.apache.solr.TestTrie
[junit]  xml response was: <?xml version="1.0" encoding="UTF-8"?>
[junit] <response>
[junit] <lst name="responseHeader"><int name="status">0</int><int name="QTime">157</int></lst>
[junit] <result name="response" numFound="15" start="0">
[junit] <doc><float name="id">0.0</float><date name="tdate">2010-12-02T00:00:00Z</date><double name="tdouble">0.0</double><float name="tfloat">0.0</float><int name="tint">0</int><long name="tlong">2147483647</long></doc>
[junit] <doc><float name="id">1.0</float><date name="tdate">2010-12-03T00:00:00Z</date><double name="tdouble">2.33</double><float name="tfloat">31.11</float><int name="tint">1</int><long name="tlong">2147483648</long></doc>
[junit] <doc><float name="id">2.0</float><date name="tdate">2010-12-04T00:00:00Z</date><double name="tdouble">4.66</double><float name="tfloat">124.44</float><int name="tint">2</int><long name="tlong">2147483649</long></doc>
[junit] <doc><float name="id">3.0</float><date name="tdate">2010-12-05T00:00:00Z</date><double name="tdouble">6.99</double><float name="tfloat">279.99</float><int name="tint">3</int><long name="tlong">2147483650</long></doc>
[junit] <doc><float name="id">4.0</float><date name="tdate">2010-12-06T00:00:00Z</date><double name="tdouble">9.32</double><float name="tfloat">497.76</float><int name="tint">4</int><long name="tlong">2147483651</long></doc>
[junit] <doc><float name="id">5.0</float><date name="tdate">2010-12-07T00:00:00Z</date><double name="tdouble">11.65</double><float name="tfloat">777.75</float><int name="tint">5</int><long name="tlong">2147483652</long></doc>
[junit] <doc><float name="id">6.0</float><date name="tdate">2010-12-08T00:00:00Z</date><double name="tdouble">13.98</double><float name="tfloat">1119.96</float><int name="tint">6</int><long name="tlong">2147483653</long></doc>
[junit] <doc><float name="id">7.0</float><date name="tdate">2010-12-09T00:00:00Z</date><double name="tdouble">16.312</double><float name="tfloat">1524.39</float><int name="tint">7</int><long name="tlong">2147483654</long></doc>
[junit] <doc><float name="id">8.0</float><date name="tdate">2010-12-10T00:00:00Z</date><double name="tdouble">18.64</double><float name="tfloat">1991.04</float><int name="tint">8</int><long name="tlong">2147483655</long></doc>
[junit] <doc><float name="id">9.0</float><date name="tdate">2010-12-11T00:00:00Z</date><double name="tdouble">20.97</double><float name="tfloat">2519.9102</float><int name="tint">9</int><long name="tlong">2147483656</long></doc>
[junit] <doc><float name="id">10.0</float><date name="tdate">2010-12-02T00:00:00Z</date><double name="tdouble">0.0</double><float name="tfloat">0.0</float><int name="tint">0</int><long name="tlong">2147483647</long></doc>
[junit] <doc><float name="id">20.0</float><date name="tdate">2010-12-03T00:00:00Z</date><double name="tdouble">2.33</double><float name="tfloat">31.11</float><int name="tint">1</int><long name="tlong">2147483648</long></doc>
[junit] <doc><float name="id">30.0</float><date name="tdate">2010-12-04T00:00:00Z</date><double name="tdouble">4.66</double><float name="tfloat">124.44</float><int name="tint">2</int><long name="tlong">2147483649</long></doc>
[junit] <doc><float name="id">40.0</float><date name="tdate">2010-12-05T00:00:00Z</date><double name="tdouble">6.99</double><float name="tfloat">279.99</float><int name="tint">3</int><long name="tlong">2147483650</long></doc>
[junit] <doc><float name="id">50.0</float><date name="tdate">2010-12-06T00:00:00Z</date><double name="tdouble">9.32</double><float name="tfloat">497.76</float><int name="tint">4</int><long name="tlong">2147483651</long></doc>
[junit] </result>
[junit] <lst name="facet_counts"><lst name="facet_queries"/><lst name="facet_fields">
[junit] <lst name="tint"><int name="0">2</int><int name="1">2</int><int name="2">2</int><int name="3">2</int><int name="4">2</int><int name="5">1</int><int name="6">1</int><int name="7">1</int><int name="8">1</int><int name="9">1</int></lst>
[junit] <lst name="tlong"><int name="2147483647">2</int><int name="2147483648">2</int><int name="2147483649">2</int><int name="2147483650">2</int><int name="2147483651">2</int><int name="2147483652">1</int><int name="2147483653">1</int><int name="2147483654">1</int><int name="2147483655">1</int><int name="2147483656">1</int></lst>
[junit] <lst name="tfloat"><int name="0.0">2</int><int name="31.11">2</int><int name="124.44">2</int><int name="279.99">2</int><int name="497.76">2</int><int name="777.75">1</int><int name="1119.96">1</int>
name=1524.391

[jira] Commented: (SOLR-1709) Distributed Date Faceting

2010-12-01 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12965841#action_12965841
 ] 

Peter Karich commented on SOLR-1709:


Hi Peter,

sorry for getting back so late.

I'm relatively sure now that I'll need this patch (Jake from Solandra was also 
asking when it will be ready :-))

So, do I need to apply SOLR-1729 and then this patch to the 3x branch, or can I 
apply this patch even without SOLR-1729 (not necessary in my case)?

Regards,
Peter.

 Distributed Date Faceting
 -

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetComponent.java, FacetComponent.java, 
 ResponseBuilder.java, solr-1.4.0-solr-1709.patch


 This patch is for adding support for date facets when using distributed 
 searches.
 Date faceting across multiple machines exposes some time-based issues that 
 anyone interested in this behaviour should be aware of:
 Any time and/or time-zone differences are not accounted for in the patch 
 (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
 'instant-in-time', unless all shards are time-synced to the exact same time).
 The implementation uses the first encountered shard's facet_dates as the 
 basis for subsequent shards' data to be merged in.
 This means that if subsequent shards' facet_dates are skewed in relation to 
 the first by 1 'gap', these 'earlier' or 'later' facets will not be merged 
 in.
 There are several reasons for this:
   * Performance: It's faster to check facet_date lists against a single map's 
 data, rather than against each other, particularly if there are many shards
   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
 time range larger than that which was requested
 (e.g. a request for one hour's worth of facets could bring back 2, 3 
 or more hours of data)
 This could be dealt with if timezone and skew information was added, and 
 the dates were normalized.
 One possibility for adding such support is to [optionally] add 'timezone' and 
 'now' parameters to the 'facet_dates' map. This would tell requesters what 
 time and TZ the remote server thinks it is, and so multiple shards' time data 
 can be normalized.
 The patch affects 2 files in the Solr core:
   org.apache.solr.handler.component.FacetComponent.java
   org.apache.solr.handler.component.ResponseBuilder.java
 The main changes are in FacetComponent - ResponseBuilder is just to hold the 
 completed SimpleOrderedMap until the finishStage.
 One possible enhancement is to perhaps make this an optional parameter, but 
 really, if facet.date parameters are specified, it is assumed they are 
 desired.
 Comments & suggestions welcome.
 As a favour to ask, if anyone could take my 2 source files and create a PATCH 
 file from it, it would be greatly appreciated, as I'm having a bit of trouble 
 with svn (don't shoot me, but my environment is a Redmond-based os company).
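The merge rule described above can be sketched in a few lines (an illustrative reconstruction only - class and method names are mine, not the patch's actual FacetComponent code): the first encountered shard's facet_dates map is the basis, and later shards contribute counts only for dates the basis already contains, so skewed buckets are dropped rather than widening the requested range.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DateFacetMerge {
    // base: facet_dates from the first encountered shard (insertion-ordered);
    // shard: facet_dates from a subsequent shard. Counts are summed only for
    // dates already present in base - skewed buckets are silently ignored.
    static Map<String, Integer> merge(LinkedHashMap<String, Integer> base,
                                      Map<String, Integer> shard) {
        for (Map.Entry<String, Integer> e : shard.entrySet()) {
            base.computeIfPresent(e.getKey(), (date, count) -> count + e.getValue());
        }
        return base;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> first = new LinkedHashMap<>();
        first.put("2010-01-01T00:00:00Z", 4);
        first.put("2010-01-01T01:00:00Z", 2);

        Map<String, Integer> skewed = new LinkedHashMap<>();
        skewed.put("2010-01-01T01:00:00Z", 3);
        skewed.put("2010-01-01T02:00:00Z", 9); // one 'gap' later: dropped

        System.out.println(merge(first, skewed));
        // {2010-01-01T00:00:00Z=4, 2010-01-01T01:00:00Z=5}
    }
}
```

Normalizing timezone and skew, as suggested in the description, would have to happen before this merge step.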

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: WordDelimiterFilter bug

2010-11-19 Thread Peter Karich

 Hi Robert,


 QueryGenerator^H^H^HParser


Thanks for the hint. I should have done a debugQuery=on earlier ... sorry.

But how can I get: <str name="parsedquery">tw:abc tw:a tw:bc</str>
instead of: <str name="parsedquery">MultiPhraseQuery(tw:"(abc a) bc")</str>
for the query aBc?

Regards,
Peter.


On Thu, Nov 18, 2010 at 5:26 PM, Peter Karichpeat...@yahoo.de  wrote:

  Hi,

I asked this on the user list and I think I found a bug in
...
(The strange thing is that the admin GUI will highlight it correctly)


because the admin gui highlights it, there's no bug.

the reason it doesn't match is because of QueryGenerator^H^H^HParser
automatically generating phrase queries.






Re: WordDelimiterFilter bug

2010-11-19 Thread Peter Karich

 Hi Robert,

thanks a lot! I will try a newer solr version for other reasons, and then I 
will try your suggested option too!
(I will repost your solution to the user mailing list if that is ok for 
you ...)


Where can I find more info about phrase queries? I only found*
I mean, how does MultiPhraseQuery select its documents for (tw:"(abc a) 
bc") ?


Regards,
Peter.

*
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Optimizing-Findability-Lucene-and-Solr


On Fri, Nov 19, 2010 at 5:12 AM, Peter Karichpeat...@yahoo.de  wrote:

  Hi Robert,


  QueryGenerator^H^H^HParser

Thanks for the hint. I should have done a debugQuery=on earlier ... sorry.

But how can I get: <str name="parsedquery">tw:abc tw:a tw:bc</str>
instead of: <str name="parsedquery">MultiPhraseQuery(tw:"(abc a) bc")</str>
for the query aBc?


If you are using Solr branch_3x or trunk, you can turn this off by
setting autoGeneratePhraseQueries to false in the fieldType:
<fieldType name="text" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="false">
With this setting, phrase queries are only created by the
query parser when you enclose terms in double quotes.

If you are using an older version of solr such as 1.4.x, then you can
only hack it, by adding a PositionFilterFactory to the end of your
query analyzer.
The downside to that approach (unfortunately the only approach, for
older versions) is that it completely disables phrasequeries across
the board for that field type.
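For the Solr 1.4.x workaround, the query-side analyzer would then look roughly like this (a sketch only - the surrounding field type is assumed, and remember it disables phrase queries for the whole field type):

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- flattens all token positions, so the query parser
       cannot build phrase queries from this field -->
  <filter class="solr.PositionFilterFactory"/>
</analyzer>
```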






Re: WordDelimiterFilter bug

2010-11-19 Thread Peter Karich

 Thanks for the explanation! That makes sense :-)

Regards,
Peter.


On Fri, Nov 19, 2010 at 6:18 AM, Peter Karichpeat...@yahoo.de  wrote:

  Hi Robert,

thanks a lot! I will try a newer solr version for other reasons then I will
try your suggested option too!
(I will repost your solution to the user mailing list if that is ok for you
...)

yes, please do!


Where can I find more info about phrase queries? I only found*
I mean, how does MultiPhraseQuery select its documents for (tw:"(abc a)
bc") ?

the multiphrasequery is just like a more general phrase query.

a phrase query for "abc bc" looks for abc in the document, followed by bc
a multiphrasequery for "(abc a) bc" looks for (abc OR a) in the
document, followed by bc.

this is also the same way synonyms work with phrase queries.
imagine you have a synonyms file that looks like this:
dog => dog, dogs
food => food, chow

then if a user types "dog food", the resulting query is a
multiphrasequery of "(dog dogs) (food chow)"
this matches all 4 possibilities:
dog food
dogs food
dog chow
dogs chow

for more information, you can see the code to this query here:
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/java/org/apache/lucene/search/MultiPhraseQuery.java
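The matching rule can also be illustrated without Lucene at all - a toy re-implementation of the semantics (not Lucene's actual code; names are mine): each phrase position accepts a set of terms, and all positions must be satisfied by consecutive document tokens.

```java
import java.util.*;

public class MultiPhraseDemo {
    // positions: for each phrase slot the set of accepted terms,
    // e.g. [{dog, dogs}, {food, chow}], mirroring the Term arrays
    // a MultiPhraseQuery is built from
    static boolean matches(List<Set<String>> positions, String[] doc) {
        outer:
        for (int start = 0; start + positions.size() <= doc.length; start++) {
            for (int i = 0; i < positions.size(); i++) {
                if (!positions.get(i).contains(doc[start + i])) continue outer;
            }
            return true; // every slot matched by consecutive tokens
        }
        return false;
    }

    public static void main(String[] args) {
        List<Set<String>> query = Arrays.asList(
                new HashSet<>(Arrays.asList("dog", "dogs")),
                new HashSet<>(Arrays.asList("food", "chow")));
        System.out.println(matches(query, "buy dog chow now".split(" ")));  // true
        System.out.println(matches(query, "my dog likes food".split(" "))); // false
    }
}
```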






Re: Lucene project announcement

2010-11-18 Thread Peter Karich




 A) make a great Java to C# porting tool


db4o already has something like this, I think.
maybe the guys from db4o would give the lucene committers a license?
or would it be bad to rely on non-free tools like this?

or maybe there is already an equivalent tool?

Regards,
Peter.



I'm forced to politely disagree with some of these thoughts, let me explain why:

In order for this technique to be successful it seems that there is as much work 
being poured into the porting technique as there is into the port itself. From my 
point of view it does seem like this is double the work for benefits that are 
perhaps not as good as they could be (I am not saying in any way that 
Lucene.NET today is not good, it is quite good and is the result of great 
efforts from a lot of very dedicated people), since it follows a java-style 
design which is great for the java world, but perhaps not always optimal for the 
C# world. The project should be doing one thing, either:
A) make a great Java to C# porting tool
B) make a great search engine in C#

As an example, it would be a hair-pulling experience to take Lucene.NET as it 
is today and use it on Microsoft Azure, an environment that is specifically 
designed for .NET applications.

As I said before, besides using Lucene.NET itself I haven't contributed much 
and only in discussions - I haven't committed any code. However I will say 
this: I personally don't know nor care about the Java language just as I'm sure 
many of you don't care about Prolog. In order to help out, I feel that I need 
to be able to read and understand the Lucene version in order to make the same 
stuff happen in the Lucene.NET version. This means I have to be both a Java and 
C# developer at the same time?

Mathematicians have been using math to explain algorithms for years, it is a 
universal language that is (to different levels) understood by all.

How those functional algorithms are implemented in an imperative language makes 
no difference, so long as they are implemented and produce the intended result.

I think that in the end, there should be at least 3 projects for Lucene:
1. The Lucene algorithms, in a platform-neutral language - let the search 
engine gurus implement how this should be done without having to worry about 
imperative programming and the hacks to get there - either a compiler or a 
manual model would be used to implement these algorithms
2. Lucene - Architecture of the project(s) - perhaps a lot of UML here in a 
format where it can be fed to quickly produce skeleton files
3.x. Lucene - language-specific versions

As Grant points out it is up to the community to make a decision, then let's 
all get together and see if collectively a decision can be made.

And for the record, I personally think that when an open source project has 3+ 
ports to the same language - there is a problem. What that problem is however, 
I won't venture in taking any guesses.

I make these comments for the good of the project(s) and it is in no way my 
intention to offend anyone and I salute all work and effort done thus far, we 
would not be here were it not for everyone involved.


Karell Ste-Marie
C.I.O. - BrainBank Inc

-Original Message-
From: Alex Thompson [mailto:pierogi...@hotmail.com]
Sent: Thursday, November 18, 2010 3:58 AM
To: lucene-net-...@lucene.apache.org
Subject: RE: Lucene project announcement

I don't think Lucene.Net staying a line-by-line port is craziness. We're not 
saying that Lucene.Net is the one true implementation and there can be no 
others. I see Lucene.Net as part of a spectrum of solutions.

On one end of the spectrum is IKVM. If you want all the java lucene features 
immediately and the constraints of IKVM work for your scenario then great, off 
you go.
Then there is Lucene.Net. This is good if IKVM doesn't work for you, you want 
short lag time behind java lucene (yes this needs improvement but we're working 
on it), and ability to read java lucene books/examples and apply that 
relatively seamlessly to your .NET code.
Then on the other end of the spectrum is the forks (wrapper/extension/refactor 
etc.) that try to make things ideal for the .NET world.

I think it's clear there is interest and support for both Lucene.Net and the 
forks. They should both exist and be complimentary, not competitive. The forks 
provide greater flexibility and greater exposure so more users and contributors 
can get involved. Lucene.Net provides the benefits listed above and provides an 
avenue for features to trickle down from java lucene to the forks.

So bottom line there is no one-size-fits-all implementation. Lucene.Net (as a 
line-by-line) provides good value to a significant user base and (assuming we 
can optimize the porting) takes relatively little effort, so it is a useful 
part of the spectrum.

Alex

-Original Message-
From: Andrew Busby [mailto:andrew.bu...@aimstrategic.com]
Sent: Wednesday, November 17, 2010 5:06 PM
To: lucene-net-...@lucene.apache.org
Subject: 

WordDelimiterFilter bug

2010-11-18 Thread Peter Karich

 Hi,

I asked this on the user list and I think I found a bug in 
WordDelimiterFilterFactory
for splitOnCaseChange=1 catenateAll=0 preserveOriginal=1 (+ 
lowercase filter).

Add the following test* and append the definition to the schema.xml**
and it won't pass. Should I open a JIRA issue for this, or isn't this a 
bug and have I missed something?

(The strange thing is that the admin GUI will highlight it correctly)

Regards,
Peter.

BTW: I just read the code of SpellCheckCollator because it didn't 
compile. It is:

} catch (Exception e) {
  Log.warn("Exception trying to re-query to check if a spell 
check possibility would return any hits.", e);

It should NOT use the jetty Log - remove the jetty dep:
} catch (Exception e) {
  LOG.warn("Exception trying to re-query to check if a spell 
check possibility would return any hits.", e);


*
  @Test
  public void testCaseChangeAndPreserve() {
    assertU(adoc("id", "1",
                 "subword_cc", "abcd"));
    assertU(adoc("id", "2",
                 "subword_cc", "abCd.com"));
    assertU(commit());

    assertQ("simple - case change and preserve",
            req("subword_cc:(abcd)")
            , "//result[@numFound=1]"
    );
    // returns at the moment only doc 2
    // should also return doc 1, because abCd should be preserved + 
    // lowercase filter (for the query)

    assertQ("camel case query - case change and preserve",
            req("subword_cc:(abCd)")
            , "//result[@numFound=2]"
    );
    // returns at the moment 0 docs
    // should return doc 2, because abCd.com should be preserved + lowercase 
    // filter (for the index)

    assertQ("camel case domain - case change and preserve",
            req("subword_cc:(abcd.com)")
            , "//result[@numFound=1]"
    );
    clearIndex();
  }

**
<fieldtype name="subword_cc" class="solr.TextField" 
positionIncrementGap="100">

<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" 
catenateAll="0" preserveOriginal="1"/>

<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" 
catenateAll="0" preserveOriginal="1"/>

<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldtype>

<field name="subword_cc" type="subword_cc" indexed="true" stored="true"/>






[jira] Commented: (SOLR-792) Pivot (ie: Decision Tree) Faceting Component

2010-11-08 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929590#action_12929590
 ] 

Peter Karich commented on SOLR-792:
---

Hi Toke and all,

maybe I am a bit evil or stupid, but could someone enlighten me why this patch is 
necessary?

Why can't we use the existing mechanisms in Solr (facets!) and a bit of logic 
while indexing:

http://markmail.org/message/2aza6nnsiw3l4bbb#query:+page:1+mid:3j3ttojacpjoyfg5+state:results

This has no performance problems when using tons of categories. We are already 
using it with lots of categories. It works out of the box with a nearly 
unlimited depth (either you use a DB - 
unlimited - or the URL length is the limit).

The only drawback of this approach is that you won't be able to display two or 
more 'branches' at the same time. Only one current branch with the current 
possible categories is possible, which is no limitation in our case, because 
the UI would be unusable if too many items were visible at the same time.

One could introduce a special update component for this feature which uses a 
category tree (in RAM) built from the json or xml definition. I could create 
such a component if someone is interested.

Regards,
Peter.
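The indexing logic referenced in the linked thread is commonly implemented by emitting one depth-prefixed token per ancestor level of the category path (a hypothetical sketch - field and token formats here are illustrative, not taken from the thread):

```java
import java.util.*;

public class FacetPath {
    // emit one token per ancestor level, prefixed with its depth,
    // e.g. ["0/electronics", "1/electronics/cameras"]
    static List<String> pathTokens(String... path) {
        List<String> tokens = new ArrayList<>();
        StringBuilder full = new StringBuilder();
        for (int depth = 0; depth < path.length; depth++) {
            if (depth > 0) full.append('/');
            full.append(path[depth]);
            tokens.add(depth + "/" + full);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // index these tokens in a multi-valued string field
        System.out.println(pathTokens("electronics", "cameras", "slr"));
        // [0/electronics, 1/electronics/cameras, 2/electronics/cameras/slr]
    }
}
```

Faceting on such a field with facet.prefix=1/electronics/ then returns exactly the children of 'electronics', using only the standard facet machinery.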

 Pivot (ie: Decision Tree) Faceting Component
 

 Key: SOLR-792
 URL: https://issues.apache.org/jira/browse/SOLR-792
 Project: Solr
  Issue Type: New Feature
Reporter: Erik Hatcher
Assignee: Yonik Seeley
Priority: Minor
 Attachments: SOLR-792-as-helper-class.patch, 
 SOLR-792-PivotFaceting.patch, SOLR-792-PivotFaceting.patch, 
 SOLR-792-PivotFaceting.patch, SOLR-792-PivotFaceting.patch, 
 SOLR-792-raw-type.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, 
 SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch


 A component to do multi-level faceting.




[jira] Commented: (SOLR-1709) Distributed Date Faceting

2010-11-08 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929760#action_12929760
 ] 

Peter Karich commented on SOLR-1709:


Hi Peter Sturge,

what are the limitations of this patch? Only that earlier + later isn't 
supported?

What are the issues before committing this into trunk?

 Distributed Date Faceting
 -

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetComponent.java, FacetComponent.java, 
 ResponseBuilder.java, solr-1.4.0-solr-1709.patch


 This patch is for adding support for date facets when using distributed 
 searches.
 Date faceting across multiple machines exposes some time-based issues that 
 anyone interested in this behaviour should be aware of:
 Any time and/or time-zone differences are not accounted for in the patch 
 (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
 'instant-in-time', unless all shards are time-synced to the exact same time).
 The implementation uses the first encountered shard's facet_dates as the 
 basis for subsequent shards' data to be merged in.
 This means that if subsequent shards' facet_dates are skewed in relation to 
 the first by 1 'gap', these 'earlier' or 'later' facets will not be merged 
 in.
 There are several reasons for this:
   * Performance: It's faster to check facet_date lists against a single map's 
 data, rather than against each other, particularly if there are many shards
   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
 time range larger than that which was requested
 (e.g. a request for one hour's worth of facets could bring back 2, 3 
 or more hours of data)
 This could be dealt with if timezone and skew information was added, and 
 the dates were normalized.
 One possibility for adding such support is to [optionally] add 'timezone' and 
 'now' parameters to the 'facet_dates' map. This would tell requesters what 
 time and TZ the remote server thinks it is, and so multiple shards' time data 
 can be normalized.
 The patch affects 2 files in the Solr core:
   org.apache.solr.handler.component.FacetComponent.java
   org.apache.solr.handler.component.ResponseBuilder.java
 The main changes are in FacetComponent - ResponseBuilder is just to hold the 
 completed SimpleOrderedMap until the finishStage.
 One possible enhancement is to perhaps make this an optional parameter, but 
 really, if facet.date parameters are specified, it is assumed they are 
 desired.
 Comments & suggestions welcome.
 As a favour to ask, if anyone could take my 2 source files and create a PATCH 
 file from it, it would be greatly appreciated, as I'm having a bit of trouble 
 with svn (don't shoot me, but my environment is a Redmond-based os company).




[jira] Commented: (SOLR-2218) Performance of start= and rows= parameters are exponentially slow with large data sets

2010-11-05 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928536#action_12928536
 ] 

Peter Karich commented on SOLR-2218:


Lance, would you mind explaining this a bit in detail :-) ?

The idea is to grab all/a lot of documents from solr even if the dataset is very 
large, if I haven't misunderstood what Bill was requesting. This is very useful 
IMHO.

 Performance of start= and rows= parameters are exponentially slow with large 
 data sets
 --

 Key: SOLR-2218
 URL: https://issues.apache.org/jira/browse/SOLR-2218
 Project: Solr
  Issue Type: Improvement
  Components: Build
Affects Versions: 1.4.1
Reporter: Bill Bell

 With large data sets, > 10M rows.
 Setting start=large number and rows=large numbers is slow, and gets 
 slower the farther you get from start=0 with a complex query. Random also 
 makes this slower.
 Would like to somehow make this performance faster for looping through large 
 data sets. It would be nice if we could pass a pointer to the result set to 
 loop, or support very large rows=number.
 Something like:
 rows=1000
 start=0
 spointer=string_my_query_1
 Then within interval (like 5 mins) I can reference this loop:
 Something like:
 rows=1000
 start=1000
 spointer=string_my_query_1
 What do you think? Since the data is too great the cache is not helping.
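For background on why large start values are slow: a ranked retrieval sketch (illustrative only, not Solr's actual collector) has to keep the top start+rows hits in a priority queue, so work and memory grow with the offset even though only rows documents are returned.

```java
import java.util.*;

public class DeepPaging {
    // naive deep paging: to serve start=N, rows=R the searcher must rank
    // the top N+R hits, so cost grows with the offset
    static List<Integer> page(int[] scores, int start, int rows) {
        PriorityQueue<Integer> top = new PriorityQueue<>(); // min-heap of scores
        for (int s : scores) {
            top.offer(s);
            if (top.size() > start + rows) top.poll(); // queue size = start + rows
        }
        List<Integer> ranked = new ArrayList<>(top);
        ranked.sort(Collections.reverseOrder());
        return ranked.subList(start, Math.min(start + rows, ranked.size()));
    }

    public static void main(String[] args) {
        int[] scores = {5, 9, 1, 7, 3, 8, 2, 6, 4};
        System.out.println(page(scores, 0, 3)); // [9, 8, 7]
        System.out.println(page(scores, 3, 3)); // [6, 5, 4]
    }
}
```

A server-side cursor like the proposed spointer would avoid re-ranking the skipped start documents on every page.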




fast bitset

2010-11-05 Thread Peter Karich

 Hi,

would this compressed and fast(?) bitset be interesting for solr/lucene 
or is openbitset already done this way?

quoting from github:

The goal of word-aligned compression is not to
achieve the best compression, but rather to
improve query processing time.

License is GPL version 3 and ASL2.0.

http://code.google.com/p/javaewah
https://github.com/lemire/javaewah

I just saw it on twitter ...

Regards,
Peter.

--
http://jetwick.com twitter search prototype





Re: fast bitset

2010-11-05 Thread Peter Karich




 And they're not random-access capable.


which means it isn't applicable?



Important point about WAH and friends is their ability to be fast
and/or/not/xor'ed without full decompression. And they're not
random-access capable.

On Fri, Nov 5, 2010 at 18:47, Uwe Schindleru...@thetaphi.de  wrote:

Looks interesting, I was only annoyed when I saw new Vector<Integer>(),
which is synchronized, in the iterator code - which is the thing that is
most important for DocIdSets. Looks like stone ages.

Else I would simply give it a try by rewriting the class to also implement
DocIdSet and return the optimized iterator (not the one in this class). You
can then try to replace some OpenBitSets in any filters and perf test?

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



-Original Message-
From: Peter Karich [mailto:peat...@yahoo.de]
Sent: Friday, November 05, 2010 3:38 PM
To: dev@lucene.apache.org
Subject: fast bitset

   Hi,

would this compressed and fast(?) bitset be interesting for solr/lucene or
is openbitset already done this way?
quoting from github:

The goal of word-aligned compression is not to achieve the best compression,
but rather to improve query processing time.

License is GPL version 3 and ASL2.0.

http://code.google.com/p/javaewah
https://github.com/lemire/javaewah

I just saw it on twitter ...

Regards,
Peter.

--
http://jetwick.com twitter search prototype






Re: question about inline function in java

2010-11-03 Thread Peter Karich

 Hi,

do not ever optimize prematurely in java (and other JIT languages like 
javascript etc.)!


if you have a bottleneck - optimize that, and only that. but take care 
how you compare statements.
be sure that you run some warm-up loops first. see the first comment of Aleksey 
Shipilev here:


http://karussell.wordpress.com/2009/05/21/microbenchmarking-java-compare-algorithms/

the JIT is very clever at optimizing code: so write code as simple (and 'stupid') 
as possible so that it can be understood by the JIT ;-)

I.e. concentrate your time and effort on algorithms, not on bytecode.

Regards,
Peter.


hi all
 we found that function calls in java can cost much time, e.g. replacing
Math.min with a<b?a:b will make it faster. Another example is lessThan
in PriorityQueue when using a Collector to gather the top K documents. Yes,
using functions and subclasses makes it easy to maintain and extend. In
C/C++, we can use inline functions to optimize. What about java? I see
that much code in lucene is also inlined manually,
such as the hand-implemented hash map in processDocument, or
SegmentTermDocs.read  // manually inlined call to next() for speed.
Is there any compiler option for inlining in java? Or we may hardcode
something for time-consuming tasks.
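The warm-up advice is easy to demonstrate with a deliberately crude harness (use a proper benchmark framework for real measurements; this only shows the pattern of discarding the first rounds before the JIT has compiled the hot method):

```java
public class WarmupBench {
    static int minTernary(int a, int b) { return a < b ? a : b; }

    // time one run of the given size; the first timings are meaningless
    // because the loop is still being interpreted
    static long time(int iterations) {
        long start = System.nanoTime();
        int acc = 0;
        for (int i = 0; i < iterations; i++) acc += minTernary(i, iterations - i);
        if (acc == 42) System.out.println(acc); // keep the loop from being eliminated
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int round = 0; round < 5; round++) {
            System.out.println("round " + round + ": " + time(10_000_000) + " ns");
        }
        // later rounds are typically much faster once the method is compiled
    }
}
```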








[jira] Commented: (SOLR-1311) pseudo-field-collapsing

2010-10-15 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921324#action_12921324
 ] 

Peter Karich commented on SOLR-1311:


Hi Marc,

could this issue be closed, since field collapsing is now in trunk 
and more mature?

Why can't it be integrated as a plugin?

 pseudo-field-collapsing
 ---

 Key: SOLR-1311
 URL: https://issues.apache.org/jira/browse/SOLR-1311
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Marc Sturlese
 Fix For: Next

 Attachments: SOLR-1311-pseudo-field-collapsing.patch


 I am trying to develop a new way of doing field collapsing based on the 
 adjacent field collapsing algorithm. I have started developing it because I 
 am experiencing performance problems with the field collapsing patch with a big 
 index (8G).
 The algorithm does adjacent-pseudo-field collapsing. It does collapsing on the 
 first X documents. Instead of making the collapsed docs disappear, the 
 algorithm will send them to a given position of the relevance results list.
 The reason I just do collapsing in the first X documents is that if I have 
 for example 60 results and I am showing 10 results per page, I really 
 don't need to do collapsing on page 3, or even on page 3000. Doing 
 this I am noticing dramatically better performance. The problem is I couldn't 
 find a way to plug the algorithm in as a component and keep good performance. I 
 had to hack a few classes in SolrIndexSearcher.java.
 This patch is just experimental and for testing purposes. In case someone 
 finds it interesting, it would be good to find a way to integrate it in a better 
 way than it is at the moment.
 Advice is more than welcome.
   
 Functionality:
 In solrconfig.xml we specify the pseudo-collapsing parameters:
  <str name="plus.considerMoreDocs">true</str>
  <str name="plus.considerHowMany">3000</str>
  <str name="plus.considerField">name</str>
 (at the moment there's no threshold and other parameters that exist in the 
 current collapse-field patch)
 plus.considerMoreDocs enables pseudo-collapsing
 plus.considerHowMany sets the number of result documents to which we want 
 to apply the algorithm
 plus.considerField is the field to do pseudo-collapsing on
 If the number of results is lower than plus.considerHowMany, the algorithm 
 will be applied to all the results.
 Let's say there is a query with 60 results and we've set considerHowMany 
 to 3000 (and we already have the docs sorted by relevance). 
 What adjacent-pseudo-collapse does is: if the 2nd doc has to be collapsed, it 
 will be sent to pos 2999 of the relevance results array. If the 3rd has 
 to be collapsed too, it will go to position 2998, and so on.
 The algorithm is not applied when a sortspec is set or plus.considerMoreDocs 
 is set to false. Neither is it applied when using the MoreLikeThisRequestHandler.
 Example with a query of 9 results:
 Results sorted by relevance without pseudo-collapse-algorithm:
 doc1 - collapse_field_value 3
 doc2 - collapse_field_value 3
 doc3 - collapse_field_value 4
 doc4 - collapse_field_value 7
 doc5 - collapse_field_value 6
 doc6 - collapse_field_value 6
 doc7 - collapse_field_value 5
 doc8 - collapse_field_value 1
 doc9 - collapse_field_value 2
 Results pseudo-collapsed with plus.considerHowMany = 5
 doc1 - collapse_field_value 3
 doc3 - collapse_field_value 4
 doc4 - collapse_field_value 7
 doc5 - collapse_field_value 6
 doc2 - collapse_field_value 3*
 doc6 - collapse_field_value 6
 doc7 - collapse_field_value 5
 doc8 - collapse_field_value 1
 doc9 - collapse_field_value 2
 Results pseudo-collapsed with plus.considerHowMany = 9
 doc1 - collapse_field_value 3
 doc3 - collapse_field_value 4
 doc4 - collapse_field_value 7
 doc5 - collapse_field_value 6
 doc7 - collapse_field_value 5
 doc8 - collapse_field_value 1
 doc9 - collapse_field_value 2
 doc6 - collapse_field_value 6*
 doc2 - collapse_field_value 3*
 *pseudo-collapsed documents
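Reading the worked examples above, the reordering rule can be sketched like this (my own reconstruction from the description, not the patch's code): the first doc of each collapse value keeps its rank, duplicates inside the considered window are demoted towards position considerHowMany-1 in the order they are encountered, and docs past the window stay untouched.

```java
import java.util.*;

public class PseudoCollapse {
    // docs: collapse-field values in relevance order; k = plus.considerHowMany
    static List<String> collapse(List<String> docs, int k) {
        int n = Math.min(k, docs.size());
        String[] window = new String[n];
        Set<String> seen = new HashSet<>();
        int front = 0, back = n - 1;
        for (int i = 0; i < n; i++) {
            String v = docs.get(i);
            if (seen.add(v)) window[front++] = v;  // first doc with this value keeps rank
            else window[back--] = v;               // duplicates demoted toward pos k-1
        }
        List<String> out = new ArrayList<>(Arrays.asList(window));
        out.addAll(docs.subList(n, docs.size())); // docs past the window untouched
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("3", "3", "4", "7", "6", "6", "5", "1", "2");
        System.out.println(collapse(docs, 5)); // [3, 4, 7, 6, 3, 6, 5, 1, 2]
        System.out.println(collapse(docs, 9)); // [3, 4, 7, 6, 5, 1, 2, 6, 3]
    }
}
```

Both calls reproduce the considerHowMany = 5 and = 9 examples from the issue description.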




[jira] Commented: (SOLR-64) strict hierarchical facets

2010-10-15 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921326#action_12921326
 ] 

Peter Karich commented on SOLR-64:
--

@SolrFan and @Mats:

you could try an alternative solution:
http://lucene.472066.n3.nabble.com/multi-level-faceting-tp1629650p1672083.html

 strict hierarchical facets
 --

 Key: SOLR-64
 URL: https://issues.apache.org/jira/browse/SOLR-64
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Yonik Seeley
 Fix For: Next

 Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, 
 SOLR-64.patch


 Strict Facet Hierarchies... each tag has at most one parent (a tree).




[jira] Commented: (SOLR-385) facet sorting with relevancy

2010-10-15 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921473#action_12921473
 ] 

Peter Karich commented on SOLR-385:
---

Thinking a bit more about this issue: for the 'ungeneralized version' 
- sorting against the maximum of the score (or any field?) - we can use the 
group feature!

http://wiki.apache.org/solr/FieldCollapsing

The Solution - I think - would be the following request:

http://localhost:8983/solr/select/?q=hard&group=true&group.field=manu_exact&group.limit=1&debug=true&fl=*,score

the collapse groups are ordered by the maxScore I think + hope ;-) 

So it is the same as we want:

http://localhost:8983/solr/select/?q=hard&facet=true&facet.field=manu_exact&debug=true&fl=*,score&facet.stats.sort=max(score)
 desc

Now one remaining task could be to extend this feature with max, min and mean 
functions ...


here is the 'group' result:

{code}
<lst>
  <str name="groupValue">Maxtor Corp.</str>
  <result name="doclist" numFound="1" start="0" maxScore="0.70904505">
    <doc>
      <float name="score">0.70904505</float>
      <arr name="cat">
        <str>electronics</str>
        <str>hard drive</str>
      </arr>
      <arr name="features">
        <str>SATA 3.0Gb/s, NCQ</str>
        <str>8.5ms seek</str>
        <str>16MB cache</str>
      </arr>
      <str name="id">6H500F0</str>
      <bool name="inStock">true</bool>
      <str name="manu">Maxtor Corp.</str>
      <date name="manufacturedate_dt">2006-02-13T15:26:37Z</date>
      <str name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</str>
      <int name="popularity">6</int>
      <float name="price">350.0</float>
      <str name="store">45.17614,-93.87341</str>
    </doc>
  </result>
</lst>
<lst>
  <str name="groupValue">Samsung Electronics Co. Ltd.</str>
  <result name="doclist" numFound="1" start="0" maxScore="0.5908709">
    <doc>
      <float name="score">0.5908709</float>
      <arr name="cat">
        <str>electronics</str>
        <str>hard drive</str>
      </arr>
      <arr name="features">
        <str>7200RPM, 8MB cache, IDE Ultra ATA-133</str>
        <str>NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</str>
      </arr>
      <str name="id">SP2514N</str>
      <bool name="inStock">true</bool>
      <str name="manu">Samsung Electronics Co. Ltd.</str>
      <date name="manufacturedate_dt">2006-02-13T15:26:37Z</date>
      <str name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</str>
      <int name="popularity">6</int>
      <float name="price">92.0</float>
      <str name="store">45.17614,-93.87341</str>
    </doc>
  </result>
</lst>
{code}

this would be the faceting result:

{code}
<lst name="facet_fields">
  <lst name="manu_exact">
    <int name="Maxtor Corp." score="0.70904505">1</int>
    <int name="Samsung Electronics Co. Ltd." score="0.5908709">1</int>
...
{code}
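The desired facet.stats.sort=max(score) semantics can be modeled with a toy helper (hypothetical code, not Solr internals): compute the maximum document score per facet value and order the facet values by it, descending.

```java
import java.util.*;

class MaxScoreFacets {
    // For parallel lists of facet-field values and document scores, return the
    // facet entries ordered by their maximum document score, descending.
    static List<Map.Entry<String, Float>> byMaxScore(List<String> values, List<Float> scores) {
        Map<String, Float> max = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            max.merge(values.get(i), scores.get(i), Math::max); // keep the larger score
        }
        List<Map.Entry<String, Float>> out = new ArrayList<>(max.entrySet());
        out.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
        return out;
    }
}
```

With the scores from the group response above, "Maxtor Corp." (0.70904505) sorts before "Samsung Electronics Co. Ltd." (0.5908709), as in the facet listing.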

 facet sorting with relevancy
 

 Key: SOLR-385
 URL: https://issues.apache.org/jira/browse/SOLR-385
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Dmitry Degtyarev
Priority: Minor

 Sometimes a facet sort based only on the count of matches is not relevant; I 
 need to sort not only by the count of matches, but also by the scores of the 
 matches.
 In the simplest form it would sort categories by the sum of the item scores that 
 match the query and the category. Ideally there would be some 
 coefficient to multiply scores by, or some function.
 Is it possible to implement such a behavior for facet sort?




Re: Lucene powers Twitter's search

2010-10-10 Thread Peter Karich
Shai,

thanks a lot for this information. This is nice and bad at the same time :-)

Nice, because they will contribute their changes back to the community, and
bad, because now the core search technology (Lucene) of Jetwick and
Twitter is
nearly the same, and this is a 'fight' like David vs. Goliath ...

But probably I or someone else will come up with a killer feature ;-)

Regards,
Peter.

 Oops, forgot to paste the link from which the quote was taken:

 http://engineering.twitter.com/2010/10/twitters-new-search-architecture.html

 Shai

 On Thu, Oct 7, 2010 at 8:46 AM, Shai Erera ser...@gmail.com wrote:

   
 I came across this post today:
 http://techcrunch.com/2010/10/06/new-twitter-search

 And continued reading these:
 http://techcrunch.com/2010/10/06/twitter-search-lives/

 From the latter:

 Modified Lucene

 Lucene is great, but in its current form it has several shortcomings for
 real-time search. That’s why we rewrote big parts of the
 core in-memory data structures, especially the posting lists, while still
 supporting Lucene’s standard APIs. This allows us to
 use Lucene’s search layer almost unmodified. Some of the highlights of our
 changes include:

- significantly improved garbage collection performance
- lock-free data structures and algorithms
- posting lists, that are traversable in reverse order
- efficient early query termination

 We believe that the architecture behind these changes involves several
 interesting topics that pertain to software engineering in
 general (not only search). We hope to continue to share more on these
 improvements.

 And, before you ask, we’re planning on contributing all these changes back
 to Lucene; some of which have already made it into
 Lucene’s trunk and its new realtime branch.


 And as you can read at the bottom of the last post, Michael B. is behind
 all this :-).

 FYI
 Shai

-- 
http://jetwick.com twitter search prototype





[jira] Commented: (SOLR-2059) Allow customizing how WordDelimiterFilter tokenizes text.

2010-08-25 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902588#action_12902588
 ] 

Peter Karich commented on SOLR-2059:


Robert,

thanks for this work! I have a different application for this patch: in a 
twitter search, # and @ shouldn't be removed. Instead I will handle them like 
ALPHA, I think.

Would you mind updating the patch for the latest version of the trunk? I get a 
problem with WordDelimiterIterator at line 254 if I am using 
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr, and a 'file is missing' 
problem (line 37) for http://svn.apache.org/repos/asf/solr
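As a sketch of how the `char = TYPE` mapping format proposed in the issue description below could be read, here is a hypothetical parser (not the actual factory code); it handles `#` comments and `\uXXXX` escapes as shown in the example file.

```java
import java.util.*;

class TypeTableParser {
    // Parse lines like "$ = DIGIT" or "\u002C = DIGIT"; '#' starts a comment line.
    static Map<Character, String> parse(List<String> lines) {
        Map<Character, String> table = new LinkedHashMap<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) continue;       // skip comments/blanks
            String[] parts = t.split("\\s*=\\s*", 2);             // "char = TYPE"
            String lhs = parts[0];
            char c = lhs.startsWith("\\u")
                    ? (char) Integer.parseInt(lhs.substring(2), 16) // unicode escape
                    : lhs.charAt(0);
            table.put(c, parts[1].trim());
        }
        return table;
    }
}
```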

 Allow customizing how WordDelimiterFilter tokenizes text.
 -

 Key: SOLR-2059
 URL: https://issues.apache.org/jira/browse/SOLR-2059
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Robert Muir
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2059.patch


 By default, WordDelimiterFilter assigns 'types' to each character (computed 
 from Unicode Properties).
 Based on these types and the options provided, it splits and concatenates 
 text.
 In some circumstances, you might need to tweak the behavior of how this works.
 It seems the filter already had this in mind, since you can pass in a custom 
 byte[] type table.
 But it's not exposed in the factory.
 I think you should be able to customize the defaults with a configuration 
 file:
 {noformat}
 # A customized type mapping for WordDelimiterFilterFactory
 # the allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM
 # 
 # the default for any character without a mapping is always computed from 
 # Unicode character properties
 # Map the $, %, '.', and ',' characters to DIGIT 
 # This might be useful for financial data.
 $ = DIGIT
 % = DIGIT
 . = DIGIT
 \u002C = DIGIT
 {noformat}




[jira] Commented: (SOLR-2059) Allow customizing how WordDelimiterFilter tokenizes text.

2010-08-25 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902600#action_12902600
 ] 

Peter Karich commented on SOLR-2059:


Oops, my mistake ... this helped!

 What do you think of the file format, is it ok for describing these 
 categories? 

I think it is ok. I even had a simpler patch (handleAsChar=@#) before stumbling 
over yours, which is more powerful IMHO:

@ = ALPHA
# = ALPHA



 Allow customizing how WordDelimiterFilter tokenizes text.
 -

 Key: SOLR-2059
 URL: https://issues.apache.org/jira/browse/SOLR-2059
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Robert Muir
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2059.patch


 By default, WordDelimiterFilter assigns 'types' to each character (computed 
 from Unicode Properties).
 Based on these types and the options provided, it splits and concatenates 
 text.
 In some circumstances, you might need to tweak the behavior of how this works.
 It seems the filter already had this in mind, since you can pass in a custom 
 byte[] type table.
 But it's not exposed in the factory.
 I think you should be able to customize the defaults with a configuration 
 file:
 {noformat}
 # A customized type mapping for WordDelimiterFilterFactory
 # the allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM
 # 
 # the default for any character without a mapping is always computed from 
 # Unicode character properties
 # Map the $, %, '.', and ',' characters to DIGIT 
 # This might be useful for financial data.
 $ = DIGIT
 % = DIGIT
 . = DIGIT
 \u002C = DIGIT
 {noformat}




[jira] Issue Comment Edited: (SOLR-2059) Allow customizing how WordDelimiterFilter tokenizes text.

2010-08-25 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902600#action_12902600
 ] 

Peter Karich edited comment on SOLR-2059 at 8/25/10 3:46 PM:
-

Oops, my mistake ... this helped!

 What do you think of the file format, is it ok for describing these 
 categories? 

I think it is ok. I even had a simpler patch (handleAsChar=@#) before stumbling 
over yours, which is more powerful IMHO:
{code} 
@ = ALPHA
# = ALPHA
{code} 


  was (Author: peathal):
Ups, my mistake ... this helped!

 What do you think of the file format, is it ok for describing these 
 categories? 

I think it is ok. I even had a more simpler patch before stumbling over yours: 
handleAsChar=@# which is now more powerful IMHO:

@ = ALPHA
# = ALPHA


  
 Allow customizing how WordDelimiterFilter tokenizes text.
 -

 Key: SOLR-2059
 URL: https://issues.apache.org/jira/browse/SOLR-2059
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Robert Muir
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2059.patch


 By default, WordDelimiterFilter assigns 'types' to each character (computed 
 from Unicode Properties).
 Based on these types and the options provided, it splits and concatenates 
 text.
 In some circumstances, you might need to tweak the behavior of how this works.
 It seems the filter already had this in mind, since you can pass in a custom 
 byte[] type table.
 But it's not exposed in the factory.
 I think you should be able to customize the defaults with a configuration 
 file:
 {noformat}
 # A customized type mapping for WordDelimiterFilterFactory
 # the allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM
 # 
 # the default for any character without a mapping is always computed from 
 # Unicode character properties
 # Map the $, %, '.', and ',' characters to DIGIT 
 # This might be useful for financial data.
 $ = DIGIT
 % = DIGIT
 . = DIGIT
 \u002C = DIGIT
 {noformat}




[no subject]

2010-08-25 Thread Peter Karich





[jira] Created: (SOLR-2005) NullPointerException for more like this request handler via SolrJ if the document does not exist

2010-07-19 Thread Peter Karich (JIRA)
NullPointerException for more like this request handler via SolrJ if the 
document does not exist


 Key: SOLR-2005
 URL: https://issues.apache.org/jira/browse/SOLR-2005
 Project: Solr
  Issue Type: Bug
  Components: clients - java, MoreLikeThis
Affects Versions: 1.4
 Environment: jdk1.6
Reporter: Peter Karich


If I query solr with the following (via SolrJ):

q=myUniqueKey%3AsomeValueWhichDoesNotExist&qt=%2Fmlt&mlt.fl=myMLTField&mlt.minwl=2&mlt.mindf=1&mlt.match.include=false&facet=true&facet.sort=count&facet.mincount=1&facet.limit=10&facet.field=differentFacetField&start=0&rows=10

I get:

org.apache.solr.client.solrj.SolrServerException: Error executing query
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
Caused by: java.lang.NullPointerException
at 
org.apache.solr.client.solrj.response.QueryResponse.extractFacetInfo(QueryResponse.java:180)
at 
org.apache.solr.client.solrj.response.QueryResponse.setResponse(QueryResponse.java:103)
at 
org.apache.solr.client.solrj.response.QueryResponse.<init>(QueryResponse.java:80)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)

The xml response of the url is empty and so the info variable at line

NamedList<Integer> fq = (NamedList<Integer>) info.get( "facet_queries" );

(QueryResponse) is null. Maybe all variables at QueryResponse.setResponse 
should be checked against null? Sth. like

val = res.getVal( i );
if(val == null) continue; 

?
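The suggested null guard can be sketched against a minimal stand-in for SolrJ's NamedList (the stand-in class is hypothetical; SolrJ's real NamedList differs): skip sections whose value is null instead of dereferencing them.

```java
import java.util.*;

// Hypothetical stand-in for SolrJ's NamedList: an ordered list of (name, value) pairs.
class SimpleNamedList {
    private final List<Map.Entry<String, Object>> entries = new ArrayList<>();
    void add(String name, Object val) { entries.add(new AbstractMap.SimpleEntry<>(name, val)); }
    int size() { return entries.size(); }
    String getName(int i) { return entries.get(i).getKey(); }
    Object getVal(int i) { return entries.get(i).getValue(); }
}

class NullSafeExtractor {
    // Mirrors the proposed guard: a null section is skipped rather than causing an NPE.
    static Map<String, Object> extract(SimpleNamedList res) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < res.size(); i++) {
            Object val = res.getVal(i);
            if (val == null) continue; // the suggested null check
            out.put(res.getName(i), val);
        }
        return out;
    }
}
```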




[jira] Updated: (SOLR-2005) NullPointerException for more like this request handler via SolrJ if the document does not exist

2010-07-19 Thread Peter Karich (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Karich updated SOLR-2005:
---

Priority: Minor  (was: Major)

 NullPointerException for more like this request handler via SolrJ if the 
 document does not exist
 

 Key: SOLR-2005
 URL: https://issues.apache.org/jira/browse/SOLR-2005
 Project: Solr
  Issue Type: Bug
  Components: clients - java, MoreLikeThis
Affects Versions: 1.4
 Environment: jdk1.6
Reporter: Peter Karich
Priority: Minor
   Original Estimate: 0.33h
  Remaining Estimate: 0.33h

 If I query solr with the following (via SolrJ):
 q=myUniqueKey%3AsomeValueWhichDoesNotExist&qt=%2Fmlt&mlt.fl=myMLTField&mlt.minwl=2&mlt.mindf=1&mlt.match.include=false&facet=true&facet.sort=count&facet.mincount=1&facet.limit=10&facet.field=differentFacetField&start=0&rows=10
 I get:
 org.apache.solr.client.solrj.SolrServerException: Error executing query
 at 
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
 at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.solr.client.solrj.response.QueryResponse.extractFacetInfo(QueryResponse.java:180)
 at 
 org.apache.solr.client.solrj.response.QueryResponse.setResponse(QueryResponse.java:103)
 at 
 org.apache.solr.client.solrj.response.QueryResponse.<init>(QueryResponse.java:80)
 at 
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
 The xml response of the url is empty and so the info variable at line
 NamedList<Integer> fq = (NamedList<Integer>) info.get( "facet_queries" );
 (QueryResponse) is null. Maybe all variables at QueryResponse.setResponse 
 should be checked against null? Sth. like
 val = res.getVal( i );
 if(val == null) continue; 
 ?




[jira] Commented: (SOLR-787) SolrJ POM refers to stax parser

2010-06-13 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878385#action_12878385
 ] 

Peter Karich commented on SOLR-787:
---

Is this really fixed correctly? Inspecting my deps with NetBeans' maven dep 
viewer, I don't understand why Solr uses woodstox and SolrJ uses a different 
artifact (but the same jar), org.codehaus.woodstox.

And according to 

http://jarvana.com/jarvana/inspect-pom/org/apache/solr/solr-core/1.4.0/solr-core-1.4.0.pom

http://jarvana.com/jarvana/inspect-pom/org/apache/solr/solr-solrj/1.4.0/solr-solrj-1.4.0.pom

NetBeans is correct.

The problem with this is that you end up with two identical jars on the 
classpath, and that the solrj dependency still forces you to use stax-api.

 SolrJ POM refers to stax parser
 ---

 Key: SOLR-787
 URL: https://issues.apache.org/jira/browse/SOLR-787
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.3
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-787.patch


 Solr core moved to using woodstox instead of stax but SolrJ POM still has a 
 dependency to stax. We should replace the dependency to stax with woodstox 
 jar in SolrJ's POM.
 This is not a huge problem as we are not distributing stax anymore but is 
 needed for consistency.




[jira] Created: (SOLR-1950) SolrJ POM still refers to stax parser

2010-06-13 Thread Peter Karich (JIRA)
SolrJ POM still refers to stax parser
-

 Key: SOLR-1950
 URL: https://issues.apache.org/jira/browse/SOLR-1950
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.4
Reporter: Peter Karich
Priority: Minor


See the issue at https://issues.apache.org/jira/browse/SOLR-787 which seems to 
be incorrectly fixed. (I cannot reopen that issue, so I create this one here)

Using the following deps:
<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>1.4.0</version>
</dependency>
<dependency>
    <artifactId>solr-core</artifactId>
    <groupId>org.apache.solr</groupId>
    <version>1.4.0</version>
</dependency>

will lead to duplicate jars. (Solr uses woodstox and SolrJ uses a different 
artifact (but the same jar), org.codehaus.woodstox.)

But maybe the artifacts are only incorrectly deployed? Where can I find the 
original pom files?
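A possible workaround until the POMs are deployed consistently is to exclude the duplicate from one side. This is an untested sketch; the woodstox artifactId (wstx-asl) is an assumption, so check your actual dependency tree (e.g. with mvn dependency:tree) first:

```xml
<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>1.4.0</version>
    <exclusions>
        <!-- assumed artifactId; verify against the dependency tree -->
        <exclusion>
            <groupId>org.codehaus.woodstox</groupId>
            <artifactId>wstx-asl</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```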




[jira] Commented: (SOLR-1864) Master/Slave replication causes tomcat to be unresponsive on slave till replication is being done.

2010-06-01 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874055#action_12874055
 ] 

Peter Karich commented on SOLR-1864:


This might be a duplicate of: https://issues.apache.org/jira/browse/SOLR-1775

The reason might be (as Paul Noble noted) that the garbage collector is very 
busy because of autowarming after the index switch is done.

 Master/Slave replication causes tomcat to be unresponsive on slave till 
 replication is being done.
 --

 Key: SOLR-1864
 URL: https://issues.apache.org/jira/browse/SOLR-1864
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.5
 Environment: Centos 5.2, Tomcat5, java version 1.6.0
 OpenJDK  Runtime Environment (build 1.6.0-b09)
 OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)
Reporter: Marcin

 Hi guys,
 I have found a strange behaviour on tomcat5, centos 5.2.
 While replication is being done ( million rows) tomcat5 seems to be 
 unresponsive till it's finished.
 Please help
 cheers,
 /Marcin




[jira] Commented: (SOLR-236) Field collapsing

2010-03-05 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841752#action_12841752
 ] 

Peter Karich commented on SOLR-236:
---

 Shouldn't the float array in DocSetScoreCollector be changed to a Map?

hmmh, maybe I expressed myself a bit weirdly: I already changed all of this to a Map 
(a SortedMap) ... 
I started this change in DocSetScoreCollector and changed all the other 
occurrences of the float array (otherwise I would have to copy the entire map).

  I think the compare method should NOT be called if no docs are in the 
  scores array ... ?

 I would expect that every docId has a score.

Yes, me too. So I expect there is a bug somewhere. But as I said, this breaks 
only one test (collapse with faceting before). It could even be a bug in the 
test case, though.

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
 SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 Used in order to collapse a group of results with similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site is collapsed into one or two 
 entries in the result set, typically with an associated more documents from 
 this site link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)




[jira] Updated: (SOLR-236) Field collapsing

2010-03-05 Thread Peter Karich (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Karich updated SOLR-236:
--

Attachment: NonAdjacentDocumentCollapserTest.java
NonAdjacentDocumentCollapser.java
DocSetScoreCollector.java

It seems to me that the provided changes are necessary to make the OutOfMemory 
exception go away. Please apply the files with caution, because I made the changes 
from an old patch (from Nov 2009).

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, 
 field-collapse-3.patch, field-collapse-4-with-solrj.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
 SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 Used in order to collapse a group of results with similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site is collapsed into one or two 
 entries in the result set, typically with an associated more documents from 
 this site link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)




[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-03-05 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841756#action_12841756
 ] 

Peter Karich edited comment on SOLR-236 at 3/5/10 8:53 AM:
---

It seems to me that the provided changes are necessary to make the OutOfMemory 
exception go away (see the 3 appended files). Please apply the files with caution, 
because I made the changes from an old patch (from Nov 2009).

  was (Author: peathal):
It seems to me that the provides changes are necessary to make the 
OutOfMemory exception gone. Please apply the files with caution, because I made 
the changes from an old patch (from Nov 2009)
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, 
 field-collapse-3.patch, field-collapse-4-with-solrj.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
 SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 Used in order to collapse a group of results with similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site is collapsed into one or two 
 entries in the result set, typically with an associated more documents from 
 this site link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2010-03-04 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841147#action_12841147
 ] 

Peter Karich commented on SOLR-236:
---

regarding the OutOfMemory problem: we are now testing the suggested change in 
production.

I replaced the float array with a TreeMap<Integer, Float>. The change was 
nearly trivial (I cannot provide a patch easily, because we are using an older 
patch, although I could post the 3 changed files).

The reason I used a TreeMap instead of a HashMap was that in the advance method 
of the class NonAdjacentDocumentCollapser.PredefinedScorer I needed the tailMap 
method:

{noformat} 
public int advance(int target) throws IOException {
  // now we need a treemap method:
  iter = scores.tailMap(target).entrySet().iterator();
  if (iter.hasNext())
    return target;
  else
    return NO_MORE_DOCS;
}
{noformat} 
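To illustrate the tailMap trick outside of Solr, here is a minimal, self-contained sketch. The class and field names (SparseScores, NO_MORE_DOCS as a local constant) are hypothetical, not from the patch: a TreeMap stores only docs that actually have a score, and tailMap(target) positions an iterator at the first entry whose doc id is >= target.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the TreeMap-backed score store described above.
// Only docs with a score are stored, so memory scales with the number of
// scored docs instead of with maxDocs.
class SparseScores {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final TreeMap<Integer, Float> scores = new TreeMap<>();
    private Iterator<Map.Entry<Integer, Float>> iter;

    void put(int doc, float score) {
        scores.put(doc, score);
    }

    // Mirrors the advance method quoted above: tailMap(target) yields the
    // entries with doc id >= target in ascending order.
    int advance(int target) {
        iter = scores.tailMap(target).entrySet().iterator();
        if (iter.hasNext())
            return target;
        else
            return NO_MORE_DOCS;
    }
}
```

Note that, as in the quoted snippet, advance returns target itself whenever any entry >= target exists; Lucene's DocIdSetIterator contract would instead return the first stored doc id >= target.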

Then, I think, I discovered a bug / inconsistent behaviour: if I run the test 
FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore, then 
the scores array is created via new float[maxDocs] in the old version, 
but the array is never filled with any values, so Float value1 = 
values.get(doc1); returns null in the method 
NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of 
the TreeMap is 0!); I work around this via 

{noformat} 

if (value1 == null)
  value1 = 0f;
if (value2 == null)
  value2 = 0f;

{noformat} 

although, should the compare method be called at all if no docs are in the 
scores array ... ?
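As a side note, on newer Java (8+, which postdates this comment) the null guard above can be written with Map.getOrDefault. A small sketch; the helper class and method names are made up for illustration:

```java
import java.util.Map;

// Hypothetical helper illustrating the null-to-0f workaround above with
// Map.getOrDefault (Java 8+); "values" plays the role of the score TreeMap.
class ScoreLookup {
    static float scoreOf(Map<Integer, Float> values, int doc) {
        // a doc missing from the map scores 0f, as in the workaround
        return values.getOrDefault(doc, 0f);
    }
}
```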

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
 SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also "Duplicate detection":
 http://www.fastsearch.com/glossary.aspx?m=48amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)




[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-03-04 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841147#action_12841147
 ] 

Peter Karich edited comment on SOLR-236 at 3/4/10 9:48 AM:
---

regarding the OutOfMemory problem: we are now testing the suggested change in 
production.

I replaced the float array with a TreeMap<Integer, Float>. The change was 
nearly trivial (I cannot provide a patch easily, because we are using an older 
patch, although I could post the 3 changed files).

The reason I used a TreeMap instead of a HashMap was that in the advance method 
of the class NonAdjacentDocumentCollapser.PredefinedScorer I needed the tailMap 
method:

{noformat}
public int advance(int target) throws IOException {
  // now we need a treemap method:
  iter = scores.tailMap(target).entrySet().iterator();
  if (iter.hasNext())
    return target;
  else
    return NO_MORE_DOCS;
}
{noformat} 

Then, I think, I discovered a bug / inconsistent behaviour: if I run the test 
FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore, then 
the scores array is created via new float[maxDocs] in the old version, 
but the array is never filled with any values, so Float value1 = 
values.get(doc1); returns null in the method 
NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of 
the TreeMap is 0!); I work around this via 

{noformat} 
if (value1 == null)
  value1 = 0f;
if (value2 == null)
  value2 = 0f;
{noformat} 

I think the compare method should NOT be called if there are no docs in the 
scores array.

  was (Author: peathal):
regarding the OutOfMemory problem: we are now testing the suggested change 
in production.

I replaced the float array with a TreeMap<Integer, Float>. The change was 
nearly trivial (I cannot provide a patch easily, because we are using an older 
patch, although I could post the 3 changed files).

The reason I used a TreeMap instead of a HashMap was that in the advance method 
of the class NonAdjacentDocumentCollapser.PredefinedScorer I needed the tailMap 
method:

{noformat}
public int advance(int target) throws IOException {
  // now we need a treemap method:
  iter = scores.tailMap(target).entrySet().iterator();
  if (iter.hasNext())
    return target;
  else
    return NO_MORE_DOCS;
}
{noformat} 

Then, I think, I discovered a bug / inconsistent behaviour: if I run the test 
FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore, then 
the scores array is created via new float[maxDocs] in the old version, 
but the array is never filled with any values, so Float value1 = 
values.get(doc1); returns null in the method 
NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of 
the TreeMap is 0!); I work around this via 

{noformat} 
if (value1 == null)
  value1 = 0f;
if (value2 == null)
  value2 = 0f;
{noformat} 

although, should the compare method be called at all if no docs are in the 
scores array ... ?
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
 SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-03-04 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841147#action_12841147
 ] 

Peter Karich edited comment on SOLR-236 at 3/4/10 9:46 AM:
---

regarding the OutOfMemory problem: we are now testing the suggested change in 
production.

I replaced the float array with a TreeMap<Integer, Float>. The change was 
nearly trivial (I cannot provide a patch easily, because we are using an older 
patch, although I could post the 3 changed files).

The reason I used a TreeMap instead of a HashMap was that in the advance method 
of the class NonAdjacentDocumentCollapser.PredefinedScorer I needed the tailMap 
method:

{noformat}
public int advance(int target) throws IOException {
  // now we need a treemap method:
  iter = scores.tailMap(target).entrySet().iterator();
  if (iter.hasNext())
    return target;
  else
    return NO_MORE_DOCS;
}
{noformat} 

Then, I think, I discovered a bug / inconsistent behaviour: if I run the test 
FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore, then 
the scores array is created via new float[maxDocs] in the old version, 
but the array is never filled with any values, so Float value1 = 
values.get(doc1); returns null in the method 
NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of 
the TreeMap is 0!); I work around this via 

{noformat} 
if (value1 == null)
  value1 = 0f;
if (value2 == null)
  value2 = 0f;
{noformat} 

although, should the compare method be called at all if no docs are in the 
scores array ... ?

  was (Author: peathal):
regarding the OutOfMemory problem: we are now testing the suggested change 
in production.

I replaced the float array with a TreeMap<Integer, Float>. The change was 
nearly trivial (I cannot provide a patch easily, because we are using an older 
patch, although I could post the 3 changed files).

The reason I used a TreeMap instead of a HashMap was that in the advance method 
of the class NonAdjacentDocumentCollapser.PredefinedScorer I needed the tailMap 
method:

{noformat} 
public int advance(int target) throws IOException {
  // now we need a treemap method:
  iter = scores.tailMap(target).entrySet().iterator();
  if (iter.hasNext())
    return target;
  else
    return NO_MORE_DOCS;
}
{noformat} 

Then, I think, I discovered a bug / inconsistent behaviour: if I run the test 
FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore, then 
the scores array is created via new float[maxDocs] in the old version, 
but the array is never filled with any values, so Float value1 = 
values.get(doc1); returns null in the method 
NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of 
the TreeMap is 0!); I work around this via 

{noformat} 

if (value1 == null)
  value1 = 0f;
if (value2 == null)
  value2 = 0f;

{noformat} 

although, should the compare method be called at all if no docs are in the 
scores array ... ?
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
 SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry

[jira] Commented: (SOLR-1167) Support module xml config files using XInclude

2010-03-04 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841178#action_12841178
 ] 

Peter Karich commented on SOLR-1167:


@Shalin Shekhar Mangar: how can the proposed attribute feature be used for a 
master+slave configuration? Do you have a code snippet?

 Support module xml config files using XInclude
 --

 Key: SOLR-1167
 URL: https://issues.apache.org/jira/browse/SOLR-1167
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Bryan Talbot
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1167.patch, SOLR-1167.patch, SOLR-1167.patch, 
 SOLR-1167.patch, SOLR-1167.patch


 Current configuration files (schema and solrconfig) are monolithic, which can 
 make maintenance and reuse more difficult than they need to be.  The XML 
 standards include a feature to include content from external files.  This is 
 described at http://www.w3.org/TR/xinclude/
 This feature adds support for XInclude in XML configuration 
 files.
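 For illustration, a hypothetical solrconfig.xml fragment using XInclude might 
 look like the sketch below. The file names are made up, and an XInclude-aware 
 XML parser is assumed:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- pull a shared request-handler definition out of a separate file -->
  <xi:include href="common-requesthandlers.xml"/>

  <!-- include site-specific overrides, with a fallback if the file is missing -->
  <xi:include href="site-overrides.xml">
    <xi:fallback/>
  </xi:include>
</config>
```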




[jira] Commented: (SOLR-236) Field collapsing

2010-02-18 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835230#action_12835230
 ] 

Peter Karich commented on SOLR-236:
---

We are facing OutOfMemory problems too. We are using 
https://issues.apache.org/jira/secure/attachment/12425775/field-collapse-5.patch

 Are you using any other features besides plain collapsing? The field collapse 
 cache gets large very quickly,
 I suggest you turn it off (if you are using it). Also you can try to make 
 your filterCache smaller.

How can I turn off the collapse cache or make the filterCache smaller?
Are there other workarounds? E.g. via using a special version of the patch ?

I read that it could help to specify collapse.maxdocs but this didn't help in 
our case ... could collapse.type=adjacent help here?  
(https://issues.apache.org/jira/browse/SOLR-236?focusedCommentId=12495376page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12495376)

What do you think?

BTW: We really like this patch and would like to use it !! :-)

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
 SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also "Duplicate detection":
 http://www.fastsearch.com/glossary.aspx?m=48amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)




[jira] Commented: (SOLR-236) Field collapsing

2010-02-18 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835258#action_12835258
 ] 

Peter Karich commented on SOLR-236:
---

Trying the latest patch from 1st Feb 2010: it compiles against solr-2010-02-13 
from the nightly build, but does not work. If I query 

http://searchdev05:15100/cs-bidcs/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat} 

HTTP Status 500 - null java.lang.NullPointerException at 
org.apache.solr.schema.FieldType.toExternal(FieldType.java:329) at 
org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348) at 
org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
 at 
org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84) at 
org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
 at 
org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
 at 
org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
 at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at
...
 {noformat} 


I only need the OutOfMemory problem solved ... :-(

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
 SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also "Duplicate detection":
 http://www.fastsearch.com/glossary.aspx?m=48amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)




[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-02-18 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835258#action_12835258
 ] 

Peter Karich edited comment on SOLR-236 at 2/18/10 4:06 PM:


Trying the latest patch from 1st Feb 2010: it compiles against solr-2010-02-13 
from the nightly build, but does not work. If I query 

http://server/cs-bidcs/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat} 

HTTP Status 500 - null java.lang.NullPointerException at 
org.apache.solr.schema.FieldType.toExternal(FieldType.java:329) at 
org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348) at 
org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
 at 
org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84) at 
org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
 at 
org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
 at 
org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
 at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at
...
 {noformat} 


I only need the OutOfMemory problem solved ... :-(

  was (Author: peathal):
Trying the latest patch from 1st Feb 2010: it compiles against solr-2010-02-13 
from the nightly build, but does not work. If I query 

http://searchdev05:15100/cs-bidcs/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat} 

HTTP Status 500 - null java.lang.NullPointerException at 
org.apache.solr.schema.FieldType.toExternal(FieldType.java:329) at 
org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348) at 
org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
 at 
org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84) at 
org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
 at 
org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
 at 
org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
 at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at
...
 {noformat} 


I only need the OutOfMemory problem solved ... :-(
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
 SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also "Duplicate detection":
 http://www.fastsearch.com/glossary.aspx?m=48amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-02-18 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835258#action_12835258
 ] 

Peter Karich edited comment on SOLR-236 at 2/18/10 4:06 PM:


Trying the latest patch from 1st Feb 2010: it compiles against solr-2010-02-13 
from the nightly build, but does not work. If I query 

http://server/solr-app/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat} 

HTTP Status 500 - null java.lang.NullPointerException at 
org.apache.solr.schema.FieldType.toExternal(FieldType.java:329) at 
org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348) at 
org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
 at 
org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84) at 
org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
 at 
org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
 at 
org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
 at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at
...
 {noformat} 


I only need the OutOfMemory problem solved ... :-(

  was (Author: peathal):
Trying the latest patch from 1st Feb 2010: it compiles against solr-2010-02-13 
from the nightly build, but does not work. If I query 

http://server/cs-bidcs/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat} 

HTTP Status 500 - null java.lang.NullPointerException at 
org.apache.solr.schema.FieldType.toExternal(FieldType.java:329) at 
org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348) at 
org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
 at 
org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84) at 
org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
 at 
org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
 at 
org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
 at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at
...
 {noformat} 


I only need the OutOfMemory problem solved ... :-(
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
 SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated more documents from 
 this site link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-02-18 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835258#action_12835258
 ] 

Peter Karich edited comment on SOLR-236 at 2/18/10 4:07 PM:


Trying the latest patch from 1st Feb 2010. It compiles against solr-2010-02-13 
from the nightly build dir, but does not work. If I query 

http://server/solr-app/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat} 

HTTP Status 500 - null java.lang.NullPointerException
 at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
 at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
 at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
 at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
 at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
 at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
 at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at
...
 {noformat} 


I only need the OutOfMemory problem solved ... :-(

  was (Author: peathal):
Trying the latest patch from 1st Feb 2010 compiles against solr-2010-02-13 
from nightly build but does not work. If I query 

http://server/solr-app/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat} 

HTTP Status 500 - null java.lang.NullPointerException
 at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
 at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
 at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
 at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
 at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
 at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
 at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at
...
 {noformat} 


I only need the OutOfMemory problem solved ... :-(
  