[jira] Updated: (SOLR-486) Support binary formats for QueryresponseWriter

2008-10-09 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-486:


Attachment: optimizemap.patch

Just as NamedList keys can be externalized, Map keys can also be 
externalized, and the change is backward compatible.

Maps are not used very commonly in Solr, but SOLR-561 uses maps for 
master-slave communication.
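
To make the idea of externalized keys concrete, here is a minimal sketch. It is not taken from the attached patch and does not reproduce the real wire format; the class name ExternKeys and the byte layout (marker 0 plus length plus UTF-8 bytes for a new key, a 1-based index for a repeated key) are assumptions chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of key externalization; NOT the actual format.
class ExternKeys {
    // Encodes keys in order. A key seen for the first time is written in full
    // (marker 0, length, UTF-8 bytes); a repeated key is written only as the
    // 1-based index it was assigned on first sight.
    static byte[] encode(List<String> keys) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Map<String, Integer> seen = new HashMap<>();
        for (String key : keys) {
            Integer index = seen.get(key);
            if (index != null) {
                writeInt(buf, index);               // back-reference to an earlier key
            } else {
                byte[] utf8 = key.getBytes(StandardCharsets.UTF_8);
                writeInt(buf, 0);                   // 0 means "a new key follows"
                writeInt(buf, utf8.length);
                buf.write(utf8, 0, utf8.length);
                seen.put(key, seen.size() + 1);
            }
        }
        return buf.toByteArray();
    }

    // Writes a 32-bit big-endian integer.
    private static void writeInt(ByteArrayOutputStream buf, int v) {
        buf.write(v >>> 24);
        buf.write(v >>> 16);
        buf.write(v >>> 8);
        buf.write(v);
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("id", "name", "id", "name");
        System.out.println(encode(keys).length + " bytes");
    }
}
```

The saving grows with the number of maps that share the same keys, since each repeated key costs a fixed four bytes in this sketch regardless of its length.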



 Support binary formats for QueryresponseWriter
 --

 Key: SOLR-486
 URL: https://issues.apache.org/jira/browse/SOLR-486
 Project: Solr
  Issue Type: Improvement
  Components: clients - java, search
Reporter: Noble Paul
Assignee: Yonik Seeley
 Fix For: 1.3

 Attachments: optimizemap.patch, SOLR-486-iterator.patch, 
 SOLR-486-iterator.patch, SOLR-486.patch, solr-486.patch, SOLR-486.patch, 
 SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, 
 SOLR-486.patch, SOLR-486.patch, SOLR-486.patch


 QueryResponseWriter only allows text data to be written, 
 so it is not possible to implement a binary protocol. Create another 
 interface which has a method 
 write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-799) Add support for hash based exact/near duplicate document handling

2008-10-09 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638217#action_12638217
 ] 

Andrzej Bialecki  commented on SOLR-799:


+1 on the incremental sig calculation.

Re: different types of signatures. Our experience in Nutch is that signature 
type is rarely changed, and we assume that this setting is selected once per 
lifetime of an index, i.e. there are never any mixed cases of documents with 
incompatible signatures. If we want to be sure that they are comparable, we 
could prepend a byte or two of unique signature type id - this way, even if a 
signature value matches but was calculated using other impl. the documents 
won't be considered duplicates, which is the way it should work, because 
different signature algorithms are incomparable.

Re: signature as byte[] - I think it's better if we return byte[] from 
Signature, and until we support binary fields we just turn this into a hex 
string.

Re: field ordering in DeduplicateUpdateProcessorFactory: I think that both 
sigFields (if defined) and any other document fields (if sigFields is 
undefined) should be first ordered in a predictable way (lexicographic?). 
Current patch uses a HashSet which doesn't guarantee any particular ordering - 
in fact the ordering may be different if you run the same code under different 
JVMs, which may introduce a random factor to the sig. calculation.
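
The three points above (a type id prefix, a byte[] signature rendered as hex, and a predictable field order) can be sketched together as follows. This is only an illustration, not the code in the attached patch: the class name SignatureSketch, the type id value 1, and the choice of MD5 with a zero byte as separator are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names and the type id value are illustrative only.
class SignatureSketch {
    static final byte MD5_TYPE_ID = 1; // assumed id for "MD5 over sorted fields"

    // Returns one type-id byte followed by the MD5 digest of the fields,
    // visited in lexicographic order of field name.
    static byte[] signature(Map<String, String> fields) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        // TreeMap iterates keys in sorted order, independent of insertion order.
        for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
            md5.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            md5.update((byte) 0); // separator between name and value
            md5.update(e.getValue().getBytes(StandardCharsets.UTF_8));
            md5.update((byte) 0); // separator between fields
        }
        byte[] digest = md5.digest();
        byte[] sig = new byte[digest.length + 1];
        sig[0] = MD5_TYPE_ID;
        System.arraycopy(digest, 0, sig, 1, digest.length);
        return sig;
    }

    // Renders the signature as a lowercase hex string for a text field.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Solr");
        doc.put("body", "search");
        System.out.println(toHex(signature(doc)));
    }
}
```

Because the fields are copied into a sorted map before hashing, two documents with the same fields inserted in a different order produce the same hex string, which is the property a HashSet iteration order cannot guarantee.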

 Add support for hash based exact/near duplicate document handling
 -

 Key: SOLR-799
 URL: https://issues.apache.org/jira/browse/SOLR-799
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Mark Miller
Priority: Minor
 Attachments: SOLR-799.patch


 Hash based duplicate document detection is efficient and allows for blocking 
 as well as field collapsing. Let's put it into Solr. 
 http://wiki.apache.org/solr/Deduplication




Re: [jira] Updated: (SOLR-84) New Solr logo?

2008-10-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
Is it a strong requirement to have Apache in the logo? I didn't know that.


On Wed, Oct 8, 2008 at 10:11 PM, Lukáš Vlček [EMAIL PROTECTED] wrote:
 Hi,
 I am glad you like the draft#1 (and actually I think the second design is
 not totally lost, just wipe out the Apache letters and you get it). But the
 problem is that the draft#1 (as it is today) would not make it into the
 contest due to violation of the strongest requirement:

 The logo must incorporate the full project name: Apache Solr

 That is the assignment (http://wiki.apache.org/solr/LogoContest).
 You can try to push the contest organizers, not me...

 If you were to ask me if I like the fact that the Apache word has to be
 incorporated then I would tell you that I am not happy about it (but this
 should not mean that I think that one cannot create a perfect design with
 the Apache word). The problem I see with this is that there are no official
 rules for how the Apache word can be used in designs (which type of font, which
 color...). The current mix of fonts in my second proposal is not ideal but I am
 scared to use any exotic font on Apache because people are used to seeing
 something like Arial Bold, and at the end of the day too exotic a design
 of Apache could be seen as a disadvantage.

 Regards,
 Lukas

 On Wed, Oct 8, 2008 at 6:17 PM, Noble Paul നോബിള്‍ नोब्ळ् 
 [EMAIL PROTECTED] wrote:

 Adding Apache just adds to the number of letters, so the logo was a bit
 big. draft1 is cool
 --Noble

 On Wed, Oct 8, 2008 at 2:01 PM, Shalin Shekhar Mangar
 [EMAIL PROTECTED] wrote:
  I think what Noble meant to say is that the Apache lying below Solr does
  not look very good. Perhaps we can shift Apache either to the left of or
  above Solr?
 
  On Wed, Oct 8, 2008 at 10:36 AM, Lukáš Vlček [EMAIL PROTECTED]
 wrote:
 
  It seems so, according to the official requirements:
  http://wiki.apache.org/solr/LogoContest
 
  On Wed, Oct 8, 2008 at 6:44 AM, Noble Paul നോബിള്‍ नोब्ळ् 
  [EMAIL PROTECTED] wrote:
 
   do we really need the APACHE under the Solr logo? The other one looks
   clean.
  
   On Wed, Oct 8, 2008 at 4:22 AM, Lukas Vlcek (JIRA) [EMAIL PROTECTED]
   wrote:
   
[
  
 
 https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
  ]
   
Lukas Vlcek updated SOLR-84:

   
   Attachment: solr_logo_it_is_burning.png
   
It is burning! ... Apache Solr Logo contest submission (based on my
   previous draft http://picasaweb.google.cz/lukas.vlcek/Solr)
   
New Solr logo?
--
   
Key: SOLR-84
URL: https://issues.apache.org/jira/browse/SOLR-84
Project: Solr
 Issue Type: Improvement
   Reporter: Bertrand Delacretaz
   Priority: Minor
Attachments: logo-grid.jpg, logo-solr-d.jpg,
 logo-solr-e.jpg,
   logo-solr-source-files-take2.zip, solr-84-source-files.zip,
 solr-f.jpg,
   solr-logo-20061214.jpg, solr-logo-20061218.JPG,
 solr-logo-20070124.JPG,
   solr-nick.gif, solr.jpg, solr.s1.jpg, solr.svg,
  solr_logo_it_is_burning.png,
   sslogo-solr-flare.jpg, sslogo-solr.jpg, sslogo-solr2-flare.jpg,
   sslogo-solr2.jpg, sslogo-solr3.jpg
   
   
Following up on SOLR-76, our trainee Nicolas Barbay (nicolas (put at
   here) sarraux-dessous.ch) has reworked his logo proposal to be more
   solar.
This can either be the start of a logo contest, or if people like it we
   could adopt it. The gradients can make it a bit hard to integrate, not
   sure if this is really a problem.
WDYT?
   
   
   
  
  
  
   --
   --Noble Paul
  
 
 
 
  --
  http://blog.lukas-vlcek.com/
 
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 



 --
 --Noble Paul




 --
 http://blog.lukas-vlcek.com/




-- 
--Noble Paul


[jira] Commented: (SOLR-806) improve m2-deploy tasks authentication support

2008-10-09 Thread Stefan Oestreicher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638248#action_12638248
 ] 

Stefan Oestreicher commented on SOLR-806:
-

Well ... I don't :) When I opened this issue I wasn't aware that snapshots are 
available. I guess this issue can be closed then. Sorry.

 improve m2-deploy tasks authentication support
 --

 Key: SOLR-806
 URL: https://issues.apache.org/jira/browse/SOLR-806
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: Stefan Oestreicher
Priority: Trivial

 The m2-deploy task uses the authentication element with the username and 
 privateKey attribute to set the user credentials. Unfortunately the 
 privateKey attribute is only applicable for ssh connections. 
 Quote from http://maven.apache.org/ant-tasks.html:
 bq. It accepts the attributes username, password, and for SSH based 
 repositories privateKey and passphrase.
 Therefore authentication fails for non-ssh connections. I worked around that 
 by using the password attribute instead of privateKey. However I'd prefer 
 not having to modify the build file. 




[jira] Commented: (SOLR-806) improve m2-deploy tasks authentication support

2008-10-09 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638244#action_12638244
 ] 

Shalin Shekhar Mangar commented on SOLR-806:


Stefan -- Just curious to know, now that Solr's artifacts are available in the 
official Maven repositories and mirrors, why do you need to use m2-deploy tasks 
for Solr?




[jira] Resolved: (SOLR-806) improve m2-deploy tasks authentication support

2008-10-09 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-806.


Resolution: Won't Fix

Closing the issue then :)




[jira] Created: (SOLR-806) improve m2-deploy tasks authentication support

2008-10-09 Thread Stefan Oestreicher (JIRA)
improve m2-deploy tasks authentication support
--

 Key: SOLR-806
 URL: https://issues.apache.org/jira/browse/SOLR-806
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: Stefan Oestreicher
Priority: Trivial


The m2-deploy task uses the authentication element with the username and 
privateKey attribute to set the user credentials. Unfortunately the privateKey 
attribute is only applicable for ssh connections. 

Quote from http://maven.apache.org/ant-tasks.html:
bq. It accepts the attributes username, password, and for SSH based 
repositories privateKey and passphrase.

Therefore authentication fails for non-ssh connections. I worked around that by 
using the password attribute instead of privateKey. However I'd prefer not 
having to modify the build file. 




[jira] Commented: (SOLR-236) Field collapsing

2008-10-09 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638355#action_12638355
 ] 

Oleg Gnatovskiy commented on SOLR-236:
--

What's a hard drive sort?

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Otis Gospodnetic
 Fix For: 1.4

 Attachments: field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling corrections are welcome ;-)
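
 As a rough illustration of how the three parameters interact, the sketch below collapses a ranked list of field values. It is not the patch's implementation: the class CollapseSketch and the reading of "normal" (count every occurrence of a value) versus "adjacent" (count only consecutive occurrences) are assumptions based on the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the collapsing rule; not the code in the patch.
class CollapseSketch {
    // fieldValues: the collapse.field value of each result, in ranked order.
    // max:         assumed meaning of collapse.max, results kept per group.
    // adjacent:    true for collapse.type=adjacent, false for normal.
    static List<String> collapse(List<String> fieldValues, int max, boolean adjacent) {
        List<String> kept = new ArrayList<>();
        Map<String, Integer> counts = new HashMap<>();
        String previous = null;
        int run = 0;
        for (String value : fieldValues) {
            int seenSoFar;
            if (adjacent) {
                // Count only the current run of consecutive equal values.
                run = value.equals(previous) ? run + 1 : 1;
                previous = value;
                seenSoFar = run;
            } else {
                // Count every occurrence of the value anywhere in the list.
                seenSoFar = counts.merge(value, 1, Integer::sum);
            }
            if (seenSoFar <= max) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sites = Arrays.asList("a", "a", "a", "b", "a", "a");
        System.out.println(collapse(sites, 1, true));
        System.out.println(collapse(sites, 1, false));
    }
}
```

 A real implementation would work on document ids and would also need to report how many results were collapsed for each group; the sketch keeps only the grouping rule.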




[jira] Commented: (SOLR-236) Field collapsing

2008-10-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638359#action_12638359
 ] 

Mark Miller commented on SOLR-236:
--

bq. What's a hard drive sort? 

Sorry - was not very clear.

Just like sorting, finding dupes can be done in memory or using external 
storage (hard drive). I am only just looking into this stuff myself, but it 
seems in the best case you would want to do it in memory with a hash system, 
which can scale linearly. If you have too many items to look for dupes 
in, you have to use external storage - one good method is two external sorts 
(we get one from the search), but there are other options too, I think.
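
The two approaches can be compared in a small sketch. It is only an illustration with assumed names (DupeFinder, dupesByHash, dupesBySort), and the sort variant sorts in memory as a stand-in for an external sort; the point is that after sorting, equal values sit next to each other and can be found in one sequential pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; an in-memory sort stands in for an external sort.
class DupeFinder {
    // Hash approach: one pass, expected linear time, but the set of seen
    // values must fit in memory.
    static Set<String> dupesByHash(List<String> values) {
        Set<String> seen = new HashSet<>();
        Set<String> dupes = new TreeSet<>();
        for (String value : values) {
            if (!seen.add(value)) {   // add returns false if already present
                dupes.add(value);
            }
        }
        return dupes;
    }

    // Sort approach: equal values become adjacent, so a sequential scan that
    // compares each value with its predecessor finds every duplicate.
    static Set<String> dupesBySort(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        Set<String> dupes = new TreeSet<>();
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i).equals(sorted.get(i - 1))) {
                dupes.add(sorted.get(i));
            }
        }
        return dupes;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("b", "a", "c", "a", "b", "a");
        System.out.println(dupesByHash(values));
        System.out.println(dupesBySort(values));
    }
}
```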




[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2008-10-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638359#action_12638359
 ] 

[EMAIL PROTECTED] edited comment on SOLR-236 at 10/9/08 12:53 PM:


bq. What's a hard drive sort? 

Sorry - was not very clear.

Just like sorting, finding dupes can be done in memory or using external 
storage (hard drive). I am only just looking into this stuff myself, but it 
seems in the best case you would want to do it in memory with a hash system, 
which can scale linearly. If you have too many items to look for dupes 
in, you have to use external storage - one good method is two sorts (we get one 
from the search), but there are other options too, I think. In this case the 
sorts can be done in memory, but I think the hashtable method of 
identifying dupes is much less memory efficient (too many unique terms).

  was (Author: [EMAIL PROTECTED]):
bq. What's a hard drive sort? 

Sorry - was not very clear.

Just like sorting, finding dupes can be done in memory or using external 
storage (harddrive). I am only just looking into this stuff myself, but it 
seems in the best case you would want to do it in memory with a hash system 
which can be linear scalability. If you have too many items to look for dupes 
in, you have to use external storage - one good method is two external sorts 
(we get one from the search), but there are other options too I think.
  



[jira] Updated: (SOLR-84) Logo Contests

2008-10-09 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-84:
-

Description: 
This issue was originally a scratch pad for various ideas for new Logos.  It is 
now being used as a repository for submissions for the Solr Logo Contest...

   http://wiki.apache.org/solr/LogoContest

Note that many of the images currently attached are not eligible for the 
contest since they do not meet the official guidelines for new Apache project 
logos (in particular that the full project name Apache Solr must be included 
in the Logo).  Only eligible attachments will be included in the official 
voting.


  was:
Following up on SOLR-76, our trainee Nicolas Barbay (nicolas (put at here) 
sarraux-dessous.ch) has reworked his logo proposal to be more solar.

This can either be the start of a logo contest, or if people like it we could 
adopt it. The gradients can make it a bit hard to integrate, not sure if this 
is really a problem.

WDYT?

Summary: Logo Contests  (was: New Solr logo?)

Updating issue summary and description to explicitly refer to Logo Contest.

For posterity: bdelacretaz credited Nicolas Barbay (nicolas (put at here) 
sarraux-dessous.ch) as the creator of the logos he attached when initially 
creating this issue.


 Logo Contests
 -

 Key: SOLR-84
 URL: https://issues.apache.org/jira/browse/SOLR-84
 Project: Solr
  Issue Type: Improvement
Reporter: Bertrand Delacretaz
Priority: Minor
 Attachments: logo-grid.jpg, logo-solr-d.jpg, logo-solr-e.jpg, 
 logo-solr-source-files-take2.zip, solr-84-source-files.zip, solr-f.jpg, 
 solr-logo-20061214.jpg, solr-logo-20061218.JPG, solr-logo-20070124.JPG, 
 solr-nick.gif, solr.jpg, solr.s1.jpg, solr.svg, solr_logo_it_is_burning.png, 
 sslogo-solr-flare.jpg, sslogo-solr.jpg, sslogo-solr2-flare.jpg, 
 sslogo-solr2.jpg, sslogo-solr3.jpg


 This issue was originally a scratch pad for various ideas for new Logos.  It is 
 now being used as a repository for submissions for the Solr Logo Contest...
http://wiki.apache.org/solr/LogoContest
 Note that many of the images currently attached are not eligible for the 
 contest since they do not meet the official guidelines for new Apache project 
 logos (in particular that the full project name Apache Solr must be 
 included in the Logo).  Only eligible attachments will be included in the 
 official voting.




Re: QueryParsing using SolrCore.getSolrCore()

2008-10-09 Thread Chris Hostetter

:   public static FunctionQuery parseFunction(String func, IndexSchema schema)
: throws ParseException {
: SolrCore core = SolrCore.getSolrCore();
: return (FunctionQuery)(QParser.getParser(func,func,new
: LocalSolrQueryRequest(core,new HashMap())).parse());
: // return new FunctionQuery(parseValSource(new StrParser(func), schema));
:   }

Ugh.  I don't think there's any easy way to fix that so it satisfies every 
imaginable use case.

The sanest thing to do would probably be to make parseFunction construct a 
local instance of FunctionQParser (with a null SolrQueryRequest) instead 
of using QParser.getParser(...).  That should be fairly close to the way 
it worked in older versions.

(hmmm... except FunctionQParser does something similar to get a 
ValueSourceParser ... a new constructor for FunctionQParser that 
explicitly tells it to use ValueSourceParser.standardValueSourceParsers 
might be in order).

I think it goes without saying that QueryParsing.parseFunction should be 
deprecated as well ... fortunately it's only used in a few places in the 
core code ... unfortunately those places don't currently have access 
to a SolrQueryRequest: 
  1) SolrPluginUtils.parseFuncs -- should probably be deprecated, callers 
of it should start using the QParser APIs.
  2) SolrQueryParser.getFieldQuery -- it's only used if the 
SolrQueryParser was constructed with an IndexSchema and not with a QParser 
(in which case it can ask the QParser for a subParser) .. the IndexSchema 
constructor should probably be deprecated as well (but I haven't dug in to 
see how far down the rabbit hole that change would go)


-Hoss



[jira] Commented: (SOLR-799) Add support for hash based exact/near duplicate document handling

2008-10-09 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638426#action_12638426
 ] 

Hoss Man commented on SOLR-799:
---

(disclaimer: haven't looked at the patch)

bq. Though in some implementations (like #2, which may be the default), 
detecting that duplicate and handling it are truly coupled... forcing a 
decoupling would not be a good thing in that case.

I don't follow your reasoning.  All the use cases I've seen mentioned seem like 
they could/would decouple very nicely...

1. Prevent new insert -- SignatureUpdateProcessor generates a signature and 
adds it as a field; AbortIfExistingUpdateProcessor aborts the update if a doc 
exists with a specific field in common with the doc to be added.
2. Remove old (i.e. same as an update works now) -- SignatureUpdateProcessor as 
mentioned before, and signature field is used as the uniqueKey field.
3. Note the duplicate on the existing document in a duplicates field -- 
SignatureUpdateProcessor as mentioned before; AnnotateDuplicatesProcessor 
checks for existing docs with a specific field in common with the doc to be 
added and executes additional operations to update those docs, as well as 
the doc to be added.
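
Use case 1 can be sketched as two independent steps in a chain. This is a toy model rather than Solr's update processor API: the UpdateStep interface, the in-memory list standing in for the index, and String.hashCode standing in for a real signature are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a decoupled chain; not Solr's actual processor API.
class DedupChainSketch {
    // A step may modify the doc; returning false aborts the add.
    interface UpdateStep {
        boolean apply(Map<String, String> doc, List<Map<String, String>> index);
    }

    // Step 1: compute a (toy) signature from one field and store it on the doc.
    static UpdateStep signature(String sourceField, String sigField) {
        return (doc, index) -> {
            String text = String.valueOf(doc.get(sourceField));
            doc.put(sigField, Integer.toHexString(text.hashCode()));
            return true;
        };
    }

    // Step 2: abort if any indexed doc has the same value in the given field.
    // It knows nothing about how that value was produced.
    static UpdateStep abortIfExisting(String field) {
        return (doc, index) -> {
            for (Map<String, String> existing : index) {
                if (Objects.equals(existing.get(field), doc.get(field))) {
                    return false;
                }
            }
            return true;
        };
    }

    // Runs the chain and adds the doc only if no step aborted.
    static boolean add(Map<String, String> doc, List<Map<String, String>> index,
                       List<UpdateStep> chain) {
        for (UpdateStep step : chain) {
            if (!step.apply(doc, index)) {
                return false;
            }
        }
        index.add(doc);
        return true;
    }

    public static void main(String[] args) {
        List<UpdateStep> chain = Arrays.asList(signature("text", "sig"), abortIfExisting("sig"));
        List<Map<String, String>> index = new ArrayList<>();
        Map<String, String> first = new HashMap<>();
        first.put("text", "hello");
        Map<String, String> second = new HashMap<>();
        second.put("text", "hello");
        System.out.println(add(first, index, chain));
        System.out.println(add(second, index, chain));
    }
}
```

Swapping the second step for one that annotates existing documents instead of aborting would give use case 3 without touching the signature step, which is the decoupling being argued for.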





[jira] Commented: (SOLR-799) Add support for hash based exact/near duplicate document handling

2008-10-09 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638427#action_12638427
 ] 

Hoss Man commented on SOLR-799:
---

some misc comments from a user perspective based on the current state of the 
wiki...

1) rather than a comma-separated <str> for fields, we should just use an <arr>

2) we should consider if/how we want to support using dynamicFields (i.e. field 
name globs) in listing fields that are included in the signature

3) "By default, all non null fields on the document will be used." ... there's 
no such thing as a null field -- there are fields that have no value, and there 
are fields whose value is an empty string, but no null value.

4) yonik already asked other questions I had based on the wiki: how the order 
of fields in the update command affects the signature that gets computed -- 
both in terms of fields with different names, and fields with the same name.  
The fields should probably be stable sorted by field name, so that the order of 
fields with the same name affects the signature, but the relative order of 
fields with different names doesn't (since the order of fields with the same 
name actually affects the way the document is indexed, but the order of 
different field names does not)
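
The stable sort suggested in point 4 can be sketched as follows. The class StableFieldOrder and the name/value pair representation are assumptions for illustration; the sketch relies on Collections.sort, which is documented as stable, so fields that share a name keep their original relative order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; each field is a {name, value} pair.
class StableFieldOrder {
    // Sorts by field name only. Because the sort is stable, values of a
    // multi-valued field stay in the order they were supplied.
    static List<String[]> sortByName(List<String[]> fields) {
        List<String[]> sorted = new ArrayList<>(fields);
        Collections.sort(sorted, Comparator.comparing((String[] f) -> f[0]));
        return sorted;
    }

    // Flattens the sorted fields into the text a signature would be computed over.
    static String canonical(List<String[]> fields) {
        StringBuilder sb = new StringBuilder();
        for (String[] f : sortByName(fields)) {
            sb.append(f[0]).append('=').append(f[1]).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> doc = Arrays.asList(
                new String[] {"title", "x"},
                new String[] {"cat", "b"},
                new String[] {"cat", "a"});
        System.out.println(canonical(doc));
    }
}
```

With this ordering, moving the title field before or after the cat fields leaves the canonical text unchanged, while swapping the two cat values changes it.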




[jira] Commented: (SOLR-801) configurable delete query per root-entity

2008-10-09 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638441#action_12638441
 ] 

Hoss Man commented on SOLR-801:
---

perhaps a preFullBuildDeleteQuery and a postFullBuildDeleteQuery ... some people 
might want to delete all existing entities before starting a full build -- 
others might want to wait until after the full build to delete old stuff 
(anything that hasn't been updated since NOW-1DAY for example)

 configurable delete query  per root-entity
 --

 Key: SOLR-801
 URL: https://issues.apache.org/jira/browse/SOLR-801
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Noble Paul
 Fix For: 1.4


 Currently, on a full import DIH deletes *:*, which is a problem if I have 
 different root entities.
 I suggest having a deleteByQuery attribute for each root entity so that cleanup 
 can issue a different query instead of *:*




[jira] Commented: (SOLR-801) configurable delete query per root-entity

2008-10-09 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638477#action_12638477
 ] 

Noble Paul commented on SOLR-801:
-

OK, makes sense. Let us make it
preImportDeleteQuery and 
postImportDeleteQuery

Delta import also uses the same queries. By default delta-import does not 
delete (unless clean=true is passed as a request param).
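
One possible reading of this proposal is sketched below. It is not DIH code: the class ImportCleanupSketch, the plan method, and the example queries are hypothetical, and the sketch assumes that both delete queries are issued only when clean is true (the default for a full import, off by default for a delta import).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed ordering; not the DIH implementation.
class ImportCleanupSketch {
    // Returns the operations an import would issue, in order.
    // A null query means the attribute was not configured.
    static List<String> plan(List<String> docs, String preImportDeleteQuery,
                             String postImportDeleteQuery, boolean clean) {
        List<String> ops = new ArrayList<>();
        if (clean && preImportDeleteQuery != null) {
            ops.add("delete " + preImportDeleteQuery);   // before any doc is added
        }
        for (String doc : docs) {
            ops.add("add " + doc);
        }
        if (clean && postImportDeleteQuery != null) {
            ops.add("delete " + postImportDeleteQuery);  // after all docs are added
        }
        ops.add("commit");
        return ops;
    }

    public static void main(String[] args) {
        // Example queries are made up; "type" is an assumed field name.
        System.out.println(plan(Arrays.asList("d1", "d2"), "type:book", null, true));
        System.out.println(plan(Arrays.asList("d3"), "type:book", null, false));
    }
}
```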




[jira] Updated: (SOLR-782) cleanup DIH code

2008-10-09 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-782:


Attachment: SOLR-782.patch

 cleanup DIH code
 

 Key: SOLR-782
 URL: https://issues.apache.org/jira/browse/SOLR-782
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-782.patch


 A lot of unnecessary code was introduced in DIH to make it work with both 1.2 
 and 1.3. Now that 1.3 is out, we can clean it up.
