RE: duplicate records in index

2011-02-16 Thread Digy
You are adding the same doc twice.
(See how you add "acttime": writer.AddDocument(doc) is called once before that
field is added and then again after it.)

DIGY
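
A minimal sketch of the fix, reusing the reader1/writer objects and the field
names from the quoted message below: build the complete Document first, then
call AddDocument exactly once per record.

// Sketch only: add every field, then add the doc once.
Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();

doc.Add(new Lucene.Net.Documents.Field("lmname", reader1["lmname"].ToString(),
    Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.TOKENIZED));
doc.Add(new Lucene.Net.Documents.Field("lmid", reader1["lmid"].ToString(),
    Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.UN_TOKENIZED));
doc.Add(new Lucene.Net.Documents.Field("nickName", reader1["nickName"].ToString(),
    Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.UN_TOKENIZED));
doc.Add(new Lucene.Net.Documents.Field("uid", reader1["uid"].ToString(),
    Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.UN_TOKENIZED));
doc.Add(new Lucene.Net.Documents.Field("acttime", reader1["acttime"].ToString(),
    Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.UN_TOKENIZED));

// Exactly one AddDocument per record; the second call in the quoted
// code writes the same Document again, producing the duplicates.
writer.AddDocument(doc);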

-Original Message-
From: Wen Gao [mailto:samuel.gao...@gmail.com] 
Sent: Wednesday, February 16, 2011 11:35 AM
To: lucene-net-dev@lucene.apache.org
Subject: duplicate records in index

Hi,

I am creating an index from my database; however, the index (the .cfs file)
contains duplicate records, e.g.:

book1, 1, susan, 1
book1, 1, susan, 1, 03/01/2010
book2, 2, tom,
book2, 2, tom, 2, 03/02/2010
...

I get the data from several tables, and I am sure that the SQL generates only
one record. Also, when I debug the code, the record is only added once.

So I am confused about why the data is duplicated in the index.

I define my index in the following format:



doc.Add(new Lucene.Net.Documents.Field(
    "lmname",
    reader1["lmname"].ToString(),
    // new System.IO.StringReader(reader1["cname"].ToString()),
    Lucene.Net.Documents.Field.Store.YES,
    Lucene.Net.Documents.Field.Index.TOKENIZED));

// lmid
doc.Add(new Lucene.Net.Documents.Field(
    "lmid",
    reader1["lmid"].ToString(),
    Lucene.Net.Documents.Field.Store.YES,
    Lucene.Net.Documents.Field.Index.UN_TOKENIZED));

// nick name of user
doc.Add(new Lucene.Net.Documents.Field(
    "nickName",
    reader1["nickName"].ToString(),
    Lucene.Net.Documents.Field.Store.YES,
    Lucene.Net.Documents.Field.Index.UN_TOKENIZED));

// uid
doc.Add(new Lucene.Net.Documents.Field(
    "uid",
    reader1["uid"].ToString(),
    Lucene.Net.Documents.Field.Store.YES,
    Lucene.Net.Documents.Field.Index.UN_TOKENIZED));

writer.AddDocument(doc); // first add: the doc goes in without "acttime"

// acttime
doc.Add(new Lucene.Net.Documents.Field(
    "acttime",
    reader1["acttime"].ToString(),
    Lucene.Net.Documents.Field.Store.YES,
    Lucene.Net.Documents.Field.Index.UN_TOKENIZED));

writer.AddDocument(doc); // second add: the same doc again, now with "acttime"

 

Any ideas?

 

Thanks,

Wen Gao

 

 




[jira] Issue Comment Edited: (LUCENENET-379) Clean up Lucene.Net website

2011-02-16 Thread michael herndon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995283#comment-12995283
 ] 

michael herndon edited comment on LUCENENET-379 at 2/16/11 1:32 PM:


I think anything would be better than the current one (which would look cool if 
it was cleaned up and put on the side of a Chevelle, but I don't know how it 
would help brand lucene.net).

I'd say keep doing a few more variations. Open it up for the public to make 
some submissions as well (giving credit to whoever's design is chosen; maybe 
even give them some social media love).

The final one needs to work well in both RGB and CMYK color formats and in a 
scalable graphics format so that it can be resized cleanly. 

Also it should have a visual aspect that can be turned into a decent 16 x 16 
favicon (like the 3 yellow hexagons in the jpg).

Though keep in mind basic color theory: yellow is irritating on the eyes. It 
definitely grabs attention, but it's harder on the eyes for an extended period 
of time. Green is the most relaxing. 

But above all else: keep moving forward towards something new.  

:: edited due to posting this while on an empty stomach, never wise ::


  was (Author: michaelherndon):
I think anything would be better than the current one (which would look 
cool if was cleaned up and put on the side of a chevelle, but I don't know how 
would help brand lucene.net).

I'd say keep doing a few more variations. open it up for the public to make 
some submissions as well. (giving credit to whoever's design is chosen maybe 
even give them some social media love).

The final one needs to work well with both rgb and cymk color formats and in a 
scalable graphics format so that it can be resized cleanly. 

Also it should have a visual aspect of it that can be turned into a decent 16 x 
16 favicon.  (like the 3 yellow hexagons that is in the jpg).

Though keep in mind basic color theory. Yellow is irritating on the eyes. Its 
definitely grabs attention, but its harder on the eyes for an extended period 
of time. Green is the most relaxing. 

But above all else keep moving forward towards something new.  


  
 Clean up Lucene.Net website
 ---

 Key: LUCENENET-379
 URL: https://issues.apache.org/jira/browse/LUCENENET-379
 Project: Lucene.Net
  Issue Type: Task
Reporter: George Aroush
 Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch


 The existing Lucene.Net home page at http://lucene.apache.org/lucene.net/ is 
 still based on the incubation-era, out-of-date design.  This JIRA task is to 
 bring it up to date with other ASF projects' web pages.
 The existing website is here: 
 https://svn.apache.org/repos/asf/lucene/lucene.net/site/
 See http://www.apache.org/dev/project-site.html to get started.
 It would be best to start by cloning an existing ASF project's website and 
 adapting it for Lucene.Net.  Some examples: 
 https://svn.apache.org/repos/asf/lucene/pylucene/site/ and 
 https://svn.apache.org/repos/asf/lucene/java/site/





Re: how can I get the similarity in fuzzy query

2011-02-16 Thread Christopher Currens
As far as I know, you'll need to calculate that manually.  FuzzyQuery
searches don't return any results like that.

On Wed, Feb 16, 2011 at 11:47 AM, Wen Gao samuel.gao...@gmail.com wrote:

 Hi,
 I think my situation is just to compare the similarity of strings: I want
 to calculate the similarity between the typed query and the returned results
 using *FuzzyQuery*. I have set the minimumSimilarity of FuzzyQuery to 0.5f;
 what I want to do is get the similarity, instead of the score, for every
 result that returns.

 Thanks for your time.

 Wen

 2011/2/16 Christopher Currens currens.ch...@gmail.com

  I was going to post the link that Digy posted, which suggests not to
  determine a match that way.  If my understanding is correct, the scores
  returned for a query are relative to which documents were retrieved by
 the
  search, in that if a document is deleted from the index, the scores will
  change even though the query did not, because the number of returned
  documents are different.
 
  If the only thing you wanted to do was to calculate how similar a resulting
  string is to a search string, I suggest the Levenshtein distance algorithm
  (http://en.wikipedia.org/wiki/Levenshtein_distance)... but it doesn't seem
  like that's quite what you want to accomplish based on your question.
 
  Christopher
 
  On Wed, Feb 16, 2011 at 10:55 AM, Wen Gao samuel.gao...@gmail.com
 wrote:
 
   Hi,
   I am using FuzzyQuery to get fuzzy matched results. I want to get the
   similarity in percent for every matched record.
   For example, if I search for "databasd", it will return results such as
   "database", "database1", and "database11". I want to get the similarity in
   percent for every record, such as 87.5%, 75%, and 62.5%.
  
   How can I do this?
  
   Any ideas?
  
   Wen Gao
  
 



RE: how can I get the similarity in fuzzy query

2011-02-16 Thread Digy
Whether *fuzzy* or not, all queries are simple term queries in the end, and
Lucene does not keep information like *similarity*, just scores.

DIGY
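
Since Lucene only exposes scores, the percentage the thread asks for has to be
computed outside Lucene, e.g. with the Levenshtein distance Christopher
suggests below. A minimal, self-contained C# sketch; the percent formula
(1 - distance / length of the shorter string) is an assumption chosen because
it reproduces the 87.5%, 75%, and 62.5% figures from the question:

using System;

class SimilarityDemo
{
    // Classic Levenshtein edit distance between two strings.
    static int Levenshtein(string a, string b)
    {
        int[,] d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;
        for (int i = 1; i <= a.Length; i++)
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                   d[i - 1, j - 1] + cost);
            }
        return d[a.Length, b.Length];
    }

    // Percent similarity: 1 - distance / length of the shorter string.
    static double PercentSimilarity(string query, string hit)
    {
        int dist = Levenshtein(query, hit);
        return 100.0 * (1.0 - (double)dist / Math.Min(query.Length, hit.Length));
    }

    static void Main()
    {
        Console.WriteLine(PercentSimilarity("databasd", "database"));   // 87.5
        Console.WriteLine(PercentSimilarity("databasd", "database1"));  // 75
        Console.WriteLine(PercentSimilarity("databasd", "database11")); // 62.5
    }
}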

-Original Message-
From: Wen Gao [mailto:samuel.gao...@gmail.com] 
Sent: Wednesday, February 16, 2011 9:47 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: how can I get the similarity in fuzzy query

Hi,
I think my situation is just to compare the similarity of strings: I want to
calculate the similarity between the typed query and the returned results
using *FuzzyQuery*. I have set the minimumSimilarity of FuzzyQuery to 0.5f;
what I want to do is get the similarity, instead of the score, for every
result that returns.

Thanks for your time.

Wen

2011/2/16 Christopher Currens currens.ch...@gmail.com

 I was going to post the link that Digy posted, which suggests not to
 determine a match that way.  If my understanding is correct, the scores
 returned for a query are relative to which documents were retrieved by the
 search, in that if a document is deleted from the index, the scores will
 change even though the query did not, because the number of returned
 documents are different.

 If the only thing you wanted to do was to calculate how similar a resulting
 string is to a search string, I suggest the Levenshtein distance algorithm
 (http://en.wikipedia.org/wiki/Levenshtein_distance)... but it doesn't seem
 like that's quite what you want to accomplish based on your question.

 Christopher

 On Wed, Feb 16, 2011 at 10:55 AM, Wen Gao samuel.gao...@gmail.com wrote:

  Hi,
  I am using FuzzyQuery to get fuzzy matched results. I want to get the
  similarity in percent for every matched record.
  For example, if I search for "databasd", it will return results such as
  "database", "database1", and "database11". I want to get the similarity in
  percent for every record, such as 87.5%, 75%, and 62.5%.
 
  How can I do this?
 
  Any ideas?
 
  Wen Gao
 




RE: how can I get the similarity in fuzzy query

2011-02-16 Thread Digy
Download the source from
https://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_9_2
using an SVN client (like TortoiseSVN), and open the project file with VS20XX.

DIGY

-Original Message-
From: Wen Gao [mailto:samuel.gao...@gmail.com] 
Sent: Wednesday, February 16, 2011 9:58 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: how can I get the similarity in fuzzy query

OK, I get it. How can I recompile Lucene_src on Windows?

Thanks.
Wen
2011/2/16 Christopher Currens currens.ch...@gmail.com

 As far as I know, you'll need to calculate that manually.  FuzzyQuery
 searches don't return any results like that.

 On Wed, Feb 16, 2011 at 11:47 AM, Wen Gao samuel.gao...@gmail.com wrote:

  Hi,
  I think my situation is just to compare the similarity of strings: I want
  to calculate the similarity between the typed query and the returned results
  using *FuzzyQuery*. I have set the minimumSimilarity of FuzzyQuery to 0.5f;
  what I want to do is get the similarity, instead of the score, for every
  result that returns.
 
  Thanks for your time.
 
  Wen
 
  2011/2/16 Christopher Currens currens.ch...@gmail.com
 
   I was going to post the link that Digy posted, which suggests not to
   determine a match that way.  If my understanding is correct, the
scores
   returned for a query are relative to which documents were retrieved by
  the
   search, in that if a document is deleted from the index, the scores
 will
   change even though the query did not, because the number of returned
   documents are different.
  
   If the only thing you wanted to do was to calculate how similar a resulting
   string is to a search string, I suggest the Levenshtein distance algorithm
   (http://en.wikipedia.org/wiki/Levenshtein_distance)... but it doesn't seem
   like that's quite what you want to accomplish based on your question.
  
   Christopher
  
   On Wed, Feb 16, 2011 at 10:55 AM, Wen Gao samuel.gao...@gmail.com
  wrote:
  
Hi,
    I am using FuzzyQuery to get fuzzy matched results. I want to get the
    similarity in percent for every matched record.
    For example, if I search for "databasd", it will return results such as
    "database", "database1", and "database11". I want to get the similarity in
    percent for every record, such as 87.5%, 75%, and 62.5%.
   
How can I do this?
   
Any ideas?
   
Wen Gao
   
  
 




Re: how can I get the similarity in fuzzy query

2011-02-16 Thread Wen Gao
Thank you.

Wen
2011/2/16 Digy digyd...@gmail.com

 Download the source from
 https://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_9_2
 using a svn client(like TortoiseSVN), and open the project file with
 VS20XX.

 DIGY

 -Original Message-
 From: Wen Gao [mailto:samuel.gao...@gmail.com]
 Sent: Wednesday, February 16, 2011 9:58 PM
 To: lucene-net-dev@lucene.apache.org
 Subject: Re: how can I get the similarity in fuzzy query

 OK, I get it. How can I recompile Lucene_src on Windows?

 Thanks.
 Wen
 2011/2/16 Christopher Currens currens.ch...@gmail.com

  As far as I know, you'll need to calculate that manually.  FuzzyQuery
  searches don't return any results like that.
 
  On Wed, Feb 16, 2011 at 11:47 AM, Wen Gao samuel.gao...@gmail.com
 wrote:
 
   Hi,
   I think my situation is just to compare the similarity of strings: I want
   to calculate the similarity between the typed query and the returned results
   using *FuzzyQuery*. I have set the minimumSimilarity of FuzzyQuery to 0.5f;
   what I want to do is get the similarity, instead of the score, for every
   result that returns.
  
   Thanks for your time.
  
   Wen
  
   2011/2/16 Christopher Currens currens.ch...@gmail.com
  
I was going to post the link that Digy posted, which suggests not to
determine a match that way.  If my understanding is correct, the
 scores
returned for a query are relative to which documents were retrieved
 by
   the
search, in that if a document is deleted from the index, the scores
  will
change even though the query did not, because the number of returned
documents are different.
   
    If the only thing you wanted to do was to calculate how similar a resulting
    string is to a search string, I suggest the Levenshtein distance algorithm
    (http://en.wikipedia.org/wiki/Levenshtein_distance)... but it doesn't seem
    like that's quite what you want to accomplish based on your question.
   
Christopher
   
On Wed, Feb 16, 2011 at 10:55 AM, Wen Gao samuel.gao...@gmail.com
   wrote:
   
 Hi,
     I am using FuzzyQuery to get fuzzy matched results. I want to get the
     similarity in percent for every matched record.
     For example, if I search for "databasd", it will return results such as
     "database", "database1", and "database11". I want to get the similarity in
     percent for every record, such as 87.5%, 75%, and 62.5%.

 How can I do this?

 Any ideas?

 Wen Gao

   
  
 




Re: Site

2011-02-16 Thread Ayende Rahien
Off topic, can we get a [Lucene.NET] prefix for messages to the list?

On Wed, Feb 16, 2011 at 11:05 PM, Prescott Nasser geobmx...@hotmail.com wrote:

 Where does that site compile to? The incubator lucene.net site appears to
 be the older one





Re: Site

2011-02-16 Thread Troy Howard
So, currently we are only set up for working in the staging
environment. Once we are ready to publish, we'll need to file a new
JIRA ticket with the infrastructure project and ask for the site to be
set up for publishing. Once that's done, we will be able to
self-publish whenever we'd like, either through the web UI for the CMS
or by running the publish script on the server. Each time we publish,
the changes will build and go public immediately.

The current staging site is here:

http://lucene.net.staging.apache.org/lucene.net/

The CMS Web UI for our site is:

https://cms.apache.org/lucene.net/

You can use the web-based editors to do most everything, and that's the
preferred method for making site modifications. This provides a
controlled, semi-WYSIWYG environment for editing and will perform SVN
commits for you when you save. It's a pretty easy system to work with.

At first there were some issues with building the site and web UI, but
Joe S in infrastructure got those taken care of today. I've cleaned up
the other issues with the markdown, and we've got a functioning version
available at the staging site. The next steps are to edit content as a
group and get it to where we are comfortable publishing it. Once we do
that, we'll get set up for public publishing.

I found the #asfinfra IRC channel very helpful, as it allowed me to
work with Joe in real time to get the issues resolved and my
questions answered. I suggest looking there for help on the site, as
the documentation is a bit sparse, and because of that a number of
aspects of the CMS design are shrouded in mystery at first. Hopefully
they'll get the documentation updated soon; till then, IRC and mailing
lists... :)

Thanks,
Troy


On Wed, Feb 16, 2011 at 1:05 PM, Prescott Nasser geobmx...@hotmail.com wrote:
 Where does that site compile to? The incubator lucene.net site appears to be 
 the older one





[jira] Commented: (LUCENENET-379) Clean up Lucene.Net website

2011-02-16 Thread Troy Howard (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995595#comment-12995595
 ] 

Troy Howard commented on LUCENENET-379:
---

The staging site and CMS Web UI are working now and ready for us to get in 
there and edit content, layout, etc. 

I set this up with a really basic template copied from the Lucy project, which 
is copied from the default Apache site. 

Browse here to see the staging site: 

http://lucene.net.staging.apache.org/lucene.net/

And here to edit content using CMS Web UI:

https://cms.apache.org/lucene.net/



 Clean up Lucene.Net website
 ---

 Key: LUCENENET-379
 URL: https://issues.apache.org/jira/browse/LUCENENET-379
 Project: Lucene.Net
  Issue Type: Task
Reporter: George Aroush
 Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch


 The existing Lucene.Net home page at http://lucene.apache.org/lucene.net/ is 
 still based on the incubation-era, out-of-date design.  This JIRA task is to 
 bring it up to date with other ASF projects' web pages.
 The existing website is here: 
 https://svn.apache.org/repos/asf/lucene/lucene.net/site/
 See http://www.apache.org/dev/project-site.html to get started.
 It would be best to start by cloning an existing ASF project's website and 
 adapting it for Lucene.Net.  Some examples: 
 https://svn.apache.org/repos/asf/lucene/pylucene/site/ and 
 https://svn.apache.org/repos/asf/lucene/java/site/





Re: subclassing Python classes in Java

2011-02-16 Thread Andi Vajda


On Feb 16, 2011, at 9:39, Bill Janssen jans...@parc.com wrote:


How do I subclass a Python class in a JCC-wrapped Java module?


 - define a Java class with native methods
 - using the usual extension tricks, have a Python class implement these
   native methods
 - define a subclass of that Java class so as to inherit these native
   implementations


Andi..



In UpLib, I've got a class, uplib.ripper.Ripper, and I'd like to be able
to create a Java subclass for that in my module.  I presume I need a
Java interface for that Python class, but how do I hook the two
together so that the Java subclass can inherit from the Python class?
Bill


[jira] Commented: (SOLR-1395) Integrate Katta

2011-02-16 Thread tom liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995212#comment-12995212
 ] 

tom liu commented on SOLR-1395:
---

On the Katta slave node, my folder hierarchy is:
|/var/data|root|
|/var/data/hadoop|stores hadoop data|
|/var/data/hdfszips|stores zip tmp data, which is fetched from HDFS and then moved to katta's shards|
|/var/data/solr|root that stores the solr core configs|
|/var/data/solr/seoproxy|stores seoproxy's solr config, which is used by the sub-proxy|
|/var/data/katta/shards/nodename_2/seo0#seo0|stores the seo0 shard, which is deployed from the master node|
|/var/data/zkdata|stores zkserver data (zk logs and snapshots)|

On the Katta master node, my folder hierarchy is:
|/var/data|root|
|/var/data/hadoop|stores hadoop data|
|/var/data/hdfsfile|stores solr tmp data, which is fetched from the solr dataimporter, then zipped & put to HDFS|
|/var/data/solr|root that stores the solr core configs|
|/var/data/solr/seo|stores seo's solr config, which is used by tomcat's webapp|
|/var/data/zkdata|stores zkserver data (zk logs and snapshots)|

So my config comes from five folders:
|Master|/var/data/solr/seo|tomcat webapp's solrcore config|
|Slave|/var/data/solr/seoproxy|sub-proxy's solrcore config|
|Master|/var/data/hdfsfile|query-core's config, which is the config template|
|HDFS|http://hdfsname:9000/seo/seo0.zip|query-core seo0's zip file, which holds conf|
|Slave|/var/data/katta/shards/nodename_2/seo0#seo0/conf|query-core seo0's config, unzipped from seo0.zip on HDFS|

And /var/data/hdfsfile's structure is:
{noformat}
seo@seo-solr1:/var/data/hdfsfile$ ll
total 28
drwxr-xr-x 6 seo seo 4096 Oct 21 15:21 ./
drwxr-xr-x 4 seo seo 4096 Feb 16 15:49 ../
drwxr-xr-x 2 seo seo 4096 Oct  8 09:17 bin/
drwxr-xr-x 4 seo seo 4096 Jan 21 18:22 conf/
drwxr-xr-x 3 seo seo 4096 Oct 21 15:21 data/
drwxr-xr-x 2 seo seo 4096 Sep 29 14:01 lib/
-rw-r--r-- 1 seo seo 1320 Oct  8 09:20 solr.xml
{noformat}


 Integrate Katta
 ---

 Key: SOLR-1395
 URL: https://issues.apache.org/jira/browse/SOLR-1395
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: Next

 Attachments: SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, 
 back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, 
 katta-solrcores.jpg, katta.node.properties, katta.zk.properties, 
 log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, 
 solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, 
 solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, 
 solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, 
 solr-1395-katta-0.6.2.patch, test-katta-core-0.6-dev.jar, 
 zkclient-0.1-dev.jar, zookeeper-3.2.1.jar

   Original Estimate: 336h
  Remaining Estimate: 336h

 We'll integrate Katta into Solr so that:
 * Distributed search uses Hadoop RPC
 * Shard/SolrCore distribution and management
 * Zookeeper based failover
 * Indexes may be built using Hadoop




Re: Please mark distributed date faceting for 3.1

2011-02-16 Thread Robert Muir
On Wed, Feb 16, 2011 at 12:06 AM, Smiley, David W. dsmi...@mitre.org wrote:
 I may have added a test just now, but I and others have been using this 
 [simple] code for some time now.  It has baked; it doesn't need more baking, 
 IMO.

I am sure people will say I am just being silly, but hudson does a
better job of testing these things than people playing with the code. For
example, hudson randomizes external variables (locale x timezone)...
on the latest 1.6u23 there are 152 locales and 609 timezones (only
424 unique according to raw offset + rules). With hudson selecting one
of these ~65K possibilities 96 times a day, you can start to
calculate how long a good baking period is for date-related functionality.
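
(For scale: 152 locales x 424 unique timezones is 64,448 combinations; at 96
runs a day, even a full month of builds draws only 2,880 samples, under 5% of
the space, so weeks of baking are not an exaggeration.)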

Someone can argue that because Solr insists on handling dates
internally, this does not matter, but I have found and fixed
timezone- and localization-related bugs in Lucene and Solr before, so
that argument fails... when I don't know the surrounding code, nothing makes
me feel better than a couple of weeks of hudson grinding on it.

Even then, sometimes a few weeks isn't enough... for example, if I
remember right, SOLR-1821 was daylight-savings related (note: the
issue was reported the very day daylight savings started in the United
States; in other timezones it had not started yet, and it would fail for some
developers but not others).

 If this patch wasn't the biggest reason not to use distributed search (a key 
 feature) then I wouldn't be here arguing my point.  But I've apparently lost 
 this argument already, so I give up... assign it for 3.2 if that's the best 
 you can do, Rob. It's better than being unassigned, which is what it is now.


I don't think that would be best, as it's not my area of expertise.
When I see good patches being ignored because other devs are
time-constrained, I will sometimes take the time to bring myself
up to speed and get them committed, though, and I haven't yet given up
on this patch :)

Just so you know, it's nothing about your patch at all; I am just
against any new features of any sort being added to 3.1 at this point.




Re: strange problem of PForDelta decoder

2011-02-16 Thread Li Li
   Our recent experiments show that PFOR is not a good solution for AND queries.
We tested it with our dataset and users' queries; in most cases PFOR is slower
than VINT. We think the reason is that most queries very likely contain a
low-frequency term, so the scoring time dominates while decoding does not.
   E.g., in our index the term beijing has a df of 2,557,916 and park has
2,313,201; both are high-frequency terms. But the count of documents containing
both is only 1,552.
   With VINT we only need to decode 1,552 documents, while with PFOR we may
have to decode many whole blocks.
   Most search engines use AND queries, so PFOR is only good for OR queries,
and for AND queries whose terms are all high-frequency.
   So we had to give it up in our application.
   A partial decoder for PFOR? For all high-frequency terms, use the normal
PFOR decoder; for queries with low-frequency terms, use a partial decoder?
   But a partial PFOR decoder would need many if/else branches and would be
slower.
   Does anyone have a solution for this?
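
For context on the 1,552-document figure: Lucene's VInt coding stores each
value in 1-5 bytes, 7 bits per byte, so a postings reader can decode exactly
the entries it skips to, while a block codec must decode a whole block to read
any entry in it. A minimal C# sketch of the decoder (the byte[]/ref-offset
interface here is illustrative, not Lucene's API):

// Decode one VInt starting at pos; advances pos past it. The low 7 bits
// of each byte carry data (least-significant group first); a set high
// bit means another byte follows.
static int ReadVInt(byte[] buf, ref int pos)
{
    byte b = buf[pos++];
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7)
    {
        b = buf[pos++];
        value |= (b & 0x7F) << shift;
    }
    return value;
}

// Doc IDs are stored as deltas, so a postings scan is just:
//   docId += ReadVInt(buf, ref pos);
// and with skip lists the decoder touches only the bytes near candidate
// documents, unlike a block codec, which decodes whole blocks.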


2010/12/27 Li Li fancye...@gmail.com:
 I integrated the pfor codec into lucene 2.9.3, and the search time
 comparison is as follows (times in ms):

                               single term   AND query   OR query
 VINT in lucene 2.9.3              11.2         36.5       38.6
 PFor in lucene 2.9.3               8.7         27.6       33.4
 VINT in lucene 4 branch           10.6         26.5       35.4
 PFor in lucene 4 branch            8.1         22.5       30.7

 My test terms are high-frequency terms, because we are interested in the bad
 case. It seems the lucene 4 branch's implementation of AND (conjunction)
 queries is so well optimized that even with the VINT codec it is faster than
 PFor in lucene 2.9.3. Could anyone tell me what optimization was done?
 Does storing docIDs and freqs separately make it faster? Or something
 else?

 Another question: is there anyone interested in integrating the pfor
 codec into lucene 2.9.3 like me (we have to use lucene 2.9 and solr
 1.4)? And how do I contribute this patch?

 2010/12/24 Michael McCandless luc...@mikemccandless.com:
 Well, an early patch somewhere was able to run PFor on trunk, but the
 performance wasn't great because the trunk bulk-read API is a
 bottleneck (this is why the bulk postings branch was created).

 Mike

 On Wed, Dec 22, 2010 at 9:45 PM, Li Li fancye...@gmail.com wrote:
 I used the bulkpostings
 branch(https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings/lucene)
 does trunk have PForDelta decoder/encoder ?

 2010/12/23 Michael McCandless luc...@mikemccandless.com:
 Those are nice speedups!

 Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this 
 test?

 Mike

 On Tue, Dec 21, 2010 at 9:59 PM, Li Li fancye...@gmail.com wrote:
 Great improvement!
 I did a test on our data set. The doc count is about 2M+, and the index size
 after optimization is about 13.3GB (including fdt).
 It seems lucene4's index format is better than lucene2.9.3's, and PFor
 gives good results.
 Besides the BlockEncoder for frq and pos, is there any other modification
 in lucene 4?

        decoder \ avg time        single word(ms)   AND query(ms)   OR query(ms)
  VINT in lucene 2.9                    11.2             36.5            38.6
  VINT in lucene 4 branch               10.6             26.5            35.4
  PFor in lucene 4 branch                8.1             22.5            30.7
 2010/12/21 Li Li fancye...@gmail.com:
 OK we should have a look at that one still.  We need to converge on a
 good default codec for 4.0.  Fortunately it's trivial to take any int
 block encoder (fixed or variable block) and make a Lucene codec out of
 it!

 I suggest you not use this one; I fixed dozens of bugs, but it
 still failed the random tests. Its code is hand-written rather
 than generated by a program. But we may learn something from it.









Re: inverted index pruning

2011-02-16 Thread Li Li
Great stuff.
But I think the patch is different from the method in that paper.
A colleague of mine tested this patch but didn't get good results
(I don't know the details well; he just told me about his experience).

2011/2/15 Andrzej Bialecki a...@getopt.org:
 On 2/15/11 11:57 AM, Li Li wrote:

 hi all,
     I recently read a paper Pruning Policies for Two-Tiered Inverted
 Index with Correctness Guarantee. It's idea is interesting and I
 have some questions and like to share with you.

 Please take a look at LUCENE-1812, LUCENE-2632, and my presentation from
 Apache EuroCon 2010 in Prague, "Munching and Crunching".


 --
 Best regards,
 Andrzej Bialecki     
  ___. ___ ___ ___ _ _   __
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  ||  |  Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com








[jira] Assigned: (LUCENE-1812) Static index pruning by in-document term frequency (Carmel pruning)

2011-02-16 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-1812:
---

Assignee: Doron Cohen

 Static index pruning by in-document term frequency (Carmel pruning)
 ---

 Key: LUCENE-1812
 URL: https://issues.apache.org/jira/browse/LUCENE-1812
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Affects Versions: 2.9, 3.1
Reporter: Andrzej Bialecki 
Assignee: Doron Cohen
 Attachments: pruning.patch, pruning.patch, pruning.patch, 
 pruning.patch


 This module provides tools to produce a subset of input indexes by removing 
 postings data for those terms where their in-document frequency is below a 
 specified threshold. The net effect of this processing is a much smaller 
 index that for common types of queries returns nearly identical top-N results 
 as compared with the original index, but with increased performance. 
 Optionally, stored values and term vectors can also be removed. This 
 functionality is largely independent, so it can be used without term pruning 
 (when term freq. threshold is set to 1).
 As the threshold value increases, the total size of the index decreases, 
 search performance increases, and recall decreases (i.e. search quality 
 deteriorates). NOTE: especially phrase recall deteriorates significantly at 
 higher threshold values. 
 Primary purpose of this class is to produce small first-tier indexes that fit 
 completely in RAM, and store these indexes using 
 IndexWriter.addIndexes(IndexReader[]). Usually the performance of this class 
 will not be sufficient to use the resulting index view for on-the-fly pruning 
 and searching. 
 NOTE: If the input index is optimized (i.e. doesn't contain deletions) then 
 the index produced via IndexWriter.addIndexes(IndexReader[]) will preserve 
 internal document id-s so that they are in sync with the original index. This 
 means that all other auxiliary information not necessary for first-tier 
 processing, such as some stored fields, can also be removed, to be quickly 
 retrieved on-demand from the original index using the same internal document 
 id. 
 Threshold values can be specified globally (for terms in all fields) using 
 the defaultThreshold parameter, and can be overridden using per-field or per-term 
 values supplied in a thresholds map. Keys in this map are either field names, 
 or terms in field:text format. The precedence of these values is the 
 following: first a per-term threshold is used if present, then per-field 
 threshold if present, and finally the default threshold.
 A command-line tool (PruningTool) is provided for convenience. At the moment 
 it doesn't support all of the functionality available through the API.
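
For illustration, a sketch of the threshold precedence just described; the
helper name and map type are invented here, not taken from the patch's API:

// Per-term ("field:text") beats per-field, which beats the default.
static int EffectiveThreshold(
    System.Collections.Generic.IDictionary<string, int> thresholds,
    string field, string text, int defaultThreshold)
{
    int value;
    if (thresholds.TryGetValue(field + ":" + text, out value)) return value; // per-term
    if (thresholds.TryGetValue(field, out value)) return value;              // per-field
    return defaultThreshold;                                                 // global default
}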




[jira] Updated: (LUCENE-1812) Static index pruning by in-document term frequency (Carmel pruning)

2011-02-16 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-1812:


Affects Version/s: (was: 3.1)
   (was: 2.9)
Fix Version/s: 4.0
   3.2

 Static index pruning by in-document term frequency (Carmel pruning)
 ---

 Key: LUCENE-1812
 URL: https://issues.apache.org/jira/browse/LUCENE-1812
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Reporter: Andrzej Bialecki 
Assignee: Doron Cohen
 Fix For: 3.2, 4.0

 Attachments: pruning.patch, pruning.patch, pruning.patch, 
 pruning.patch


 This module provides tools to produce a subset of input indexes by removing 
 postings data for those terms where their in-document frequency is below a 
 specified threshold. The net effect of this processing is a much smaller 
 index that for common types of queries returns nearly identical top-N results 
 as compared with the original index, but with increased performance. 
 Optionally, stored values and term vectors can also be removed. This 
 functionality is largely independent, so it can be used without term pruning 
 (when term freq. threshold is set to 1).
 As the threshold value increases, the total size of the index decreases, 
 search performance increases, and recall decreases (i.e. search quality 
 deteriorates). NOTE: especially phrase recall deteriorates significantly at 
 higher threshold values. 
 Primary purpose of this class is to produce small first-tier indexes that fit 
 completely in RAM, and store these indexes using 
 IndexWriter.addIndexes(IndexReader[]). Usually the performance of this class 
 will not be sufficient to use the resulting index view for on-the-fly pruning 
 and searching. 
 NOTE: If the input index is optimized (i.e. doesn't contain deletions) then 
 the index produced via IndexWriter.addIndexes(IndexReader[]) will preserve 
 internal document id-s so that they are in sync with the original index. This 
 means that all other auxiliary information not necessary for first-tier 
 processing, such as some stored fields, can also be removed, to be quickly 
 retrieved on-demand from the original index using the same internal document 
 id. 
 Threshold values can be specified globally (for terms in all fields) using 
 the defaultThreshold parameter, and can be overridden using per-field or per-term 
 values supplied in a thresholds map. Keys in this map are either field names, 
 or terms in field:text format. The precedence of these values is the 
 following: first a per-term threshold is used if present, then per-field 
 threshold if present, and finally the default threshold.
 A command-line tool (PruningTool) is provided for convenience. At the moment 
 it doesn't support all of the functionality available through the API.




[jira] Updated: (SOLR-2105) RequestHandler param update.processor is confusing

2011-02-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2105:
--

Attachment: SOLR-2105.patch

Updated patch attached.

* Use of update.processor is not deprecated but still works, logging a warning
* Added test case which tests that both params work

Patch is for trunk.

 RequestHandler param update.processor is confusing
 --

 Key: SOLR-2105
 URL: https://issues.apache.org/jira/browse/SOLR-2105
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4.1
Reporter: Jan Høydahl
Priority: Minor
 Attachments: SOLR-2105.patch, SOLR-2105.patch


 Today we reference a custom updateRequestProcessorChain using the update 
 request parameter update.processor.
 See 
 http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section
 This is confusing, since what we are really referencing is not an 
 UpdateProcessor, but an updateRequestProcessorChain.
 I propose that update.processor is renamed as update.chain or similar




[jira] Commented: (LUCENENET-379) Clean up Lucene.Net website

2011-02-16 Thread michael herndon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995283#comment-12995283
 ] 

michael herndon commented on LUCENENET-379:
---

I think anything would be better than the current one (which would look cool if 
it was cleaned up and put on the side of a Chevelle, but I don't know how it 
would help brand lucene.net).

I'd say keep doing a few more variations. Open it up for the public to make 
some submissions as well (giving credit to whoever's design is chosen; maybe 
even give them some social media love).

The final one needs to work well in both RGB and CMYK color formats and in a 
scalable graphics format so that it can be resized cleanly. 

Also it should have a visual aspect that can be turned into a decent 16 x 16 
favicon (like the 3 yellow hexagons in the jpg).

Though keep in mind basic color theory: yellow is irritating on the eyes. It 
definitely grabs attention, but it's harder on the eyes for an extended period of 
time. Green is the most relaxing. 

But above all else, keep moving forward towards something new.  



 Clean up Lucene.Net website
 ---

 Key: LUCENENET-379
 URL: https://issues.apache.org/jira/browse/LUCENENET-379
 Project: Lucene.Net
  Issue Type: Task
Reporter: George Aroush
 Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch


 The existing Lucene.Net home page at http://lucene.apache.org/lucene.net/ is 
 still based on the incubation, out of date design.  This JIRA task is to 
 bring it up to date with other ASF project's web page.
 The existing website is here: 
 https://svn.apache.org/repos/asf/lucene/lucene.net/site/
 See http://www.apache.org/dev/project-site.html to get started.
 It would be best to start by cloning an existing ASF project's website and 
 adopting it for Lucene.Net.  Some examples, 
 https://svn.apache.org/repos/asf/lucene/pylucene/site/ and 
 https://svn.apache.org/repos/asf/lucene/java/site/





[jira] Issue Comment Edited: (LUCENENET-379) Clean up Lucene.Net website

2011-02-16 Thread michael herndon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995283#comment-12995283
 ] 

michael herndon edited comment on LUCENENET-379 at 2/16/11 1:21 PM:


I think anything would be better than the current one (which would look cool if 
it was cleaned up and put on the side of a Chevelle, but I don't know how it 
would help brand lucene.net).

I'd say keep doing a few more variations. Open it up for the public to make 
some submissions as well (giving credit to whoever's design is chosen; maybe 
even give them some social media love).

The final one needs to work well in both RGB and CMYK color formats and in a 
scalable graphics format so that it can be resized cleanly. 

Also it should have a visual aspect that can be turned into a decent 16 x 16 
favicon (like the 3 yellow hexagons in the jpg).

Though keep in mind basic color theory: yellow is irritating on the eyes. It 
definitely grabs attention, but it's harder on the eyes for an extended period 
of time. Green is the most relaxing. 

But above all else, keep moving forward towards something new.  



  was (Author: michaelherndon):
I think anything would be better than the current one (which would look 
cool if was cleaned up and put on the side of a chevelle, but I don't know how 
would help brand lucene.net).

I'd say keep doing a few more variations. open it up for the public to make 
some submissions as well. (giving credit to whoever's design is chosen maybe 
even give them some social media love).

The final one needs to work well with both rgb and cymk color formats and in a 
scalable graphics format so that it can be resized cleanly. 

Also it should have a visual aspect of it that can be turned into a decent 16 x 
16 favicon.  (like the 3 yellow hexagons that is in the jpg).

Though keep in mind basic color theory. Yellow is irritating on the eyes. Its 
definitely grabs attention, but its arder on the eyes for an extended period of 
time. Green is the most relaxing. 

But above all else keep moving forward towards something new.  


  
 Clean up Lucene.Net website
 ---

 Key: LUCENENET-379
 URL: https://issues.apache.org/jira/browse/LUCENENET-379
 Project: Lucene.Net
  Issue Type: Task
Reporter: George Aroush
 Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch


 The existing Lucene.Net home page at http://lucene.apache.org/lucene.net/ is 
 still based on the incubation, out of date design.  This JIRA task is to 
 bring it up to date with other ASF project's web page.
 The existing website is here: 
 https://svn.apache.org/repos/asf/lucene/lucene.net/site/
 See http://www.apache.org/dev/project-site.html to get started.
 It would be best to start by cloning an existing ASF project's website and 
 adopting it for Lucene.Net.  Some examples, 
 https://svn.apache.org/repos/asf/lucene/pylucene/site/ and 
 https://svn.apache.org/repos/asf/lucene/java/site/





[jira] Issue Comment Edited: (SOLR-2105) RequestHandler param update.processor is confusing

2011-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995282#comment-12995282
 ] 

Jan Høydahl edited comment on SOLR-2105 at 2/16/11 1:24 PM:


Updated patch attached.

* Use of update.processor is now deprecated, logging a warning (instead of 
removing as in previous patch)
* Added test case which tests that both params work

Patch is for trunk.

  was (Author: janhoy):
Updated patch attached.

* Use of update.processor is not deprecated but still works, logging a warning
* Added test case which tests that both params work

Patch is for trunk.
  
 RequestHandler param update.processor is confusing
 --

 Key: SOLR-2105
 URL: https://issues.apache.org/jira/browse/SOLR-2105
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4.1
Reporter: Jan Høydahl
Priority: Minor
 Attachments: SOLR-2105.patch, SOLR-2105.patch


 Today we reference a custom updateRequestProcessorChain using the update 
 request parameter update.processor.
 See 
 http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section
 This is confusing, since what we are really referencing is not an 
 UpdateProcessor, but an updateRequestProcessorChain.
 I propose that update.processor is renamed as update.chain or similar




[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec

2011-02-16 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2903:
---

Attachment: LUCENE-2903.patch

Thanks Hao!  The new patch looks great -- much leaner.

I fixed a few things... new patch attached.  To keep the comparison
fair, I cut BulkVInt back over to Sep (it was Fixed (interleaved)).  I
also impl'd skipBlock in PFor4 (though this method is never called by
Sep), and I cut PFor4 over to the var-gap terms index.

Finally I added back the copyright headers (Simple16.java's had been
stripped, but other new sources were missing theirs too...).  Also,
we need to eventually remove the @author tags...

One question: it looks like this PFOR impl can only handle ints up to
28 bits wide?  Which means... could it fail on some cases?  Though I
suppose you would never see too many of these immense ints in one
block, so they'd always be encoded as exceptions, and it's actually
safe...?

Here are the results on Linux, MMapDir, 10M docs, unshuffled:

||Query||QPS BulkVInt||QPS PFor4||Pct diff
|united states|13.66|11.63|{color:red}-14.9%{color}|
|u*d|12.75|11.55|{color:red}-9.4%{color}|
|un*d|24.71|22.46|{color:red}-9.1%{color}|
|uni*|24.68|22.85|{color:red}-7.4%{color}|
|unit*|41.22|39.25|{color:red}-4.8%{color}|
|+nebraska +states|128.41|123.73|{color:red}-3.6%{color}|
|spanFirst(unit, 5)|263.41|258.27|{color:red}-1.9%{color}|
|+united +states|21.37|21.09|{color:red}-1.3%{color}|
|title:.*[Uu]nited.*|5.70|5.66|{color:red}-0.6%{color}|
|timesecnum:[1 TO 6]|15.01|14.96|{color:red}-0.4%{color}|
|unit~0.7|41.78|43.44|{color:green}4.0%{color}|
|united states~3|6.48|6.79|{color:green}4.8%{color}|
|unit~0.5|24.61|25.83|{color:green}4.9%{color}|
|spanNear([unit, state], 10, true)|52.34|55.67|{color:green}6.4%{color}|
|united~0.6|11.36|12.18|{color:green}7.1%{color}|
|united~0.75|15.96|17.58|{color:green}10.2%{color}|
|states|53.41|61.03|{color:green}14.3%{color}|
|united states|16.87|20.62|{color:green}22.2%{color}|

Very nice!


 Improvement of PForDelta Codec
 --

 Key: LUCENE-2903
 URL: https://issues.apache.org/jira/browse/LUCENE-2903
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: hao yan
 Attachments: LUCENE-2903.patch, LUCENE-2903.patch


 There are 3 versions of PForDelta implementations in the Bulk Branch: 
 FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
 The FrameOfRef is a very basic one which is essentially a binary encoding 
 (and may result in a huge index size).
 The PatchedFrameOfRef is the implementation based on the original version of 
 PForDelta in the literature.
 The PatchedFrameOfRef2 is my previous implementation, which I have improved 
 this time. (The codec name has changed to NewPForDelta.)
 In particular, the changes are:
 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the 
 old PForDelta did not support very large exceptions (since Simple16 does not 
 support very large numbers). This is now fixed in the new LCPForDelta.
 2. I changed the PForDeltaFixedIntBlockCodec. It is now faster than the other 
 two PForDelta implementations in the bulk branch (FrameOfRef and 
 PatchedFrameOfRef). The codec's name is NewPForDelta, as you can see in the 
 CodecProvider and PForDeltaFixedIntBlockCodec.
 3. The performance test results are:
 1) My NewPForDelta codec is faster than FrameOfRef and PatchedFrameOfRef 
 for almost all kinds of queries, and slightly worse than BulkVInt.
 2) My NewPForDelta codec results in the smallest index size of all 4 
 methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself).
 3) All performance test results were achieved by running with -server 
 instead of -client.




[jira] Created: (SOLR-2366) Facet Range Gaps

2011-02-16 Thread Grant Ingersoll (JIRA)
Facet Range Gaps


 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Fix For: 3.2, 4.0


There really is no reason why the range gap for date and numeric faceting needs 
to be evenly spaced.  For instance, if and when SOLR-1581 is completed and one 
were doing spatial distance calculations, one could facet by function into 3 
differently sized buckets: walking distance (0-5KM), driving distance (5KM-150KM), 
and everything else (150KM+).  We should be able to quantize the results into 
arbitrarily sized buckets.  I'd propose the syntax to be a comma-separated list 
of sizes for each bucket.  If only one value is specified, then it behaves as 
it currently does.  Otherwise, it creates the different-sized buckets.  If the 
buckets don't evenly divide up the space, then the size of the last bucket 
specified is used to fill out the remaining space (not sure on this).
For instance,
facet.range.start=0
facet.range.end=400
facet.range.gap=5,25,50,100

would yield buckets of:
0-5,5-30,30-80,80-180,180-280,280-380,380-400
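
For illustration, a sketch of the proposed expansion (helper name and types
are invented; this is not Solr code). It reuses the last gap once the list is
exhausted and clips the final bucket at facet.range.end; gaps are assumed
positive:

// ComputeBuckets(0, 400, new[] { 5, 25, 50, 100 }) yields
// 0-5, 5-30, 30-80, 80-180, 180-280, 280-380, 380-400.
static System.Collections.Generic.List<string> ComputeBuckets(
    int start, int end, int[] gaps)
{
    var buckets = new System.Collections.Generic.List<string>();
    int lo = start, i = 0;
    while (lo < end)
    {
        // Reuse the last gap after the list runs out.
        int gap = gaps[System.Math.Min(i++, gaps.Length - 1)];
        int hi = System.Math.Min(lo + gap, end); // clip at facet.range.end
        buckets.Add(lo + "-" + hi);
        lo = hi;
    }
    return buckets;
}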






[HUDSON] Lucene-Solr-tests-only-trunk - Build # 4967 - Failure

2011-02-16 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/4967/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting

Error Message:
flush happened too quickly during deleting count=1155

Stack Trace:
junit.framework.AssertionFailedError: flush happened too quickly during 
deleting count=1155
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1183)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1115)
at 
org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting(TestIndexWriter.java:2579)




Build Log (for compile errors):
[...truncated 3048 lines...]






[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec

2011-02-16 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2903:


Attachment: for_pfor.patch

Nice results, Hao!

One idea for the low-frequency multi-term queries (foo* etc.) could be in the 
attached patch: I only implemented this for the existing FrameOfRef and 
PatchedFrameOfRef, but perhaps you could steal/test the idea with your 
implementation.

In these cases I switched them over to a single-byte header instead of an int. 

This means less overhead per block and a slightly smaller (maybe 1-2%?) index. It 
might be more useful if we switch your codec over from the Sep layout to the 
interleaved (Fixed) layout, to make a more efficient skipBlock()... but the 
interleaved layout is still a work in progress.
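
(Back-of-envelope with assumed numbers: if a 128-value block encodes to
roughly 200 bytes, shrinking a 4-byte int header to a single byte saves 3/200,
about 1.5%, consistent with the 1-2% estimate above.)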


 Improvement of PForDelta Codec
 --

 Key: LUCENE-2903
 URL: https://issues.apache.org/jira/browse/LUCENE-2903
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: hao yan
 Attachments: LUCENE-2903.patch, LUCENE-2903.patch, for_pfor.patch


 There are 3 versions of PForDelta implementations in the Bulk Branch: 
 FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
 The FrameOfRef is a very basic one which is essentially a binary encoding 
 (and may result in a huge index size).
 The PatchedFrameOfRef is the implementation based on the original version of 
 PForDelta in the literature.
 The PatchedFrameOfRef2 is my previous implementation, which I have improved 
 this time. (The codec name has changed to NewPForDelta.)
 In particular, the changes are:
 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the 
 old PForDelta did not support very large exceptions (since Simple16 does not 
 support very large numbers). This is now fixed in the new LCPForDelta.
 2. I changed the PForDeltaFixedIntBlockCodec. It is now faster than the other 
 two PForDelta implementations in the bulk branch (FrameOfRef and 
 PatchedFrameOfRef). The codec's name is NewPForDelta, as you can see in the 
 CodecProvider and PForDeltaFixedIntBlockCodec.
 3. The performance test results are:
 1) My NewPForDelta codec is faster than FrameOfRef and PatchedFrameOfRef 
 for almost all kinds of queries, and slightly worse than BulkVInt.
 2) My NewPForDelta codec results in the smallest index size of all 4 
 methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself).
 3) All performance test results were achieved by running with -server 
 instead of -client.




[jira] Updated: (SOLR-2366) Facet Range Gaps

2011-02-16 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-2366:
--

Attachment: SOLR-2366.patch

Adds variable-width gap capabilities and some tests.  Still needs some more 
tests for edge conditions, etc., but it is something that others can look at and 
comment on.

 Facet Range Gaps
 

 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: SOLR-2366.patch


 There really is no reason why the range gap for date and numeric faceting 
 needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
 and one were doing spatial distance calculations, one could facet by function 
 into 3 differently sized buckets: walking distance (0-5KM), driving distance 
 (5KM-150KM), and everything else (150KM+).  We should be able to 
 quantize the results into arbitrarily sized buckets.  I'd propose the syntax 
 to be a comma-separated list of sizes for each bucket.  If only one value is 
 specified, then it behaves as it currently does.  Otherwise, it creates the 
 different-sized buckets.  If the buckets don't evenly divide up 
 the space, then the size of the last bucket specified is used to fill out the 
 remaining space (not sure on this).
 For instance,
 facet.range.start=0
 facet.range.end=400
 facet.range.gap=5,25,50,100
 would yield buckets of:
 0-5,5-30,30-80,80-180,180-280,280-380,380-400




[jira] Commented: (SOLR-236) Field collapsing

2011-02-16 Thread Doug Steigerwald (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995362#comment-12995362
 ] 

Doug Steigerwald commented on SOLR-236:
---

Has anyone successfully applied field collapsing to the branch_3x branch?

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: Next

 Attachments: DocSetScoreCollector.java, 
 NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
 SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, 
 SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
 SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, 
 collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, 
 collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, solr-236.patch


 This patch include a new feature called Field collapsing.
 Used in order to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)




[jira] Assigned: (SOLR-2105) RequestHandler param update.processor is confusing

2011-02-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-2105:
-

Assignee: Mark Miller

 RequestHandler param update.processor is confusing
 --

 Key: SOLR-2105
 URL: https://issues.apache.org/jira/browse/SOLR-2105
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4.1
Reporter: Jan Høydahl
Assignee: Mark Miller
Priority: Minor
 Attachments: SOLR-2105.patch, SOLR-2105.patch


 Today we reference a custom updateRequestProcessorChain using the update 
 request parameter update.processor.
 See 
 http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section
 This is confusing, since what we are really referencing is not an 
 UpdateProcessor, but an updateRequestProcessorChain.
 I propose that update.processor be renamed to update.chain or similar




[jira] Updated: (SOLR-1191) NullPointerException in delta import

2011-02-16 Thread Gunnlaugur Thor Briem (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunnlaugur Thor Briem updated SOLR-1191:


Attachment: SOLR-1191.patch

Updated patch with unit test.

 NullPointerException in delta import
 

 Key: SOLR-1191
 URL: https://issues.apache.org/jira/browse/SOLR-1191
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
 Environment: OS: Windows & Linux.
 Java: 1.6
 DB: MySQL & SQL Server 
Reporter: Ali Syed
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1191.patch, SOLR-1191.patch


 Seeing a few of these NullPointerExceptions during delta imports. Once this 
 happens, delta import stops working and keeps giving the same error.
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
 Running delta import for a particular entity fixes the problem and delta 
 import starts working again.
 Here is the log just before & after the exception
 05/27 11:59:29 86987686 INFO  btpool0-538 org.apache.solr.core.SolrCore  - 
 [localhost] webapp=/solr path=/dataimport 
 params={command=delta-import&optimize=false} status=0 QTime=0
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DataImporter  - Starting Delta Import
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: content
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: content
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: job
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987704 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 12
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: job
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Delta Import completed 
 successfully
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: user
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity user with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987716 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 7
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: user rows obtained : 46
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: user rows obtained : 0
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  

[jira] Created: (SOLR-2367) DataImportHandler unit tests are very noisy

2011-02-16 Thread Gunnlaugur Thor Briem (JIRA)
DataImportHandler unit tests are very noisy
---

 Key: SOLR-2367
 URL: https://issues.apache.org/jira/browse/SOLR-2367
 Project: Solr
  Issue Type: Improvement
  Components: Build, contrib - DataImportHandler
Reporter: Gunnlaugur Thor Briem
Priority: Trivial


Running DataImportHandler unit tests emits a lot of console noise, mainly 
stacktraces because dataimport.properties can't be written. This makes it hard 
to scan the output for useful information.

I'm attaching a patch to get rid of most of the noise by creating the conf 
directory before test runs so that the properties file write doesn't fail.




[jira] Updated: (SOLR-2367) DataImportHandler unit tests are very noisy

2011-02-16 Thread Gunnlaugur Thor Briem (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunnlaugur Thor Briem updated SOLR-2367:


Attachment: SOLR-2367.patch

Patch to address this issue. Creates conf directories under work directory 
before test runs, and suppresses a warning.

The console noise that remains is some XML parsing failure, which may or may 
not be meaningful (I don't know) — at least now it is visible. :)
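
For reference, the setup step amounts to something like this JUnit sketch (the path here is illustrative, not the one from the patch):

    import java.io.File;
    import org.junit.BeforeClass;

    // Make sure the conf/ directory exists before any DIH test tries to
    // write dataimport.properties into it.
    @BeforeClass
    public static void ensureConfDir() {
      File conf = new File("target/test-solr/conf");  // illustrative path
      if (!conf.exists() && !conf.mkdirs()) {
        throw new RuntimeException("could not create " + conf);
      }
    }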

 DataImportHandler unit tests are very noisy
 ---

 Key: SOLR-2367
 URL: https://issues.apache.org/jira/browse/SOLR-2367
 Project: Solr
  Issue Type: Improvement
  Components: Build, contrib - DataImportHandler
Reporter: Gunnlaugur Thor Briem
Priority: Trivial
 Attachments: SOLR-2367.patch

   Original Estimate: 5m
  Remaining Estimate: 5m

 Running DataImportHandler unit tests emits a lot of console noise, mainly 
 stacktraces because dataimport.properties can't be written. This makes it 
 hard to scan the output for useful information.
 I'm attaching a patch to get rid of most of the noise by creating the conf 
 directory before test runs so that the properties file write doesn't fail.




[jira] Resolved: (SOLR-1553) extended dismax query parser

2011-02-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1553.


   Resolution: Fixed
Fix Version/s: (was: 4.0)
   (was: 1.5)

Resolving.  Improvements can be tracked in a new issue.

 extended dismax query parser
 

 Key: SOLR-1553
 URL: https://issues.apache.org/jira/browse/SOLR-1553
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
Assignee: Yonik Seeley
 Fix For: 3.1

 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch, 
 edismax.unescapedcolon.bug.test.patch, edismax.unescapedcolon.bug.test.patch, 
 edismax.userFields.patch


 An improved user-facing query parser based on dismax




[jira] Created: (SOLR-2368) Improve extended dismax (edismax) parser

2011-02-16 Thread Yonik Seeley (JIRA)
Improve extended dismax (edismax) parser


 Key: SOLR-2368
 URL: https://issues.apache.org/jira/browse/SOLR-2368
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley


Improve edismax and replace dismax once it has all of the needed features.




[jira] Issue Comment Edited: (SOLR-2367) DataImportHandler unit tests are very noisy

2011-02-16 Thread Gunnlaugur Thor Briem (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995387#comment-12995387
 ] 

Gunnlaugur Thor Briem edited comment on SOLR-2367 at 2/16/11 5:09 PM:
--

Patch to address this issue. Creates conf directories under work directory 
before test runs, and suppresses a warning.

The console noise that remains is some XML parsing failure, which may or may 
not be meaningful (I don't know) — at least now it is visible. :)

This patch is against branch_3x as of just now.

  was (Author: gthb):
Patch to address this issue. Creates conf directories under work directory 
before test runs, and suppresses a warning.

The console noise that remains is some XML parsing failure, which may or may 
not be meaningful (I don't know) — at least now it is visible. :)
  
 DataImportHandler unit tests are very noisy
 ---

 Key: SOLR-2367
 URL: https://issues.apache.org/jira/browse/SOLR-2367
 Project: Solr
  Issue Type: Improvement
  Components: Build, contrib - DataImportHandler
Reporter: Gunnlaugur Thor Briem
Priority: Trivial
 Attachments: SOLR-2367.patch

   Original Estimate: 5m
  Remaining Estimate: 5m

 Running DataImportHandler unit tests emits a lot of console noise, mainly 
 stacktraces because dataimport.properties can't be written. This makes it 
 hard to scan the output for useful information.
 I'm attaching a patch to get rid of most of the noise by creating the conf 
 directory before test runs so that the properties file write doesn't fail.




[jira] Issue Comment Edited: (SOLR-1191) NullPointerException in delta import

2011-02-16 Thread Gunnlaugur Thor Briem (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995381#comment-12995381
 ] 

Gunnlaugur Thor Briem edited comment on SOLR-1191 at 2/16/11 5:09 PM:
--

Updated patch with unit test, against current branch_3x.

  was (Author: gthb):
Updated patch with unit test.
  
 NullPointerException in delta import
 

 Key: SOLR-1191
 URL: https://issues.apache.org/jira/browse/SOLR-1191
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
 Environment: OS: Windows & Linux.
 Java: 1.6
 DB: MySQL & SQL Server 
Reporter: Ali Syed
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1191.patch, SOLR-1191.patch


 Seeing a few of these NullPointerExceptions during delta imports. Once this 
 happens, delta import stops working and keeps giving the same error.
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
 Running delta import for a particular entity fixes the problem and delta 
 import starts working again.
 Here is the log just before & after the exception
 05/27 11:59:29 86987686 INFO  btpool0-538 org.apache.solr.core.SolrCore  - 
 [localhost] webapp=/solr path=/dataimport 
 params={command=delta-import&optimize=false} status=0 QTime=0
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DataImporter  - Starting Delta Import
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: content
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: content
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: job
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987704 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 12
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: job
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Delta Import completed 
 successfully
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: user
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity user with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987716 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 7
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: user rows obtained : 46
 05/27 11:59:29 86987873 INFO  Thread-4162 
 

Re: duplicate records in index

2011-02-16 Thread Wen Gao
I saw that. So careless..
Thanks.

Wen Gao

2011/2/16 Digy digyd...@gmail.com

 You are adding the same doc twice.
 (See how you add acttime )

 DIGY

 -Original Message-
 From: Wen Gao [mailto:samuel.gao...@gmail.com]
 Sent: Wednesday, February 16, 2011 11:35 AM
 To: lucene-net-...@lucene.apache.org
 Subject: duplicate records in index

 Hi,

 I am creating an index from my database; however, the .cfs files
 contain duplicate records, e.g.

 book1, 1, susan, 1

 book1, 1,susan,1, 03/01/2010

 book2, 2,tom,

 book2,2,tom, 2,03/02/2010

 ..



 I got the data from several tables, and am sure that the SQL only generates
 one record. Also, when I debug the code, the record is only added once.

 So I am confused about whether the data is duplicated in the index.



 I define my index in the following format:

 

 doc.Add(new Lucene.Net.Documents.Field(
     "lmname",
     readerreader1["lmname"].ToString(),
     //new System.IO.StringReader(readerreader["cname"].ToString()),
     Lucene.Net.Documents.Field.Store.YES,
     Lucene.Net.Documents.Field.Index.TOKENIZED));

 // lmid
 doc.Add(new Lucene.Net.Documents.Field(
     "lmid",
     readerreader1["lmid"].ToString(),
     Lucene.Net.Documents.Field.Store.YES,
     Lucene.Net.Documents.Field.Index.UN_TOKENIZED));

 // nick name of user
 doc.Add(new Lucene.Net.Documents.Field(
     "nickName",
     readerreader1["nickName"].ToString(),
     Lucene.Net.Documents.Field.Store.YES,
     Lucene.Net.Documents.Field.Index.UN_TOKENIZED));

 // uid
 doc.Add(new Lucene.Net.Documents.Field(
     "uid",
     readerreader1["uid"].ToString(),
     Lucene.Net.Documents.Field.Store.YES,
     Lucene.Net.Documents.Field.Index.UN_TOKENIZED));

 writer.AddDocument(doc);   // first AddDocument

 // acttime
 doc.Add(new Lucene.Net.Documents.Field(
     "acttime",
     readerreader1["acttime"].ToString(),
     Lucene.Net.Documents.Field.Store.YES,
     Lucene.Net.Documents.Field.Index.UN_TOKENIZED));

 writer.AddDocument(doc);   // second AddDocument: the same doc goes in twice



 Any ideas?



 Thanks,

 Wen Gao









[jira] Updated: (SOLR-2367) DataImportHandler unit tests are very noisy

2011-02-16 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2367:
--

Attachment: SOLR-2367.patch

Thanks for the patch. I modified it to just specify the absolute path to these 
directories.

This way we don't have to make any useless directories underneath the CWD.

Separately, as for the exceptions: they come from the test TestErrorHandling; 
they are its 'expected exceptions'. I tried to modify this test to use the 
'expected exception' logic in SolrTestCaseJ4, etc., but I could not make it 
work.

I think this is because DIH throws DataImportHandlerExceptions (extends 
RuntimeException) instead of ones that extend SolrException?


 DataImportHandler unit tests are very noisy
 ---

 Key: SOLR-2367
 URL: https://issues.apache.org/jira/browse/SOLR-2367
 Project: Solr
  Issue Type: Improvement
  Components: Build, contrib - DataImportHandler
Reporter: Gunnlaugur Thor Briem
Assignee: Robert Muir
Priority: Trivial
 Attachments: SOLR-2367.patch, SOLR-2367.patch

   Original Estimate: 5m
  Remaining Estimate: 5m

 Running DataImportHandler unit tests emits a lot of console noise, mainly 
 stacktraces because dataimport.properties can't be written. This makes it 
 hard to scan the output for useful information.
 I'm attaching a patch to get rid of most of the noise by creating the conf 
 directory before test runs so that the properties file write doesn't fail.




[jira] Commented: (LUCENENET-379) Clean up Lucene.Net website

2011-02-16 Thread Alex Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995405#comment-12995405
 ] 

Alex Thompson commented on LUCENENET-379:
-

The concept of the current logo isn't that bad, it's just executed poorly (it 
looks like someone did it in Paint). I don't mind if it changes, but maybe keep 
a green color scheme to imply our loose connection with Java Lucene.

 Clean up Lucene.Net website
 ---

 Key: LUCENENET-379
 URL: https://issues.apache.org/jira/browse/LUCENENET-379
 Project: Lucene.Net
  Issue Type: Task
Reporter: George Aroush
 Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch


 The existing Lucene.Net home page at http://lucene.apache.org/lucene.net/ is 
 still based on the incubation, out of date design.  This JIRA task is to 
 bring it up to date with other ASF project's web page.
 The existing website is here: 
 https://svn.apache.org/repos/asf/lucene/lucene.net/site/
 See http://www.apache.org/dev/project-site.html to get started.
 It would be best to start by cloning an existing ASF project's website and 
 adopting it for Lucene.Net.  Some examples, 
 https://svn.apache.org/repos/asf/lucene/pylucene/site/ and 
 https://svn.apache.org/repos/asf/lucene/java/site/





subclassing Python classes in Java

2011-02-16 Thread Bill Janssen
How do I subclass a Python class in a JCC-wrapped Java module?

In UpLib, I've got a class, uplib.ripper.Ripper, and I'd like to be able
to create a Java subclass for that in my module.  I presume I need a
Java interface for that Python class, but how do I hook the two
together so that the Java subclass can inherit from the Python class?

Bill


[jira] Commented: (LUCENE-2903) Improvement of PForDelta Codec

2011-02-16 Thread hao yan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995436#comment-12995436
 ] 

hao yan commented on LUCENE-2903:
-

Thank both of you! Thanks for testing my codec so quickly, Michael! 

RE: One question: it looks like this PFOR impl can only handle up to 28
bit wide ints? Which means... could it fail in some cases?
Though I suppose you would never see too many of these immense ints in
one block, and so they'd always be encoded as exceptions and so it's
actually safe...?

Hao: This won't fail. In my PFOR impl, I will first call checkBigNumbers() to 
see if there is any number >= 2^28; if there is, I will force encoding the 
lower 4 bits using the 128 4-bit slots. Thus, all exceptions left to Simple16 
are < 2^28, which can definitely be handled. So, there are no failure 
cases!!! :)

BTW, my PFOR impl will save more index size than VInt and other PFOR impls. 
Thus, if the use case is real-time search, which requires loading the index 
from disk to memory frequently, my PFOR impl may save even more.
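
To illustrate the scheme hao describes (a sketch of the idea only, not code from the patch):

    // Values >= 2^28 would overflow Simple16, so split every value into its
    // low 4 bits (stored in the 128 fixed 4-bit slots) and its high bits
    // (v >>> 4, always < 2^28, so safe to hand to Simple16 as an exception).
    static final int SIMPLE16_LIMIT = 1 << 28;

    static boolean checkBigNumbers(int[] block) {
      for (int v : block) {
        if ((v & (SIMPLE16_LIMIT - 1)) != v) return true;  // v >= 2^28
      }
      return false;
    }

    static void splitBlock(int[] block, int[] low4, int[] high) {
      for (int i = 0; i < block.length; i++) {
        low4[i] = block[i] & 0xF;   // goes into the 4-bit slots
        high[i] = block[i] >>> 4;   // < 2^28 by construction
      }
    }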


  





 Improvement of PForDelta Codec
 --

 Key: LUCENE-2903
 URL: https://issues.apache.org/jira/browse/LUCENE-2903
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: hao yan
 Attachments: LUCENE-2903.patch, LUCENE-2903.patch, for_pfor.patch


 There are 3 versions of PForDelta implementations in the Bulk Branch: 
 FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
 The FrameOfRef is a very basic one which is essentially a binary encoding 
 (may result in huge index size).
 The PatchedFrameOfRef is the implementation based on the original version of 
 PForDelta in the literature.
 The PatchedFrameOfRef2 is my previous implementation, which was improved this 
 time. (The codec name is changed to NewPForDelta.)
 In particular, the changes are:
 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the 
 old PForDelta does not support very large exceptions (since
 the Simple16 does not support very large numbers). Now this has been fixed in 
 the new LCPForDelta.
 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other 
 two PForDelta implementations in the bulk branch (FrameOfRef and 
 PatchedFrameOfRef). The codec's name is NewPForDelta, as you can see in the 
 CodecProvider and PForDeltaFixedIntBlockCodec.
 3. The performance test results are:
 1) My NewPForDelta codec is faster than FrameOfRef and PatchedFrameOfRef 
 for almost all kinds of queries, slightly worse than BulkVInt.
 2) My NewPForDelta codec can result in the smallest index size among all 4 
 methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself).
 3) All performance test results are achieved by running with -server 
 instead of -client




how can I get the similarity in fuzzy query

2011-02-16 Thread Wen Gao
Hi,
I am using FuzzyQuery to get fuzzy matched results. I want to get the
similarity in percent for every matched record.
For example, if I search for "databasd", it will return results such as
"database", "database1", and "database11". I want to get the similarity in
percent for every record, such as 87.5%, 75%, and 62.5%.

How can I do this?

Any ideas?

Wen Gao


[jira] Commented: (SOLR-2367) DataImportHandler unit tests are very noisy

2011-02-16 Thread Gunnlaugur Thor Briem (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995466#comment-12995466
 ] 

Gunnlaugur Thor Briem commented on SOLR-2367:
-

Oh, right, much neater.

 DataImportHandler unit tests are very noisy
 ---

 Key: SOLR-2367
 URL: https://issues.apache.org/jira/browse/SOLR-2367
 Project: Solr
  Issue Type: Improvement
  Components: Build, contrib - DataImportHandler
Reporter: Gunnlaugur Thor Briem
Assignee: Robert Muir
Priority: Trivial
 Attachments: SOLR-2367.patch, SOLR-2367.patch

   Original Estimate: 5m
  Remaining Estimate: 5m

 Running DataImportHandler unit tests emits a lot of console noise, mainly 
 stacktraces because dataimport.properties can't be written. This makes it 
 hard to scan the output for useful information.
 I'm attaching a patch to get rid of most of the noise by creating the conf 
 directory before test runs so that the properties file write doesn't fail.




RE: how can I get the similarity in fuzzy query

2011-02-16 Thread Digy
http://wiki.apache.org/lucene-java/ScoresAsPercentages

DIGY

-Original Message-
From: Wen Gao [mailto:samuel.gao...@gmail.com] 
Sent: Wednesday, February 16, 2011 8:55 PM
To: lucene-net-...@lucene.apache.org
Subject: how can I get the similarity in fuzzy query

Hi,
I am using FuzzyQuery to get fuzzy matched results. I want to get the
similarity in percent for every matched record.
For example, if I search for "databasd", it will return results such as
"database", "database1", and "database11". I want to get the similarity in
percent for every record, such as 87.5%, 75%, and 62.5%.

How can I do this?

Any ideas?

Wen Gao



[jira] Updated: (SOLR-2367) DataImportHandler unit tests are very noisy

2011-02-16 Thread Gunnlaugur Thor Briem (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunnlaugur Thor Briem updated SOLR-2367:


Attachment: SOLR-2367-extend-SolrException.patch

If it helps, here's a patch that makes DataImportHandlerException extend 
SolrException (and deprecates a constructor that seems not to be used 
anywhere). All tests pass, but beyond that this has not been tried out at 
runtime (and maybe the change isn't even appropriate?) ... does this make the 
exception silencing work?

 DataImportHandler unit tests are very noisy
 ---

 Key: SOLR-2367
 URL: https://issues.apache.org/jira/browse/SOLR-2367
 Project: Solr
  Issue Type: Improvement
  Components: Build, contrib - DataImportHandler
Reporter: Gunnlaugur Thor Briem
Assignee: Robert Muir
Priority: Trivial
 Attachments: SOLR-2367-extend-SolrException.patch, SOLR-2367.patch, 
 SOLR-2367.patch

   Original Estimate: 5m
  Remaining Estimate: 5m

 Running DataImportHandler unit tests emits a lot of console noise, mainly 
 stacktraces because dataimport.properties can't be written. This makes it 
 hard to scan the output for useful information.
 I'm attaching a patch to get rid of most of the noise by creating the conf 
 directory before test runs so that the properties file write doesn't fail.




[jira] Created: (LUCENE-2923) remove writer.optimize() from contrib/demo

2011-02-16 Thread Michael McCandless (JIRA)
remove writer.optimize() from contrib/demo
--

 Key: LUCENE-2923
 URL: https://issues.apache.org/jira/browse/LUCENE-2923
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.1, 4.0


I don't think we should include optimize in the demo; many people start from 
the demo and may think you must optimize to do searching, and that's clearly 
not the case.




[jira] Commented: (SOLR-2367) DataImportHandler unit tests are very noisy

2011-02-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995479#comment-12995479
 ] 

Robert Muir commented on SOLR-2367:
---

Thanks for the follow-up patch. I will try and see if I can use the exception 
ignores mechanism now with it... maybe this time it will work.


 DataImportHandler unit tests are very noisy
 ---

 Key: SOLR-2367
 URL: https://issues.apache.org/jira/browse/SOLR-2367
 Project: Solr
  Issue Type: Improvement
  Components: Build, contrib - DataImportHandler
Reporter: Gunnlaugur Thor Briem
Assignee: Robert Muir
Priority: Trivial
 Attachments: SOLR-2367-extend-SolrException.patch, SOLR-2367.patch, 
 SOLR-2367.patch

   Original Estimate: 5m
  Remaining Estimate: 5m

 Running DataImportHandler unit tests emits a lot of console noise, mainly 
 stacktraces because dataimport.properties can't be written. This makes it 
 hard to scan the output for useful information.
 I'm attaching a patch to get rid of most of the noise by creating the conf 
 directory before test runs so that the properties file write doesn't fail.




Re: how can I get the similarity in fuzzy query

2011-02-16 Thread Christopher Currens
I was going to post the link that Digy posted, which suggests not to
determine a match that way.  If my understanding is correct, the scores
returned for a query are relative to which documents were retrieved by the
search, in that if a document is deleted from the index, the scores will
change even though the query did not, because the number of returned
documents are different.

If the only thing you wanted to do was to calculate how similar a resulting
string is to a search string, I suggest the Levenshtein Distance algorithm
http://en.wikipedia.org/wiki/Levenshtein_distance ... but it doesn't seem like
that's quite what you want to accomplish based on your question.
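
For the pure string-similarity case, a self-contained Java sketch (plain Levenshtein distance, normalized by the shorter string's length, which reproduces the 87.5% / 75% / 62.5% figures from the question):

    // Classic two-row Levenshtein distance.
    static int levenshtein(String a, String b) {
      int[] prev = new int[b.length() + 1];
      int[] cur  = new int[b.length() + 1];
      for (int j = 0; j <= b.length(); j++) prev[j] = j;
      for (int i = 1; i <= a.length(); i++) {
        cur[0] = i;
        for (int j = 1; j <= b.length(); j++) {
          int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
          cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                            prev[j - 1] + cost);
        }
        int[] t = prev; prev = cur; cur = t;
      }
      return prev[b.length()];
    }

    // 1 - distance / min(length), expressed as a percentage.
    static double similarityPercent(String query, String term) {
      int minLen = Math.min(query.length(), term.length());
      return 100.0 * (1.0 - (double) levenshtein(query, term) / minLen);
    }

    // similarityPercent("databasd", "database")   -> 87.5
    // similarityPercent("databasd", "database1")  -> 75.0
    // similarityPercent("databasd", "database11") -> 62.5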

Christopher

On Wed, Feb 16, 2011 at 10:55 AM, Wen Gao samuel.gao...@gmail.com wrote:

 Hi,
 I am using FuzzyQuery to get fuzzy matched results. I want to get the
 similarity in percent for every matched record.
 For example, if I search for "databasd", it will return results such as
 "database", "database1", and "database11". I want to get the similarity in
 percent for every record, such as 87.5%, 75%, and 62.5%.

 How can I do this?

 Any ideas?

 Wen Gao



Re: how can I get the similarity in fuzzy query

2011-02-16 Thread Wen Gao
Hi,
I think my situation is just comparing the similarity of strings: I want to
calculate the similarity between the typed query and the returned results
using *FuzzyQuery*. I have set the minimumSimilarity of FuzzyQuery to 0.5f;
what I want to do is get the similarity instead of the score for every result
that is returned.

Thanks for your time.

Wen

2011/2/16 Christopher Currens currens.ch...@gmail.com

 I was going to post the link that Digy posted, which suggests not to
 determine a match that way.  If my understanding is correct, the scores
 returned for a query are relative to which documents were retrieved by the
 search, in that if a document is deleted from the index, the scores will
 change even though the query did not, because the number of returned
 documents are different.

 If the only thing you wanted to do was to calculate how similar a resulting
 string is to a search string, I suggest the Levenshtein Distance algorithm
 http://en.wikipedia.org/wiki/Levenshtein_distance ... but it doesn't seem
 like
 that's quite what you want to accomplish based on your question.

 Christopher

 On Wed, Feb 16, 2011 at 10:55 AM, Wen Gao samuel.gao...@gmail.com wrote:

  Hi,
  I am using FuzzyQuery to get fuzzy matched results. I want to get the
  similarity in percent for every matched record.
  For example, if I search for "databasd", it will return results such
 as
  "database", "database1", and "database11". I want to get the similarity
 in
  percent for every record, such as 87.5%, 75%, and 62.5%.
 
  How can I do this?
 
  Any ideas?
 
  Wen Gao
 



[jira] Commented: (LUCENE-2923) cleanup contrib/demo

2011-02-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995490#comment-12995490
 ] 

Uwe Schindler commented on LUCENE-2923:
---

Yeah, we should remove the optimize. Too many people tell me exactly that they 
should optimize because they see it in almost every piece of demo code. With 
recent Lucene versions, optimizing is not needed anymore. It's hard to explain 
to people, so example code and books should never tell them to optimize. Books 
about Lucene should also explain when optimizing is needed or useful, to 
prevent people from always doing it.

 cleanup contrib/demo
 

 Key: LUCENE-2923
 URL: https://issues.apache.org/jira/browse/LUCENE-2923
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.1, 4.0


 I don't think we should include optimize in the demo; many people start from 
 the demo and may think you must optimize to do searching, and that's clearly 
 not the case.
 I think we should also use a buffered reader in FileDocument?
 And... I'm tempted to remove IndexHTML (and the html parser) entirely.  It's 
 ancient, and we now have Tika to extract text from many doc formats.




[jira] Updated: (LUCENE-2923) cleanup contrib/demo

2011-02-16 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2923:
---

Attachment: LUCENE-2923.patch

Patch.

 cleanup contrib/demo
 

 Key: LUCENE-2923
 URL: https://issues.apache.org/jira/browse/LUCENE-2923
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2923.patch


 I don't think we should include optimize in the demo; many people start from 
 the demo and may think you must optimize to do searching, and that's clearly 
 not the case.
 I think we should also use a buffered reader in FileDocument?
 And... I'm tempted to remove IndexHTML (and the html parser) entirely.  It's 
 ancient, and we now have Tika to extract text from many doc formats.




[jira] Commented: (LUCENE-2923) cleanup contrib/demo

2011-02-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995513#comment-12995513
 ] 

Mark Miller commented on LUCENE-2923:
-

bq. I think we should also use a buffered reader in FileDocument?

And close the reader...
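
i.e., something along these lines in FileDocument (a sketch only, assuming the demo's file and writer variables; not the committed change):

    // Buffer the file reader, and close it once the document has been
    // indexed (addDocument consumes the reader's content).
    BufferedReader contents = new BufferedReader(new FileReader(file));
    try {
      Document doc = new Document();
      doc.add(new Field("contents", contents));  // tokenized field from a Reader
      writer.addDocument(doc);
    } finally {
      contents.close();
    }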

 cleanup contrib/demo
 

 Key: LUCENE-2923
 URL: https://issues.apache.org/jira/browse/LUCENE-2923
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2923.patch


 I don't think we should include optimize in the demo; many people start from 
 the demo and may think you must optimize to do searching, and that's clearly 
 not the case.
 I think we should also use a buffered reader in FileDocument?
 And... I'm tempted to remove IndexHTML (and the html parser) entirely.  It's 
 ancient, and we now have Tika to extract text from many doc formats.




[jira] Commented: (LUCENE-2923) cleanup contrib/demo

2011-02-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995518#comment-12995518
 ] 

Mark Miller commented on LUCENE-2923:
-

bq. I don't think we should include optimize in the demo; 

I wonder if it wouldn't be better to leave it, but commented out - with a short 
explanation.

Optimizing is not necessary, but it clearly has benefits to query perf! If you 
are not updating often, I think it can make perfect sense.

So I'm fine with just dropping it, but I'm not sure whether commenting it out 
and putting in something like:
// for an index that is not updated often, we might optimize now

or some variation would be better...
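
i.e., the tail of the demo's indexing code would then read roughly (a sketch):

    // writer.optimize();  // optional: merges the index down to one segment.
    //                     // Worth considering only for an index that is
    //                     // searched a lot and updated rarely.
    writer.close();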

 cleanup contrib/demo
 

 Key: LUCENE-2923
 URL: https://issues.apache.org/jira/browse/LUCENE-2923
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2923.patch


 I don't think we should include optimize in the demo; many people start from 
 the demo and may think you must optimize to do searching, and that's clearly 
 not the case.
 I think we should also use a buffered reader in FileDocument?
 And... I'm tempted to remove IndexHTML (and the html parser) entirely.  It's 
 ancient, and we now have Tika to extract text from many doc formats.




Re: how can I get the similarity in fuzzy query

2011-02-16 Thread Wyatt Barnett
If you are running in VS 2010, I'd advise saving yourself some trouble
and just grabbing the 2.9.2 package off NuGet.

On Wed, Feb 16, 2011 at 3:13 PM, Wen Gao samuel.gao...@gmail.com wrote:
 Thank you.

 Wen
 2011/2/16 Digy digyd...@gmail.com

 Download the source from
 https://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_9_2
  using an svn client (like TortoiseSVN), and open the project file with
  VS20XX.

 DIGY

 -Original Message-
 From: Wen Gao [mailto:samuel.gao...@gmail.com]
 Sent: Wednesday, February 16, 2011 9:58 PM
 To: lucene-net-...@lucene.apache.org
 Subject: Re: how can I get the similarity in fuzzy query

  OK, I get it. How can I recompile a Lucene_src on Windows?

 Thanks.
 Wen
 2011/2/16 Christopher Currens currens.ch...@gmail.com

   As far as I know, you'll need to calculate that manually.  FuzzyQuery
  searches don't return any results like that.
 
  On Wed, Feb 16, 2011 at 11:47 AM, Wen Gao samuel.gao...@gmail.com
 wrote:
 
   Hi,
   I think my situation is just to compare the similarity of strings: I
 want
   to
   calculate the similarity between the typed results and the returned
  results
   using *FuzzyQuery*. I have set the minimumSimilarity of FuzzyQuery as
  0.5f,
   what I want to do is get the similarity instead of score for every
   result
   that returns.
  
   Thanks for your time.
  
   Wen
  
   2011/2/16 Christopher Currens currens.ch...@gmail.com
  
I was going to post the link that Digy posted, which suggests not to
determine a match that way.  If my understanding is correct, the
 scores
returned for a query are relative to which documents were retrieved
 by
   the
search, in that if a document is deleted from the index, the scores
  will
change even though the query did not, because the number of returned
documents are different.
   
 If the only thing you wanted to do was to calculate how similar a resulting
    string
 is to a search string, I suggest the Levenshtein Distance algorithm
http://en.wikipedia.org/wiki/Levenshtein_distance...but it doesn't
  seem
like
that's quite what you want to accomplish based on your question.
   
Christopher
   
On Wed, Feb 16, 2011 at 10:55 AM, Wen Gao samuel.gao...@gmail.com
   wrote:
   
 Hi,
 I am using FuzzyQuery to get fuzzy matched results. I want to get
 the
 similarity in percent for every matched record.
 For example, if I search for "databasd", it will return results
   such
as
 "database", "database1", and "database11". I want to get the
  similarity
in
 percent for every record, such as 87.5%, 75%, and 62.5%.

 How can I do this?

 Any ideas?

 Wen Gao

   
  
 





[jira] Commented: (SOLR-2367) DataImportHandler unit tests are very noisy

2011-02-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995537#comment-12995537
 ] 

Robert Muir commented on SOLR-2367:
---

I tried to use your patch and silence the tests in various ways... I was 
unsuccessful.

It's a mystery to me, really (because I don't understand the code that well), 
that all these exceptions are being thrown and nothing is failing... so I'm 
not sure how to silence them.

Let's commit the first patch and fix 80% of the problem... maybe we can figure 
out the other exceptions in the future. I'll keep the issue open.

 DataImportHandler unit tests are very noisy
 ---

 Key: SOLR-2367
 URL: https://issues.apache.org/jira/browse/SOLR-2367
 Project: Solr
  Issue Type: Improvement
  Components: Build, contrib - DataImportHandler
Reporter: Gunnlaugur Thor Briem
Assignee: Robert Muir
Priority: Trivial
 Attachments: SOLR-2367-extend-SolrException.patch, SOLR-2367.patch, 
 SOLR-2367.patch

   Original Estimate: 5m
  Remaining Estimate: 5m

 Running DataImportHandler unit tests emits a lot of console noise, mainly 
 stacktraces because dataimport.properties can't be written. This makes it 
 hard to scan the output for useful information.
 I'm attaching a patch to get rid of most of the noise by creating the conf 
 directory before test runs so that the properties file write doesn't fail.




[jira] Commented: (SOLR-2365) DIH should not be in the Solr war

2011-02-16 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995540#comment-12995540
 ] 

David Smiley commented on SOLR-2365:


Uwe: are you willing to put a fix-for of 3.1 on this, or is that a touchy 
subject? ;-P

 DIH should not be in the Solr war
 -

 Key: SOLR-2365
 URL: https://issues.apache.org/jira/browse/SOLR-2365
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: David Smiley
Priority: Minor
 Attachments: SOLR-2365_DIH_should_not_be_in_war.patch


 The DIH has a build.xml that puts itself into the Solr war file.  This is the 
 only contrib module that does this, and I don't think it should be this way. 
 Granted there is a small dataimport.jsp file that would be most convenient to 
 remain included, but the jar should not be.




[jira] Created: (SOLR-2369) Zookeeper depends on log4j, thus also SolrCloud does

2011-02-16 Thread JIRA
Zookeeper depends on log4j, thus also SolrCloud does


 Key: SOLR-2369
 URL: https://issues.apache.org/jira/browse/SOLR-2369
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 3.1
Reporter: Jan Høydahl


Reproduce:
1. Use default Solr example build (with JDK logging)
2. Run example C on http://wiki.apache.org/solr/SolrCloud
3. You get Exception:
   java.lang.NoClassDefFoundError: org/apache/log4j/jmx/HierarchyDynamicMBean
   at 
org.apache.zookeeper.jmx.ManagedUtil.registerLog4jMBeans(ManagedUtil.java:51)
   at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:114)
   at org.apache.solr.cloud.SolrZkServer$1.run(SolrZkServer.java:111)

Probable reason:
Zookeeper depends on log4j

Quickfix:
Switch to log4j logging (as you cannot include both log4j bridge and log4j):
* Remove log4j-over-slf4j-1.5.5.jar and slf4j-jdk14-1.5.5.jar
* Add slf4j-log4j12.jar and log4j-1.2.16.jar

Document the shortcoming in release notes

Long term fix:
Vote for the resolution of ZOOKEEPER-850 which switches ZK to slf4j logging




[jira] Commented: (SOLR-1553) extended dismax query parser

2011-02-16 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1299#comment-1299
 ] 

David Smiley commented on SOLR-1553:


I'm confused about why this cool query parser I've been using is 
"experimental". Sure, there are opportunities for improvement, but it's 
already better than the original dismax, which this makes obsolete. No?

 extended dismax query parser
 

 Key: SOLR-1553
 URL: https://issues.apache.org/jira/browse/SOLR-1553
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
Assignee: Yonik Seeley
 Fix For: 3.1

 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch, 
 edismax.unescapedcolon.bug.test.patch, edismax.unescapedcolon.bug.test.patch, 
 edismax.userFields.patch


 An improved user-facing query parser based on dismax




[jira] Updated: (SOLR-2366) Facet Range Gaps

2011-02-16 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-2366:
--

Attachment: SOLR-2366.patch

Added more tests, cleaned up the patch, and all tests pass. I think it is ready 
to commit, and I will do so in a day or two, or maybe this weekend.

 Facet Range Gaps
 

 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: SOLR-2366.patch, SOLR-2366.patch


 There really is no reason why the range gap for date and numeric faceting 
 needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
 and one were doing spatial distance calculations, one could facet by function 
 into 3 different sized buckets: walking distance (0-5KM), driving distance 
 (5KM-150KM) and everything else (150KM+), for instance.  We should be able to 
 quantize the results into arbitrarily sized buckets.  I'd propose the syntax 
 to be a comma separated list of sizes for each bucket.  If only one value is 
 specified, then it behaves as it currently does.  Otherwise, it creates the 
 different size buckets.  If the number of buckets doesn't evenly divide up 
 the space, then the size of the last bucket specified is used to fill out the 
 remaining space (not sure on this)
 For instance,
 facet.range.start=0
 facet.range.end=400
 facet.range.gap=5,25,50,100
 would yield buckets of:
 0-5,5-30,30-80,80-180,180-280,280-380,380-400




[jira] Commented: (SOLR-756) Make DisjunctionMaxQueryParser generally useful by supporting all query types.

2011-02-16 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995567#comment-12995567
 ] 

David Smiley commented on SOLR-756:
---

Jan, you refer to the Extended Dismax QParser -- and the answer is no.  I think 
you intended to comment on SOLR-758.  This patch here, as I said in a comment 
above here 
https://issues.apache.org/jira/browse/SOLR-756?focusedCommentId=12630223&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12630223
 only has to do with a specific improvement to SolrPluginUtils.java, that is an 
enabler for other improvements to DismaxQParser. According to Hoss, I need to 
add tests for this issue.

 Make DisjunctionMaxQueryParser generally useful by supporting all query types.
 --

 Key: SOLR-756
 URL: https://issues.apache.org/jira/browse/SOLR-756
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: David Smiley
 Fix For: Next

 Attachments: SolrPluginUtilsDisMax.patch


 This is an enhancement to the DisjunctionMaxQueryParser to work on all the 
 query variants such as wildcard, prefix, and fuzzy queries, and to support 
 working in AND scenarios that are not processed by the min-should-match 
 DisMax QParser. This was not in Solr already because DisMax was only used for 
 a very limited syntax that didn't use those features. In my opinion, this 
 makes a more suitable base parser for general use because unlike the 
 Lucene/Solr parser, this one supports multiple default fields, whereas other 
 ones (say Yonik's {!prefix} one, for example) can't do dismax. The notion of 
 a single default field is antiquated and a technical under-the-hood detail of 
 Lucene that I think Solr should shield the user from by on-the-fly using a 
 DisMax when multiple fields are used. 
 (patch to be attached soon)
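
For context, the per-term structure being generalized here looks roughly like this (standard Lucene API; field names and the tie-breaker value are illustrative):

    // One DisjunctionMaxQuery per user term, spanning several "default"
    // fields; the best-scoring field dominates, plus a small tie-breaker.
    DisjunctionMaxQuery dmq = new DisjunctionMaxQuery(0.1f);
    dmq.add(new TermQuery(new Term("title", "solr")));
    dmq.add(new TermQuery(new Term("body",  "solr")));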




[jira] Commented: (SOLR-2358) Distributing Indexing

2011-02-16 Thread Alex Cowell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995570#comment-12995570
 ] 

Alex Cowell commented on SOLR-2358:
---

bq. Since this functionality is core to Solr and should always be present, it 
would be natural to either build it into the DirectUpdateHandler2 or to add 
this processor to the set of default UpdateProcessors that are executed if no 
update.processor parameter is specified.

What advantage would we gain from moving this functionality into 
DirectUpdateHandler2? From what I understand, the UpdateHandler deals directly 
with the index whereas the DistributedUpdateRequestProcessor merely takes 
requests deemed to be distributed by the request handler and distributes them 
to a list of shards based on a distribution policy. 
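
To make that concrete, the processor being discussed is just a link in the 
update chain. A rough sketch of the shape (the class name, the hashing policy 
and the cmd.solrDoc field access are illustrative assumptions, not the actual 
patch):

{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

// Illustrative only. Routes each added document to one shard by hashing
// its unique key; real code would also handle deletes, commits and retries.
public class DistributingProcessor extends UpdateRequestProcessor {
  private final List<SolrServer> shards; // one SolrJ client per shard

  public DistributingProcessor(List<SolrServer> shards,
                               UpdateRequestProcessor next) {
    super(next);
    this.shards = shards;
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.solrDoc;     // assumed field name
    Object id = doc.getFieldValue("id");     // assumes "id" is the uniqueKey
    int target = Math.abs(id.hashCode()) % shards.size(); // trivial policy
    try {
      shards.get(target).add(doc);           // forward instead of indexing locally
    } catch (SolrServerException e) {
      throw new IOException("shard " + target + " failed: " + e);
    }
    // intentionally not calling super.processAdd(cmd): the document is
    // indexed on the remote shard, not in this core
  }
}
{code}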

 Distributing Indexing
 -

 Key: SOLR-2358
 URL: https://issues.apache.org/jira/browse/SOLR-2358
 Project: Solr
  Issue Type: New Feature
Reporter: William Mayor
Priority: Minor
 Attachments: SOLR-2358.patch


 The first steps towards creating distributed indexing functionality in Solr

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-1191) NullPointerException in delta import

2011-02-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1191.


   Resolution: Fixed
Fix Version/s: (was: 1.4)
   3.1
 Assignee: (was: Noble Paul)

Thanks Gunnlaugur, I committed to trunk and 3x.

 NullPointerException in delta import
 

 Key: SOLR-1191
 URL: https://issues.apache.org/jira/browse/SOLR-1191
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
 Environment: OS: Windows & Linux.
 Java: 1.6
 DB: MySQL & SQL Server 
Reporter: Ali Syed
 Fix For: 3.1

 Attachments: SOLR-1191.patch, SOLR-1191.patch


 Seeing few of these NullPointerException during delta imports. Once this 
 happens delta import stops working and keeps giving the same error.
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
 Running delta import for a particular entity fixes the problem and delta 
 import start working again.
 Here is the log just before & after the exception
 05/27 11:59:29 86987686 INFO  btpool0-538 org.apache.solr.core.SolrCore  - 
 [localhost] webapp=/solr path=/dataimport 
 params={command=delta-import&optimize=false} status=0 QTime=0
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DataImporter  - Starting Delta Import
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: content
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: content
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: job
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987704 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 12
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: job
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Delta Import completed 
 successfully
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: user
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity user with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987716 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 7
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: user rows obtained : 46
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: user rows obtained : 0
 05/27 11:59:29 86987873 INFO  

[jira] Updated: (SOLR-2367) DataImportHandler unit tests are very noisy

2011-02-16 Thread Gunnlaugur Thor Briem (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunnlaugur Thor Briem updated SOLR-2367:


Attachment: SOLR-2367-log-exceptions-through-SolrException.patch

Here goes the remaining 20% — I'm attaching 
SOLR-2367-log-exceptions-through-SolrException.patch which makes 
{{DataImportHandler}} log exceptions through {{SolrException.log()}} instead of 
directly into the logger. This way the exception-ignoring mechanism gets a say 
in matters. Test output is nice and clean now. I addressed only those logger 
calls that were emitting exceptions in unit test runs.

Note: this does *not* require {{DataImportHandlerException}} to extend 
{{SolrException}}, so the earlier SOLR-2367-extend-SolrException.patch is not 
needed. (Might still be worthwhile, I don't know — but not needed for this fix).
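
The substance of the change is a mechanical substitution of roughly this shape 
(illustrative, assuming a class-level slf4j {{LOG}} and a made-up message; not 
the literal diff):

{code:java}
// before: the exception goes straight to the logger, bypassing the
// exception-ignoring mechanism used by the test framework
LOG.error("Could not persist properties", e);

// after: routed through SolrException.log(), so the ignore list
// gets a say before anything reaches the test output
SolrException.log(LOG, "Could not persist properties", e);
{code}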

 DataImportHandler unit tests are very noisy
 ---

 Key: SOLR-2367
 URL: https://issues.apache.org/jira/browse/SOLR-2367
 Project: Solr
  Issue Type: Improvement
  Components: Build, contrib - DataImportHandler
Reporter: Gunnlaugur Thor Briem
Assignee: Robert Muir
Priority: Trivial
 Attachments: SOLR-2367-extend-SolrException.patch, 
 SOLR-2367-log-exceptions-through-SolrException.patch, SOLR-2367.patch, 
 SOLR-2367.patch

   Original Estimate: 5m
  Remaining Estimate: 5m

 Running DataImportHandler unit tests emits a lot of console noise, mainly 
 stacktraces because dataimport.properties can't be written. This makes it 
 hard to scan the output for useful information.
 I'm attaching a patch to get rid of most of the noise by creating the conf 
 directory before test runs so that the properties file write doesn't fail.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2365) DIH should not be in the Solr war

2011-02-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995583#comment-12995583
 ] 

Uwe Schindler commented on SOLR-2365:
-

+1; who wants to set the touchy fix version?

 DIH should not be in the Solr war
 -

 Key: SOLR-2365
 URL: https://issues.apache.org/jira/browse/SOLR-2365
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: David Smiley
Priority: Minor
 Attachments: SOLR-2365_DIH_should_not_be_in_war.patch


 The DIH has a build.xml that puts itself into the Solr war file.  This is the 
 only contrib module that does this, and I don't think it should be this way. 
 Granted, there is a small dataimport.jsp file that it would be most convenient 
 to keep included, but the jar should not be.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2365) DIH should not be in the Solr war

2011-02-16 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-2365:


Fix Version/s: 4.0
   3.1

 DIH should not be in the Solr war
 -

 Key: SOLR-2365
 URL: https://issues.apache.org/jira/browse/SOLR-2365
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: David Smiley
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2365_DIH_should_not_be_in_war.patch


 The DIH has a build.xml that puts itself into the Solr war file.  This is the 
 only contrib module that does this, and I don't think it should be this way. 
 Granted, there is a small dataimport.jsp file that it would be most convenient 
 to keep included, but the jar should not be.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2366) Facet Range Gaps

2011-02-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995585#comment-12995585
 ] 

Hoss Man commented on SOLR-2366:


the use case of facet.range (and facet.date before it) was always about having 
ranges generated for you automatically using a fixed gap size.  if you want 
variable gap sizes, it's just as easy to specify them using facet.query.

i don't really understand how your proposal adds value over using facet.query 
for the ranges you want to have specific widths, and then using facet.range for 
the rest of the ranges you want generated automatically with a specific gap.

it just seems like a more confusing way of expressing the same thing
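
For concreteness, the example from this issue could be expressed in that style 
roughly as follows (the field name price is hypothetical, and boundary 
inclusiveness is glossed over):

{code}
facet.query=price:[0 TO 5]
facet.query=price:[5 TO 30]
facet.query=price:[30 TO 80]
facet.range=price
facet.range.start=80
facet.range.end=400
facet.range.gap=100
{code}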


 Facet Range Gaps
 

 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: SOLR-2366.patch, SOLR-2366.patch


 There really is no reason why the range gap for date and numeric faceting 
 needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
 and one were doing spatial distance calculations, one could facet by function 
 into 3 different sized buckets: walking distance (0-5KM), driving distance 
 (5KM-150KM) and everything else (150KM+), for instance.  We should be able to 
 quantize the results into arbitrarily sized buckets.  I'd propose the syntax 
 to be a comma separated list of sizes for each bucket.  If only one value is 
 specified, then it behaves as it currently does.  Otherwise, it creates the 
 different sized buckets.  If the listed gap sizes don't cover the whole range, 
 the last gap size is repeated to fill out the remaining space (not sure on 
 this)
 For instance,
 facet.range.start=0
 facet.range.end=400
 facet.range.gap=5,25,50,100
 would yield buckets of:
 0-5,5-30,30-80,80-180,180-280,280-380,380-400

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2368) Improve extended dismax (edismax) parser

2011-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995587#comment-12995587
 ] 

Jan Høydahl commented on SOLR-2368:
---

I agree with David's comments on SOLR-1553 that edismax is already good enough 
to replace dismax, as it is clearly better, more useful and also backward 
compatible. It may still need some tuning, but not replacing dismax now in 3.1 
could be an example of perfect being the enemy of good :)

In Cominvent, we've been using edismax as the main query parser on all customer 
projects for several months now, and it is clearly much better than the old 
dismax, which is not robust enough, nor does it allow the syntaxes which people 
have come to expect.

We have not seen any bugs or instabilities on any of these sites where it is 
live: www.dn.no, www.libris.no, 
http://www.rechargenews.com/search?q=oil+AND+(usa+OR+eu) and many more.

May I suggest the following for 3.1:
* defType=dismax is changed to point to Extended DisMax
* defType=basicdismax is pointed to the old Basic DisMax (to give people a way 
to revert if needed)
* defType=edismax is dropped (or added as a temporary alias to dismax)
* The wiki page http://wiki.apache.org/solr/DisMaxQParserPlugin is edited to 
reflect the changes, and specific parameters or features which are likely to 
change in the future are marked as "experimental, may change" to warn people.

 Improve extended dismax (edismax) parser
 

 Key: SOLR-2368
 URL: https://issues.apache.org/jira/browse/SOLR-2368
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley

 Improve edismax and replace dismax once it has all of the needed features.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1553) extended dismax query parser

2011-02-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995588#comment-12995588
 ] 

Hoss Man commented on SOLR-1553:


bq. I'm confused about why this cool query parser I've been using is 
experimental

because some of its current default behavior is less than ideal, 
particularly for people migrating from dismax (ie: see comments about making 
field queries configurable) and in a few cases even broken compared to how it 
worked when the patch was initially committed (see recent comments about foo:bar 
when foo is *not* a field)

in general, marking it experimental is a way to allow us to leave it in the 3.1 
release but still have the flexibility to modify the default behavior moving 
forward.

 extended dismax query parser
 

 Key: SOLR-1553
 URL: https://issues.apache.org/jira/browse/SOLR-1553
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
Assignee: Yonik Seeley
 Fix For: 3.1

 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch, 
 edismax.unescapedcolon.bug.test.patch, edismax.unescapedcolon.bug.test.patch, 
 edismax.userFields.patch


 An improved user-facing query parser based on dismax

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2365) DIH should not be in the Solr war

2011-02-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995591#comment-12995591
 ] 

Hoss Man commented on SOLR-2365:


+1

we need to make sure to call this out at the top of CHANGES.txt so people 
upgrading from 1.x know they *must* modify their solrconfig.xml (to add the 
{{lib/}} directive) if they use DIH ... but yeah, if it doesn't need to be in 
the war for that JSP to work, then let's keep it as an isolated contrib jar.

 DIH should not be in the Solr war
 -

 Key: SOLR-2365
 URL: https://issues.apache.org/jira/browse/SOLR-2365
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: David Smiley
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2365_DIH_should_not_be_in_war.patch


 The DIH has a build.xml that puts itself into the Solr war file.  This is the 
 only contrib module that does this, and I don't think it should be this way. 
 Granted, there is a small dataimport.jsp file that it would be most convenient 
 to keep included, but the jar should not be.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1581) Facet by Function

2011-02-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995594#comment-12995594
 ] 

Hoss Man commented on SOLR-1581:


could probably reuse the sort parsing code for this ... it does a pretty good 
job of doing a quick test for field names, then looking for a matching 
function, then falling back to an assumption of esoteric field names

 Facet by Function
 -

 Key: SOLR-1581
 URL: https://issues.apache.org/jira/browse/SOLR-1581
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: Next


 It would be really great if we could execute a function and quantize it into 
 buckets that could then be returned as facets.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2368) Improve extended dismax (edismax) parser

2011-02-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995596#comment-12995596
 ] 

Hoss Man commented on SOLR-2368:


{quote}
May I suggest the following for 3.1:
* defType=dismax is changed to point to Extended DisMax
{quote}

-1

beyond the key value of "don't break on malformed input" that using edismax 
would bring to existing dismax users, edismax's default behavior changes too 
many things for me to want to recommend it to existing dismax users (or change 
the default out from under them)

the code will be there in 3.1, and savvy users can use it, and we can fix the 
bugs and defaults as we move forward.

 Improve extended dismax (edismax) parser
 

 Key: SOLR-2368
 URL: https://issues.apache.org/jira/browse/SOLR-2368
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley

 Improve edismax and replace dismax once it has all of the needed features.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2368) Improve extended dismax (edismax) parser

2011-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995606#comment-12995606
 ] 

Jan Høydahl commented on SOLR-2368:
---

As much as I believe the known issues will affect only a tiny percentage of 
existing (or new) dismax users, I have no problem with a more phased approach. 
Trying to see what's best for the user community.

On a humorous note, if I were the non-savvy user upgrading Solr from 1.4.1 to 
3.1, I'd for sure read those release notes carefully and test it all, given the 
huge version leap :)

It would really help bring a quicker resolution to this long-running issue if 
the current edismax features and params were documented on the Wiki for others 
to test, and all known bugs and planned improvements were detailed here or 
linked to this issue, so that I and others may know how to contribute.

 Improve extended dismax (edismax) parser
 

 Key: SOLR-2368
 URL: https://issues.apache.org/jira/browse/SOLR-2368
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley

 Improve edismax and replace dismax once it has all of the needed features.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work

2011-02-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995612#comment-12995612
 ] 

Hoss Man commented on SOLR-2348:


Committed revision 1071459. - trunk

working on 3x backporting now

 No error reported when using a FieldCached backed ValueSource for a field 
 Solr knows won't work
 ---

 Key: SOLR-2348
 URL: https://issues.apache.org/jira/browse/SOLR-2348
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 3.1, 4.0

 Attachments: SOLR-2348.patch, SOLR-2348.patch


 For the same reasons outlined in SOLR-2339, Solr FieldTypes that return 
 FieldCache-backed ValueSources should explicitly check for situations where 
 it knows the FieldCache is meaningless.
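
As a sketch of the idea only (the names follow Solr conventions, but this is 
not the committed patch), a FieldType's getValueSource() can refuse fields for 
which a FieldCache entry cannot be meaningful:

{code:java}
// Illustrative guard; multiValued and unindexed fields cannot be
// represented sensibly in the FieldCache, so fail fast with a clear error.
public ValueSource getValueSource(SchemaField field, QParser parser) {
  if (field.multiValued() || !field.indexed()) {
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "can not use FieldCache on multiValued or unindexed field: "
        + field.getName());
  }
  return new IntFieldSource(field.getName()); // type-appropriate source
}
{code}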

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Solr-dev mailing list on Nabble

2011-02-16 Thread Chris Hostetter

: As I mentioned on the solr-dev mailing list 
: 
http://lucene.472066.n3.nabble.com/wind-down-for-3-1-tp2414923p2483929.html, 
: David Smiley's responses to emails on dev@l.a.o have been going to 
: solr-dev@l.a.o.  This is a problem, and it's not restricted to David's 
: emails.

what problem does it cause? 

: I put up a support request on Nabble: 
: http://nabble-support.1.n2.nabble.com/solr-dev-mailing-list-td6023495.html 
: and the only response so far seems to indicate that mailing lists are 
: managed by admins associated with the project with which each mailing 
: list is associated.

this is relatively new -- it's a change they made several years ago, but 
at the time the solr-dev and java-dev archives were setup, anyone could 
add/configure a list archive forum -- even the description was community 
editable (i know i remember writing the description on that page, but i 
just checked and i don't have a nabble account)

I've even received emails from nabble telling me that forums i'm the admin 
of (ie: i asked them to start archiving a mailing list) are scheduled for 
deletion due to inactivity, but when i try to login or recover the password 
for the account they sent me email at, their system says i have no account.

According to the People pages for solr-dev and java-dev, some guy named 
Hugo is the only administrator of those forums...

http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=app_people&node=506503&filter=Administrators
http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=app_people&node=564358&filter=Administrators


-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2370) Let some UpdateProcessors be default without explicitly configuring them

2011-02-16 Thread JIRA
Let some UpdateProcessors be default without explicitly configuring them


 Key: SOLR-2370
 URL: https://issues.apache.org/jira/browse/SOLR-2370
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Jan Høydahl


Problem:
Today the user needs to make sure that crucial UpdateProcessors like the Log- 
and Run UpdateProcessors are present when creating a new 
UpdateRequestProcessorChain. This is error prone, and when introducing a new 
core UpdateProcessor, like in SOLR-2358, all existing users need to insert the 
changes into all their pipelines.

A custom-made pipeline should not need to care about distributed indexing, 
logging or anything else, and should be as slim as possible.

Proposal:
The proposal is to borrow from the "first-components" and "last-components" 
pattern used in RequestHandler configs. That way, we could let all core 
processors be included either first or last by default in all UpdateChains.

To do this, we need a place to configure the defaults, e.g. by a default="true" 
param:
{code:xml}
<updateRequestProcessorChain name="default" default="true">
  <first-processors>
    <processor class="solr.DistributedUpdateRequestProcessor"/>
  </first-processors>
  <last-processors>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </last-processors>
</updateRequestProcessorChain>
{code}

Next, the custom-made chain will be only the center part:
{code:xml}
<updateRequestProcessorChain name="mychain">
  <processor class="my.nice.DoSomethingProcessor"/>
  <processor class="my.nice.DoAnotherThingProcessor"/>
</updateRequestProcessorChain>
{code}

To override the core processors config for a particular chain, you would start 
a clean chain with the parameter reset="true":
{code:xml}
<updateRequestProcessorChain name="mychain" reset="true">
  <processor class="my.nice.DoSomethingProcessor"/>
  <processor class="my.nice.DoAnotherThingProcessor"/>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{code}

If you only need to make sure that one of your custom processors runs at the 
very beginning or the very end, you could use:

{code:xml}
<updateRequestProcessorChain name="mychain">
  <processor class="my.nice.DoSomethingProcessor"/>
  <processor class="my.nice.DoAnotherThingProcessor"/>
  <last-processors>
    <processor class="solr.MySpecialDebugProcessor" />
  </last-processors>
</updateRequestProcessorChain>
{code}

The default should be reset="false", but the example solrconfig could keep the 
default chain commented out to provide backward compatibility for upgraders.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2358) Distributing Indexing

2011-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995628#comment-12995628
 ] 

Jan Høydahl commented on SOLR-2358:
---

I'm not sure if DirectUpdateHandler2 is the right location either. My point is 
that the user should not need to manually make sure that the UpdateProcessor is 
present in all his UpdateChains for distributed indexing to work. See new issue 
SOLR-2370 for a suggestion on how to tackle this.

 Distributing Indexing
 -

 Key: SOLR-2358
 URL: https://issues.apache.org/jira/browse/SOLR-2358
 Project: Solr
  Issue Type: New Feature
Reporter: William Mayor
Priority: Minor
 Attachments: SOLR-2358.patch


 The first steps towards creating distributed indexing functionality in Solr

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work

2011-02-16 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2348.


Resolution: Fixed

Committed revision 1071480. - 3x


 No error reported when using a FieldCached backed ValueSource for a field 
 Solr knows won't work
 ---

 Key: SOLR-2348
 URL: https://issues.apache.org/jira/browse/SOLR-2348
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 3.1, 4.0

 Attachments: SOLR-2348.patch, SOLR-2348.patch


 For the same reasons outlined in SOLR-2339, Solr FieldTypes that return 
 FieldCache-backed ValueSources should explicitly check for situations where 
 it knows the FieldCache is meaningless.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1553) extended dismax query parser

2011-02-16 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995642#comment-12995642
 ] 

Ryan McKinley commented on SOLR-1553:
-

the 'experimental' label is a flag to say that the behavior will likely change 
in the future -- since back compatibility is taken so seriously, this allows a 
way to add features before they are 100% cooked.  

 extended dismax query parser
 

 Key: SOLR-1553
 URL: https://issues.apache.org/jira/browse/SOLR-1553
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
Assignee: Yonik Seeley
 Fix For: 3.1

 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch, 
 edismax.unescapedcolon.bug.test.patch, edismax.unescapedcolon.bug.test.patch, 
 edismax.userFields.patch


 An improved user-facing query parser based on dismax

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2366) Facet Range Gaps

2011-02-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995652#comment-12995652
 ] 

Grant Ingersoll commented on SOLR-2366:
---

bq. it just seems like a more confusing way of expressing the same thing

I think it's a lot less confusing.  You only have to express start, end and the 
size of the buckets you want.  With facet.query, you have to write out each 
expression for every bucket and do the math on all the boundaries.  I don't 
think it is just as easy to specify using facet.query.  Not to mention that 
facet.query also involves a lot more parsing.

 Facet Range Gaps
 

 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: SOLR-2366.patch, SOLR-2366.patch


 There really is no reason why the range gap for date and numeric faceting 
 needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
 and one were doing spatial distance calculations, one could facet by function 
 into 3 different sized buckets: walking distance (0-5KM), driving distance 
 (5KM-150KM) and everything else (150KM+), for instance.  We should be able to 
 quantize the results into arbitrarily sized buckets.  I'd propose the syntax 
 to be a comma separated list of sizes for each bucket.  If only one value is 
 specified, then it behaves as it currently does.  Otherwise, it creates the 
 different sized buckets.  If the listed gap sizes don't cover the whole range, 
 the last gap size is repeated to fill out the remaining space (not sure on 
 this)
 For instance,
 facet.range.start=0
 facet.range.end=400
 facet.range.gap=5,25,50,100
 would yield buckets of:
 0-5,5-30,30-80,80-180,180-280,280-380,380-400

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org