Re: need some help =)

2010-11-17 Thread asmcad
i need a turkish analyzer. my lucene book says i need to use 
SnowballAnalyzer, but i can't access it as 
Lucene.Net.Analysis.Snowball. should i install another library to use it?
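
A minimal sketch of the SnowballAnalyzer route, assuming the separate Lucene.Net
contrib Snowball assembly is referenced and that its SnowballAnalyzer build
includes a Turkish stemmer (both are assumptions, not confirmed in this thread):

using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Snowball;   // lives in the contrib Snowball assembly
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

class TurkishAnalyzerSketch
{
    static void Run(string userQuery)
    {
        // "Turkish" is the Snowball stemmer name; verify your contrib build ships it.
        Analyzer analyzer = new SnowballAnalyzer("Turkish");

        // Use the same analyzer at index time...
        IndexWriter writer = new IndexWriter(@"c:\index\", analyzer, true);
        // ... add documents here ...
        writer.Close();

        // ...and at query time, so terms are stemmed identically on both sides.
        QueryParser parser = new QueryParser("contents", analyzer);
        Query query = parser.Parse(userQuery);
    }
}

Whichever analyzer is chosen, the key point is that the same one must be used
for indexing and for parsing queries.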


On 17.11.2010 21:12, Granroth, Neal V. wrote:

You need to pick a suitable analyzer for use during indexing and for queries.  
The StandardAnalyzer you are using will most likely break the words apart at 
the non-English characters.

You might want to consider using the Luke tool to inspect the index you've 
created and see how the words in your documents were split and indexed.


- Neal

-Original Message-
From: asmcad [mailto:asm...@gmail.com]
Sent: Wednesday, November 17, 2010 3:06 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: need some help =)


i solved the problem. now i have a non-english character problem.
when i search for something with çşğuı characters (i'm not sure you can see this),
i don't get any results.
how can i solve this?

by the way, sorry about the messy content =)

thanks for the previous help =)

On 17.11.2010 20:16, Digy wrote:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using Lucene.Net;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using System.IO;

namespace newLucene
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void buttonIndex_Click(object sender, EventArgs e)
        {
            IndexWriter indexwrtr = new IndexWriter(@"c:\index\", new StandardAnalyzer(), true);
            Document doc = new Document();
            string filename = @"fer.txt";
            Lucene.Net.QueryParsers.QueryParser df;

            System.IO.StreamReader local_StreamReader = new System.IO.StreamReader(@"C:\z\fer.txt");
            string file_text = local_StreamReader.ReadToEnd();

            System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
            doc.Add(new Field("text", encoding.GetBytes(file_text), Field.Store.YES));
            doc.Add(new Field("path", encoding.GetBytes(@"C:\z\"), Field.Store.YES));
            doc.Add(new Field("title", encoding.GetBytes(filename), Field.Store.YES));
            indexwrtr.AddDocument(doc);

            indexwrtr.Optimize();
            indexwrtr.Close();
        }

        private void buttonSearch_Click(object sender, EventArgs e)
        {
            IndexSearcher indxsearcher = new IndexSearcher(@"C:\index\");

            QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
            Query query = parser.Parse(textBoxQuery.Text);

            //Lucene.Net.QueryParsers.QueryParser qp = new QueryParser(Lucene.Net.QueryParsers.CharStream s).Parse(textBoxQuery.Text);
            Hits hits = indxsearcher.Search(query);

            for (int i = 0; i < hits.Length(); i++)
            {
                Document doc = hits.Doc(i);

                string filename = doc.Get("title");
                string path = doc.Get("path");
                string folder = Path.GetDirectoryName(path);

                ListViewItem item = new ListViewItem(new string[] { null, filename, "asd", hits.Score(i).ToString() });
                item.Tag = path;

                this.listViewResults.Items.Add(item);
                Application.DoEvents();
            }

            indxsearcher.Close();
        }
    }
}


thanks





Re: need some help =)

2010-11-17 Thread asmcad

i'll try thanks =)
On 17.11.2010 21:14, Digy wrote:

Try to see what you are indexing

http://mail-archives.apache.org/mod_mbox/lucene-lucene-net-user/201011.mbox/%3caanlktim6kyuzhwb8p7g=hvqx6dy1fkarchro0hyw+...@mail.gmail.com%3e


And you can also consider using ASCIIFoldingFilter, if it fits your needs.
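
A minimal sketch of what that could look like in Lucene.Net, assuming a version
that ships ASCIIFoldingFilter; the class name FoldingAnalyzer and the exact
filter chain are illustrative only:

using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

// Folds characters such as ç, ş, ğ, ı down to their closest ASCII equivalents
// at analysis time. It must be used for BOTH indexing and query parsing.
class FoldingAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        TokenStream stream = new StandardTokenizer(reader);
        stream = new StandardFilter(stream);
        stream = new LowerCaseFilter(stream);
        stream = new ASCIIFoldingFilter(stream);
        return stream;
    }
}

Note that folding changes matching semantics (e.g. ş and s become the same term),
which may or may not be acceptable for Turkish text.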

DIGY



-Original Message-
From: asmcad [mailto:asm...@gmail.com]
Sent: Wednesday, November 17, 2010 11:06 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: need some help =)


i solved the problem. now i have a non-english character problem.
when i search for something with çşğuı characters (i'm not sure you can see this),
i don't get any results.
how can i solve this?

by the way, sorry about the messy content =)

thanks for the previous help =)

On 17.11.2010 20:16, Digy wrote:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using Lucene.Net;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using System.IO;

namespace newLucene
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void buttonIndex_Click(object sender, EventArgs e)
        {
            IndexWriter indexwrtr = new IndexWriter(@"c:\index\", new StandardAnalyzer(), true);
            Document doc = new Document();
            string filename = @"fer.txt";
            Lucene.Net.QueryParsers.QueryParser df;

            System.IO.StreamReader local_StreamReader = new System.IO.StreamReader(@"C:\z\fer.txt");
            string file_text = local_StreamReader.ReadToEnd();

            System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
            doc.Add(new Field("text", encoding.GetBytes(file_text), Field.Store.YES));
            doc.Add(new Field("path", encoding.GetBytes(@"C:\z\"), Field.Store.YES));
            doc.Add(new Field("title", encoding.GetBytes(filename), Field.Store.YES));
            indexwrtr.AddDocument(doc);

            indexwrtr.Optimize();
            indexwrtr.Close();
        }

        private void buttonSearch_Click(object sender, EventArgs e)
        {
            IndexSearcher indxsearcher = new IndexSearcher(@"C:\index\");

            QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
            Query query = parser.Parse(textBoxQuery.Text);

            //Lucene.Net.QueryParsers.QueryParser qp = new QueryParser(Lucene.Net.QueryParsers.CharStream s).Parse(textBoxQuery.Text);
            Hits hits = indxsearcher.Search(query);

            for (int i = 0; i < hits.Length(); i++)
            {
                Document doc = hits.Doc(i);

                string filename = doc.Get("title");
                string path = doc.Get("path");
                string folder = Path.GetDirectoryName(path);

                ListViewItem item = new ListViewItem(new string[] { null, filename, "asd", hits.Score(i).ToString() });
                item.Tag = path;

                this.listViewResults.Items.Add(item);
                Application.DoEvents();
            }

            indxsearcher.Close();
        }
    }
}


thanks





Re: ASF Public Mail Archives on Amazon S3

2010-11-17 Thread Grant Ingersoll
Hmmm, let me look.  I don't know if I will be able to recover it


On Nov 17, 2010, at 1:48 PM, Michael McCandless wrote:

 Grant, public_p_r.tar seems to be missing?  Is that intentional?
 Maybe some super-secret project inside there :)
 
 Mike
 
 On Thu, Oct 14, 2010 at 12:05 PM, Grant Ingersoll gsing...@apache.org wrote:
 Hi ORPers,
 
 I put up the complete ASF public mail archives as of about 3 weeks ago on 
 Amazon's S3 and have made them public (let me know if I messed up, it is the 
 first time I've done this).  I also intend, in the coming weeks, to convert 
 them into Mahout files (if anyone wants to help let me know).
 
 There are 5 files:
 https://s3.amazonaws.com/asf-mail-archives/public_a_d.tar
 https://s3.amazonaws.com/asf-mail-archives/public_e_k.tar
 https://s3.amazonaws.com/asf-mail-archives/public_l_o.tar
 https://s3.amazonaws.com/asf-mail-archives/public_s_t.tar
 https://s3.amazonaws.com/asf-mail-archives/public_u_z.tar
 
 The tarballs are organized by Top Level Project name (i.e. Mahout is in the 
 public_l_o.tar file).  The tarballs contain GZIP files by date, I believe.  
 I believe the total uncompressed file size is somewhere in the 80-100GB 
 range.  That should be sufficient to drive some semi-interesting things in 
 terms of scale, even if it is towards the smaller end of things.
 
 As the ASF has very clear public mailing list archive policies, it is my 
 belief that this data set is completely unencumbered.
 
 From an ORP standpoint, this might make for a first data set for evaluation 
 once we have the evaluator framework in place.
 
 Cheers,
 Grant
 



[jira] Commented: (LUCENE-2755) Some improvements to CMS

2010-11-17 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932864#action_12932864
 ] 

Earwin Burrfoot commented on LUCENE-2755:
-

{quote}
If we proceed w/ your proposal, that is basically the MS/ME polling MP, and not 
IW doing so, how would IW know about the running merges and pending ones? Today 
IW tracks those two lists so that if you need to abort merges, it knows which 
ones to abort.

We can workaround aborting the running merges by introducing a MS.abort()-like 
method. But what about MP? Now the lists are divided between two entities (MP 
and MS), and aborting a MP does not make sense (doable, but I don't think it 
belongs there). 
{quote}
There are no lists at all with my approach. At least no "pending" list - that 
one gets recalculated each time we poll MP, and it never gets handed out, nor gets 
stored inside.
There's a kind of implicit "in flight" list - MS has the knowledge of its 
threads that are currently doing things. And if you want to go around aborting 
things, MS is probably the right place to do this.

bq. Maybe we can have MS.abort() poll MP for next merges until it returns null, 
and throwing all the returned ones away - that can be done.
So, just as I said - that's not needed. MP is empty; it has no state.

bq. Should we, in the scope of this issue, make IW a required settable 
parameter on MS, like we do w/ MP?
For the love of God, no. I'd like to see it removed from MP too.
It's only natural to pass the same instance of Policy or Scheduler to different 
Writers, so they have the same behaviour and share Scheduler resources 
(insanely important if you have fifteen indexes like I do and don't want them 
to rape hardware with fifteen simultaneous merges).
It is against that nature to pass the Writer to the Policy. Does the Policy need to 
write anything on its own, when it decides to? No. It should advise, not act.

 Some improvements to CMS
 

 Key: LUCENE-2755
 URL: https://issues.apache.org/jira/browse/LUCENE-2755
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1, 4.0


 While running optimize on a large index, I've noticed several things that got 
 me to read CMS code more carefully, and find these issues:
 * CMS may hold onto a merge if maxMergeCount is hit. That results in the 
 MergeThreads taking merges from the IndexWriter until they are exhausted, and 
 only then will that blocked merge run. I think it's unnecessary for that 
 merge to be blocked.
 * CMS sorts merges by segment size, doc-based and not byte-based. Since the 
 default MP is LogByteSizeMP, and I hardly believe people care about doc-based 
 size segments anymore, I think we should switch the default impl. There are 
 two ways to make it extensible, if we want:
 ** Have an overridable member/method in CMS that you can extend and override 
 - easy.
 ** Have OneMerge be comparable and let the MP determine the order (e.g. by 
 bytes, docs, calibrate deletes etc.). Better, but will need to tap into 
 several places in the code, so more risky and complicated.
 Along the way, I'd like to add some documentation to CMS - it's not very easy to 
 read and follow.
 I'll work on a patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Solr-trunk - Build # 1315 - Still Failing

2010-11-17 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Solr-trunk/1315/

All tests passed

Build Log (for compile errors):
[...truncated 18459 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-Solr-tests-only-trunk - Build # 1512 - Failure

2010-11-17 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/1512/

18 tests failed.
REGRESSION:  org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch

Error Message:
KeeperErrorCode = ConnectionLoss for /configs/conf1/synonyms.txt

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /configs/conf1/synonyms.txt
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
at 
org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:225)
at 
org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:389)
at 
org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:411)
at 
org.apache.solr.cloud.AbstractZkTestCase.putConfig(AbstractZkTestCase.java:97)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:90)
at 
org.apache.solr.cloud.AbstractDistributedZkTestCase.setUp(AbstractDistributedZkTestCase.java:47)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:881)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:847)


FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicZkTest

Error Message:
org.apache.solr.common.cloud.ZooKeeperException: 

Stack Trace:
java.lang.RuntimeException: org.apache.solr.common.cloud.ZooKeeperException: 
at org.apache.solr.util.TestHarness.init(TestHarness.java:152)
at org.apache.solr.util.TestHarness.init(TestHarness.java:134)
at org.apache.solr.util.TestHarness.init(TestHarness.java:124)
at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:247)
at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:110)
at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:98)
at 
org.apache.solr.cloud.AbstractZkTestCase.azt_beforeClass(AbstractZkTestCase.java:64)
Caused by: org.apache.solr.common.cloud.ZooKeeperException: 
at org.apache.solr.core.CoreContainer.register(CoreContainer.java:530)
at 
org.apache.solr.util.TestHarness$Initializer.initialize(TestHarness.java:191)
at org.apache.solr.util.TestHarness.init(TestHarness.java:139)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /collections
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
at 
org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:225)
at 
org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:389)
at 
org.apache.solr.cloud.ZkController.addZkShardsNode(ZkController.java:159)
at org.apache.solr.cloud.ZkController.register(ZkController.java:481)
at org.apache.solr.core.CoreContainer.register(CoreContainer.java:521)


FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicZkTest

Error Message:
ERROR: SolrIndexSearcher opens=1 closes=0

Stack Trace:
junit.framework.AssertionFailedError: ERROR: SolrIndexSearcher opens=1 closes=0
at 
org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:128)
at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:302)
at 
org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:79)


REGRESSION:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
null

Stack Trace:
org.apache.solr.common.cloud.ZooKeeperException: 
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:441)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:294)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:243)
at 
org.apache.solr.cloud.CloudStateUpdateTest.setUp(CloudStateUpdateTest.java:112)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:881)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:847)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /collections
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243)
at 
org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:199)
at 
org.apache.solr.common.cloud.ZkStateReader.makeShardZkNodeWatches(ZkStateReader.java:184)
at 

Lucene-Solr-tests-only-trunk - Build # 1513 - Still Failing

2010-11-17 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/1513/

12 tests failed.
FAILED:  org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch

Error Message:
KeeperErrorCode = ConnectionLoss for /solr

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /solr
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:348)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:309)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:291)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:256)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:71)
at 
org.apache.solr.cloud.AbstractDistributedZkTestCase.setUp(AbstractDistributedZkTestCase.java:47)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:881)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:847)


FAILED:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
null

Stack Trace:
org.apache.solr.common.cloud.ZooKeeperException: 
at org.apache.solr.cloud.ZkController.init(ZkController.java:301)
at org.apache.solr.cloud.ZkController.init(ZkController.java:133)
at 
org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:159)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:338)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:294)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:243)
at 
org.apache.solr.cloud.CloudStateUpdateTest.setUp(CloudStateUpdateTest.java:122)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:881)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:847)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /live_nodes/127.0.0.1:1662_solr
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:348)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:309)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:291)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:261)
at 
org.apache.solr.cloud.ZkController.createEphemeralLiveNode(ZkController.java:372)
at org.apache.solr.cloud.ZkController.init(ZkController.java:285)


FAILED:  org.apache.solr.cloud.ZkSolrClientTest.testConnect

Error Message:
Could not connect to ZooKeeper 127.0.0.1:42074/solr within 3 ms

Stack Trace:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 
127.0.0.1:42074/solr within 3 ms
at 
org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124)
at 
org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:122)
at 
org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:85)
at 
org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:65)
at 
org.apache.solr.cloud.ZkSolrClientTest.testConnect(ZkSolrClientTest.java:43)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:881)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:847)


FAILED:  
org.apache.solr.handler.TestReplicationHandler.testReplicateAfterWrite2Slave

Error Message:
http://localhost:42265/solr/replication?command=disableReplication

Stack Trace:
java.io.FileNotFoundException: 
http://localhost:42265/solr/replication?command=disableReplication
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1267)
at java.net.URL.openStream(URL.java:1029)
at 
org.apache.solr.handler.TestReplicationHandler.testReplicateAfterWrite2Slave(TestReplicationHandler.java:173)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:881)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:847)


FAILED:  
org.apache.solr.handler.TestReplicationHandler.testIndexAndConfigReplication

Error Message:
expected:498 but was:499

Stack Trace:

Lucene-Solr-tests-only-trunk - Build # 1514 - Still Failing

2010-11-17 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/1514/

1 tests failed.
FAILED:  org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch

Error Message:
.response.numFound:35!=67

Stack Trace:
junit.framework.AssertionFailedError: .response.numFound:35!=67
at 
org.apache.solr.BaseDistributedSearchTestCase.compareResponses(BaseDistributedSearchTestCase.java:553)
at 
org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:307)
at 
org.apache.solr.cloud.BasicDistributedZkTest.doTest(BasicDistributedZkTest.java:127)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:562)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:881)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:847)




Build Log (for compile errors):
[...truncated 8714 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-236) Field collapsing

2010-11-17 Thread peterwang (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932905#action_12932905
 ] 

peterwang commented on SOLR-236:


SOLR-236-1_4_1-paging-totals-working.patch patch failed with following errors:

patch:  malformed patch at line 3348: Index: 
src/test/org/apache/solr/search/fieldcollapse/DistributedFieldCollapsingIntegrationTest.java

seems to be caused by hand-editing the patch file (6 lines deleted without 
fixing the diff hunk line counts); possible fix:

# diff -u SOLR-236-1_4_1.patch SOLR-236-1_4_1-paging-totals-working.patch
--- SOLR-236-1_4_1.patch2010-11-17 18:22:25.0 +0800
+++ SOLR-236-1_4_1-paging-totals-working.patch  2010-11-17 19:17:20.0 
+0800
@@ -2834,7 +2834,7 @@
 ===
 --- 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
 +++ 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
-@@ -0,0 +1,517 @@
+@@ -0,0 +1,511 @@
 +/**
 + * Licensed to the Apache Software Foundation (ASF) under one or more
 + * contributor license agreements.  See the NOTICE file distributed with
@@ -2939,12 +2939,6 @@
 +collapseDoc = new NonAdjacentCollapseGroup(0, 0, documentComparator, 
collapseThreshold, currentValue);
 +collapsedDocs.put(currentValue, collapseDoc);
 +collapsedGroupPriority.add(collapseDoc);
-+
-+if (collapsedGroupPriority.size() > maxNumberOfGroups) {
-+  NonAdjacentCollapseGroup inferiorGroup = 
collapsedGroupPriority.first();
-+  collapsedDocs.remove(inferiorGroup.fieldValue);
-+  collapsedGroupPriority.remove(inferiorGroup);
-+}
 +  }
 +  // dropoutId has a value smaller than the smallest value in the queue 
and therefore it was removed from the queue
 +  Integer dropOutId = (Integer) 
collapseDoc.priorityQueue.insertWithOverflow(currentId);



 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: Next

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, 
 field-collapse-3.patch, field-collapse-4-with-solrj.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
 quasidistributed.additional.patch, 
 SOLR-236-1_4_1-paging-totals-working.patch, SOLR-236-1_4_1.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
 SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch include a new feature called Field collapsing.
 Used in order to collapse a group of results with similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site is collapsed into one or two 
 entries in the result set, typically with an associated more documents from 
 this site link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48amid=299
 The implementation add 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment 

[jira] Reopened: (SOLR-1667) PatternTokenizer does not clearAttributes()

2010-11-17 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reopened SOLR-1667:
---

  Assignee: Robert Muir  (was: Shalin Shekhar Mangar)

reopening to backport to solr 1.4.x branch.

 PatternTokenizer does not clearAttributes()
 ---

 Key: SOLR-1667
 URL: https://issues.apache.org/jira/browse/SOLR-1667
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 1.5, 3.1, 4.0

 Attachments: SOLR-1667.patch


 PatternTokenizer creates tokens, but never calls clearAttributes()
 because of this things like positionIncrementGap are never reset to their 
 default value.
 trivial patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-11-17 Thread peterwang (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932905#action_12932905
 ] 

peterwang edited comment on SOLR-236 at 11/17/10 6:21 AM:
--

SOLR-236-1_4_1-paging-totals-working.patch patch failed with following errors:

patch:  malformed patch at line 3348: Index: 
src/test/org/apache/solr/search/fieldcollapse/DistributedFieldCollapsingIntegrationTest.java

seems to be caused by hand-editing the patch file (6 lines deleted without 
fixing the diff hunk line counts); possible fix:

 $ diff -u SOLR-236-1_4_1.patch SOLR-236-1_4_1-paging-totals-working.patch
--- SOLR-236-1_4_1.patch2010-11-17 18:22:25.0 +0800
+++ SOLR-236-1_4_1-paging-totals-working.patch  2010-11-17 19:17:20.0 
+0800
@@ -2834,7 +2834,7 @@
 ===
 --- 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
 +++ 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
-@@ -0,0 +1,517 @@
+@@ -0,0 +1,511 @@
 +/**
 + * Licensed to the Apache Software Foundation (ASF) under one or more
 + * contributor license agreements.  See the NOTICE file distributed with
@@ -2939,12 +2939,6 @@
 +collapseDoc = new NonAdjacentCollapseGroup(0, 0, documentComparator, 
collapseThreshold, currentValue);
 +collapsedDocs.put(currentValue, collapseDoc);
 +collapsedGroupPriority.add(collapseDoc);
-+
-+if (collapsedGroupPriority.size() > maxNumberOfGroups) {
-+  NonAdjacentCollapseGroup inferiorGroup = 
collapsedGroupPriority.first();
-+  collapsedDocs.remove(inferiorGroup.fieldValue);
-+  collapsedGroupPriority.remove(inferiorGroup);
-+}
 +  }
 +  // dropoutId has a value smaller than the smallest value in the queue 
and therefore it was removed from the queue
 +  Integer dropOutId = (Integer) 
collapseDoc.priorityQueue.insertWithOverflow(currentId);



  was (Author: peterwang):
SOLR-236-1_4_1-paging-totals-working.patch patch failed with following 
errors:

patch:  malformed patch at line 3348: Index: 
src/test/org/apache/solr/search/fieldcollapse/DistributedFieldCollapsingIntegrationTest.java

seems caused by hand edit (delete 6 lines without fix diff hunk number) patch 
files, possible fix:

# diff -u SOLR-236-1_4_1.patch SOLR-236-1_4_1-paging-totals-working.patch
--- SOLR-236-1_4_1.patch2010-11-17 18:22:25.0 +0800
+++ SOLR-236-1_4_1-paging-totals-working.patch  2010-11-17 19:17:20.0 
+0800
@@ -2834,7 +2834,7 @@
 ===
 --- 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
 +++ 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
-@@ -0,0 +1,517 @@
+@@ -0,0 +1,511 @@
 +/**
 + * Licensed to the Apache Software Foundation (ASF) under one or more
 + * contributor license agreements.  See the NOTICE file distributed with
@@ -2939,12 +2939,6 @@
 +collapseDoc = new NonAdjacentCollapseGroup(0, 0, documentComparator, 
collapseThreshold, currentValue);
 +collapsedDocs.put(currentValue, collapseDoc);
 +collapsedGroupPriority.add(collapseDoc);
-+
-+if (collapsedGroupPriority.size() > maxNumberOfGroups) {
-+  NonAdjacentCollapseGroup inferiorGroup = 
collapsedGroupPriority.first();
-+  collapsedDocs.remove(inferiorGroup.fieldValue);
-+  collapsedGroupPriority.remove(inferiorGroup);
-+}
 +  }
 +  // dropoutId has a value smaller than the smallest value in the queue 
and therefore it was removed from the queue
 +  Integer dropOutId = (Integer) 
collapseDoc.priorityQueue.insertWithOverflow(currentId);


  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: Next

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, 
 field-collapse-3.patch, field-collapse-4-with-solrj.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-11-17 Thread peterwang (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932905#action_12932905
 ] 

peterwang edited comment on SOLR-236 at 11/17/10 6:23 AM:
--

SOLR-236-1_4_1-paging-totals-working.patch patch failed with following errors:

patch:  malformed patch at line 3348: Index: 
src/test/org/apache/solr/search/fieldcollapse/DistributedFieldCollapsingIntegrationTest.java

seems to be caused by hand-editing the patch file (6 lines deleted without 
fixing the diff hunk line counts); possible fix:

{code}
$ diff -u SOLR-236-1_4_1.patch SOLR-236-1_4_1-paging-totals-working.patch
--- SOLR-236-1_4_1.patch2010-11-17 18:22:25.0 +0800
+++ SOLR-236-1_4_1-paging-totals-working.patch  2010-11-17 19:17:20.0 
+0800
@@ -2834,7 +2834,7 @@
 ===
 --- 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
 +++ 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
-@@ -0,0 +1,517 @@
+@@ -0,0 +1,511 @@
 +/**
 + * Licensed to the Apache Software Foundation (ASF) under one or more
 + * contributor license agreements.  See the NOTICE file distributed with
@@ -2939,12 +2939,6 @@
 +collapseDoc = new NonAdjacentCollapseGroup(0, 0, documentComparator, 
collapseThreshold, currentValue);
 +collapsedDocs.put(currentValue, collapseDoc);
 +collapsedGroupPriority.add(collapseDoc);
-+
-+if (collapsedGroupPriority.size() > maxNumberOfGroups) {
-+  NonAdjacentCollapseGroup inferiorGroup = 
collapsedGroupPriority.first();
-+  collapsedDocs.remove(inferiorGroup.fieldValue);
-+  collapsedGroupPriority.remove(inferiorGroup);
-+}
 +  }
 +  // dropoutId has a value smaller than the smallest value in the queue 
and therefore it was removed from the queue
 +  Integer dropOutId = (Integer) 
collapseDoc.priorityQueue.insertWithOverflow(currentId);
{code} 

  was (Author: peterwang):
SOLR-236-1_4_1-paging-totals-working.patch patch failed with following 
errors:

patch:  malformed patch at line 3348: Index: 
src/test/org/apache/solr/search/fieldcollapse/DistributedFieldCollapsingIntegrationTest.java

seems caused by hand edit (delete 6 lines without fix diff hunk number) patch 
files, possible fix:

 $ diff -u SOLR-236-1_4_1.patch SOLR-236-1_4_1-paging-totals-working.patch
--- SOLR-236-1_4_1.patch2010-11-17 18:22:25.0 +0800
+++ SOLR-236-1_4_1-paging-totals-working.patch  2010-11-17 19:17:20.0 
+0800
@@ -2834,7 +2834,7 @@
 ===
 --- 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
 +++ 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
-@@ -0,0 +1,517 @@
+@@ -0,0 +1,511 @@
 +/**
 + * Licensed to the Apache Software Foundation (ASF) under one or more
 + * contributor license agreements.  See the NOTICE file distributed with
@@ -2939,12 +2939,6 @@
 +collapseDoc = new NonAdjacentCollapseGroup(0, 0, documentComparator, 
collapseThreshold, currentValue);
 +collapsedDocs.put(currentValue, collapseDoc);
 +collapsedGroupPriority.add(collapseDoc);
-+
-+if (collapsedGroupPriority.size() > maxNumberOfGroups) {
-+  NonAdjacentCollapseGroup inferiorGroup = 
collapsedGroupPriority.first();
-+  collapsedDocs.remove(inferiorGroup.fieldValue);
-+  collapsedGroupPriority.remove(inferiorGroup);
-+}
 +  }
 +  // dropoutId has a value smaller than the smallest value in the queue 
and therefore it was removed from the queue
 +  Integer dropOutId = (Integer) 
collapseDoc.priorityQueue.insertWithOverflow(currentId);


  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: Next

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, 
 field-collapse-3.patch, field-collapse-4-with-solrj.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, 

[jira] Commented: (LUCENE-2764) Allow tests to use random codec per field

2010-11-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932910#action_12932910
 ] 

Michael McCandless commented on LUCENE-2764:


bq. The problem is if we have IW writing field A with codec Standard then open 
a new IW with field A using PreFlexRW we get problems with the comparator if 
those segments are merged though.

Hmm this should be OK...

The PreFlexRW codec has a sneaky impersonation layer (test only)
that attempts to figure out which term comparator it's supposed to be
using when something is reading the segment.  It sounds like that
layer isn't being smart enough now.

I think we could fix it -- really it just needs to know which codec is
writing.  If it's PreFlexRW that's writing then it needs to use the
legacy sort order; else, unicode.


 Allow tests to use random codec per field
 -

 Key: LUCENE-2764
 URL: https://issues.apache.org/jira/browse/LUCENE-2764
 Project: Lucene - Java
  Issue Type: Test
  Components: Tests
Affects Versions: 4.0
Reporter: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-2764.patch, LUCENE-2764.patch


 Since we now have a real per field codec support we should enable to run the 
 tests with a random codec per field. When I change something related to 
 codecs internally I would like to ensure that whatever combination of codecs 
 (except of preflex) I use the code works just fine. I created a 
 RandomCodecProvider in LuceneTestCase that randomly selects the codec for 
 fields when it sees them the first time. I disabled the test by default to 
 leave the old randomize codec support in as it was / is.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-11-17 Thread peterwang (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932905#action_12932905
 ] 

peterwang edited comment on SOLR-236 at 11/17/10 6:28 AM:
--

SOLR-236-1_4_1-paging-totals-working.patch patch failed with following errors:

patch:  malformed patch at line 3348: Index: 
src/test/org/apache/solr/search/fieldcollapse/DistributedFieldCollapsingIntegrationTest.java

seems to be caused by hand-editing SOLR-236-1_4_1.patch to produce 
SOLR-236-1_4_1-paging-totals-working.patch (6 lines deleted without fixing the 
diff hunk line counts); 
possible fix:

{code}
diff -u SOLR-236-1_4_1-paging-totals-working.patch.orig 
SOLR-236-1_4_1-paging-totals-working.patch
--- SOLR-236-1_4_1-paging-totals-working.patch.orig 2010-11-17 
19:26:05.0 +0800
+++ SOLR-236-1_4_1-paging-totals-working.patch  2010-11-17 19:17:20.0 
+0800
@@ -2834,7 +2834,7 @@
 ===
 --- 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
 +++ 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
-@@ -0,0 +1,517 @@
+@@ -0,0 +1,511 @@
 +/**
 + * Licensed to the Apache Software Foundation (ASF) under one or more
 + * contributor license agreements.  See the NOTICE file distributed with
{code} 

  was (Author: peterwang):
SOLR-236-1_4_1-paging-totals-working.patch patch failed with following 
errors:

patch:  malformed patch at line 3348: Index: 
src/test/org/apache/solr/search/fieldcollapse/DistributedFieldCollapsingIntegrationTest.java

seems caused by hand edit (delete 6 lines without fix diff hunk number) patch 
files, possible fix:

{code}
$ diff -u SOLR-236-1_4_1.patch SOLR-236-1_4_1-paging-totals-working.patch
--- SOLR-236-1_4_1.patch2010-11-17 18:22:25.0 +0800
+++ SOLR-236-1_4_1-paging-totals-working.patch  2010-11-17 19:17:20.0 
+0800
@@ -2834,7 +2834,7 @@
 ===
 --- 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
 +++ 
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java 
   (revision )
-@@ -0,0 +1,517 @@
+@@ -0,0 +1,511 @@
 +/**
 + * Licensed to the Apache Software Foundation (ASF) under one or more
 + * contributor license agreements.  See the NOTICE file distributed with
@@ -2939,12 +2939,6 @@
 +collapseDoc = new NonAdjacentCollapseGroup(0, 0, documentComparator, 
collapseThreshold, currentValue);
 +collapsedDocs.put(currentValue, collapseDoc);
 +collapsedGroupPriority.add(collapseDoc);
-+
-+if (collapsedGroupPriority.size() > maxNumberOfGroups) {
-+  NonAdjacentCollapseGroup inferiorGroup = 
collapsedGroupPriority.first();
-+  collapsedDocs.remove(inferiorGroup.fieldValue);
-+  collapsedGroupPriority.remove(inferiorGroup);
-+}
 +  }
 +  // dropoutId has a value smaller than the smallest value in the queue 
and therefore it was removed from the queue
 +  Integer dropOutId = (Integer) 
collapseDoc.priorityQueue.insertWithOverflow(currentId);
{code} 
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: Next

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, 
 field-collapse-3.patch, field-collapse-4-with-solrj.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
 quasidistributed.additional.patch, 
 SOLR-236-1_4_1-paging-totals-working.patch, SOLR-236-1_4_1.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
 SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
 

[jira] Resolved: (SOLR-1667) PatternTokenizer does not clearAttributes()

2010-11-17 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-1667.
---

   Resolution: Fixed
Fix Version/s: (was: 1.5)
   1.4.2

Committed revision 1035982.

 PatternTokenizer does not clearAttributes()
 ---

 Key: SOLR-1667
 URL: https://issues.apache.org/jira/browse/SOLR-1667
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 1.4.2, 3.1, 4.0

 Attachments: SOLR-1667.patch


 PatternTokenizer creates tokens, but never calls clearAttributes()
 because of this things like positionIncrementGap are never reset to their 
 default value.
 trivial patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments

2010-11-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932915#action_12932915
 ] 

Michael McCandless commented on LUCENE-2680:


Why do we still have deletesFlushed?  And why do we still need to
remap docIDs on merge?  I thought with this new approach the docIDUpto
for each buffered delete Term/Query would be a local docID to that
segment?

On flush the deletesInRAM should be carried directly over to the
segmentDeletes, and there shouldn't be a deletesFlushed?

A few other small things:

  * You can use SegmentInfos.clone to copy the segment infos? (it
makes a deep copy)

  * SegmentDeletes.clearAll() need not iterate through the
terms/queries to subtract the RAM used?  Ie just multiply by
.size() instead and make one call to deduct RAM used?

  * The SegmentDeletes use less than BYTES_PER_DEL_TERM because it's a
simple HashSet not a HashMap?  Ie we are over-counting RAM used
now?  (Same for by query)

  * Can we store segment's deletes elsewhere?  The SegmentInfo should
be a lightweight class... eg it's used by DirectoryReader to read
the index, and if it's read only DirectoryReader there's no need
for it to allocate the SegmentDeletes?  These data structures
should only be held by IndexWriter/DocumentsWriter.

  * Do we really need to track appliedTerms/appliedQueries?  Ie is
this just an optimization so that if the caller deletes by the
Term/Query again we know to skip it?  Seems unnecessary if that's
all...


 Improve how IndexWriter flushes deletes against existing segments
 -

 Key: LUCENE-2680
 URL: https://issues.apache.org/jira/browse/LUCENE-2680
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch


 IndexWriter buffers up all deletes (by Term and Query) and only
 applies them if 1) commit or NRT getReader() is called, or 2) a merge
 is about to kickoff.
 We do this because, for a large index, it's very costly to open a
 SegmentReader for every segment in the index.  So we defer as long as
 we can.  We do it just before merge so that the merge can eliminate
 the deleted docs.
 But, most merges are small, yet in a big index we apply deletes to all
 of the segments, which is really very wasteful.
 Instead, we should only apply the buffered deletes to the segments
 that are about to be merged, and keep the buffer around for the
 remaining segments.
 I think it's not so hard to do; we'd have to have generations of
 pending deletions, because the newly merged segment doesn't need the
 same buffered deletions applied again.  So every time a merge kicks
 off, we pinch off the current set of buffered deletions, open a new
 set (the next generation), and record which segment was created as of
 which generation.
 This should be a very sizable gain for large indices that mix
 deletes, though, less so in flex since opening the terms index is much
 faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (LUCENE-2765) Optimize scanning in DocsEnum

2010-11-17 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-2765:
---

Assignee: Robert Muir

 Optimize scanning in DocsEnum
 -

 Key: LUCENE-2765
 URL: https://issues.apache.org/jira/browse/LUCENE-2765
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2765.patch, LUCENE-2765.patch


 Similar to LUCENE-2761:
 when we call advance(), after skipping it scans, but this can be optimized 
 better than calling nextDoc() like today
 {noformat}
   // scan for the rest:
   do {
 nextDoc();
   } while (target > doc);
 {noformat}
 in particular, the freq can be skipVinted and the skipDocs (deletedDocs) 
 don't need to be checked during this scanning.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2765) Optimize scanning in DocsEnum

2010-11-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932920#action_12932920
 ] 

Robert Muir commented on LUCENE-2765:
-

here are Mike's results on his wikipedia index (multi-segment, 5% deletions) 
with the patch.

||Query||QPS base||QPS spec||Pct diff||
|unit state|7.94|7.84|-1.3%|
|state|36.15|35.81|-1.0%|
|spanNear([unit, state], 10, true)|4.46|4.42|-0.9%|
|spanFirst(unit, 5)|16.51|16.45|-0.4%|
|unit state|10.76|10.78|0.1%|
|unit~2.0|13.83|14.06 |1.7%|
|unit~1.0|14.36|14.69 |2.3%|
|uni*|15.57|16.02|2.9%|
|unit*|27.29|28.26|3.5%|
|+unit +state|11.73|12.31|4.9%|
|united~1.0|29.01|30.86|6.4%|
|un*d|66.52|70.99|6.7%|
|u*d|21.29|22.98|7.9%|
|united~2.0|6.48|7.07|9.1%|
|+nebraska +state|169.87|188.95|11.2%|

 Optimize scanning in DocsEnum
 -

 Key: LUCENE-2765
 URL: https://issues.apache.org/jira/browse/LUCENE-2765
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2765.patch, LUCENE-2765.patch


 Similar to LUCENE-2761:
 when we call advance(), after skipping it scans, but this can be optimized 
 better than calling nextDoc() like today
 {noformat}
   // scan for the rest:
   do {
 nextDoc();
   } while (target > doc);
 {noformat}
 in particular, the freq can be skipVinted and the skipDocs (deletedDocs) 
 don't need to be checked during this scanning.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Dismax Wiki page

2010-11-17 Thread Erick Erickson
I was looking at a question on the users list, and there are a couple of
issues...

I'm running 1.4.1 on a Windows box. Trying to specify dismax via
defType=dismax fails, returning 0 results, and it doesn't look like it hits the
dismax handler at all; at least the parsed query comes back as +() +()
with debugQuery=on.

deftype=dismax is fine. qt=dismax is also fine.

The Wiki page has qt=defType=dismax in one of the examples (
http://wiki.apache.org/solr/DisMaxQParserPlugin). and the rest of the
examples have defType.

Before I fix the Wiki page, what's the preferred syntax? I thought it was
def[T|t]ype. And is the capitalization thing really a problem or not?

Thanks
Erick


[jira] Commented: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments

2010-11-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932945#action_12932945
 ] 

Michael McCandless commented on LUCENE-2680:


Also: why are we tracking the last segment info/index?  Ie, this should only be 
necessary on cutover to DWPT right?  Because effectively today we have only a 
single DWPT?

 Improve how IndexWriter flushes deletes against existing segments
 -

 Key: LUCENE-2680
 URL: https://issues.apache.org/jira/browse/LUCENE-2680
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch


 IndexWriter buffers up all deletes (by Term and Query) and only
 applies them if 1) commit or NRT getReader() is called, or 2) a merge
 is about to kickoff.
 We do this because, for a large index, it's very costly to open a
 SegmentReader for every segment in the index.  So we defer as long as
 we can.  We do it just before merge so that the merge can eliminate
 the deleted docs.
 But, most merges are small, yet in a big index we apply deletes to all
 of the segments, which is really very wasteful.
 Instead, we should only apply the buffered deletes to the segments
 that are about to be merged, and keep the buffer around for the
 remaining segments.
 I think it's not so hard to do; we'd have to have generations of
 pending deletions, because the newly merged segment doesn't need the
 same buffered deletions applied again.  So every time a merge kicks
 off, we pinch off the current set of buffered deletions, open a new
 set (the next generation), and record which segment was created as of
 which generation.
 This should be a very sizable gain for large indices that mix
 deletes, though, less so in flex since opening the terms index is much
 faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-2237) add factory for stempel polish stemmer

2010-11-17 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2237.
---

Resolution: Fixed

Committed revision 1035996, 1036035 (3x)

 add factory for stempel polish stemmer
 --

 Key: SOLR-2237
 URL: https://issues.apache.org/jira/browse/SOLR-2237
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: SOLR-2237.patch


 Some users have asked how to enable polish stemming:
 http://www.lucidimagination.com/search/document/2581073d836cec9a/how_to_use_polish_stemmer_stempel_in_schema_xml#c67acf3dddba1164
 http://www.lucidimagination.com/search/document/d115f17bd69a4dae/polish_stemmer#d115f17bd69a4dae
 http://www.lucidimagination.com/search/document/137d010682bb7367/polish_language_support
 etc.
 We should add the factory to make this easy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2768) add infrastructure for longer running nightly test cases

2010-11-17 Thread Michael McCandless (JIRA)
add infrastructure for longer running nightly test cases


 Key: LUCENE-2768
 URL: https://issues.apache.org/jira/browse/LUCENE-2768
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1, 4.0


I'm spinning this out of LUCENE-2762...

The patch there adds initial infrastructure for tests to pull documents from a 
line file, and adds a longish running test case using that line file to test 
NRT.

I'd like to see some tests run on more substantial indices based on real 
data... so this is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2768) add infrastructure for longer running nightly test cases

2010-11-17 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2768:
---

Attachment: LUCENE-2768.patch

Patch.

 add infrastructure for longer running nightly test cases
 

 Key: LUCENE-2768
 URL: https://issues.apache.org/jira/browse/LUCENE-2768
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2768.patch


 I'm spinning this out of LUCENE-2762...
 The patch there adds initial infrastructure for tests to pull documents from 
 a line file, and adds a longish running test case using that line file to 
 test NRT.
 I'd like to see some tests run on more substantial indices based on real 
 data... so this is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: mergeinfo commit mails, possible solution

2010-11-17 Thread Grant Ingersoll

On Nov 17, 2010, at 1:01 AM, Steven A Rowe wrote:

 After looking more closely at the vanilla Subversion version of the mailer.py 
 script, I'm 99% sure that removing propchange from the generate_diffs list 
 will have zero effect, but I'd love to be proven wrong.
 
 Turns out Subversion's mailer.py once had a much larger set of property 
 filtering options, but C. Mike (Pilato) thought the option set was too 
 baroque, so he reverted the entire set - see 
 http://subversion.tigris.org/issues/show_bug.cgi?id=2944.
 
 I've asked on #svn about re-instating the ignore_props and 
 ignore_propdiffs regex-valued options - these would allow us to only ignore 
 svn:mergeinfo diffs while still noting that files' properties have changed, 
 without affecting other properties or their diffs.  No responses yet, 
 hopefully tomorrow.

We might also be able to supply a patch to ASF infra to turn it on.

However, I still feel like we are doing something wrong here relative to other 
projects.  Surely other projects are doing merges and have successfully avoided 
all this noise.  (I know, I know, we've discussed this before.)  Perhaps an 
email to commun...@a.o might help or if we look around at other projects that 
merge and see what they do.  In looking in the mailer conf, some projects turn 
off generating diffs altogether, but I don't think that is what we want.

FWIW, it was announced at ApacheCon that the ASF will be supporting Read/Write 
Git, so maybe we just live with it until we can migrate to Git.

-Grant

 
 Steve
 
 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Tuesday, November 16, 2010 9:55 AM
 To: dev@lucene.apache.org
 Subject: mergeinfo commit mails, possible solution
 
 From #lucene IRC:
 gsingers:sarowe and I were talking about the mergeinfo commit overload
 [09:43]gsingers:and the asf_mailer.conf file
 [09:43]gsingers:In looking at the file
 [09:44]gsingers:it appears the one thing we have the ability to do is to
 turn off the generation of diffs for
 [09:44]gsingers:events
 [09:44]gsingers:The default setting is:
 [09:44]gsingers:generate_diffs = add copy modify propchange
 [09:44]gsingers:sarowe and I are proposing to change our settings to just
 be add/copy/modify
 [09:44]gsingers:and try dropping propchange
 [09:45]gsingers:I honestly don't know whether it will work or not
 [09:45]gsingers:and it will also likely mean we will miss notifications of
 other propchanges
 [09:45]gsingers:We've asked on #asfinfra if there are other options
 [09:45]gsingers:and sarowe is looking into the mailer.py script to see if
 there are other things available
 [09:46]gsingers:I guess the question here is, do people want to try
 turning off propchange?
 
 
 -Grant
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 

--
Grant Ingersoll
http://www.lucidimagination.com


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2768) add infrastructure for longer running nightly test cases

2010-11-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932956#action_12932956
 ] 

Robert Muir commented on LUCENE-2768:
-

in revision 1036038 i set -Dtests.nightly=1 for running tests during hudson 
nightly,
but i didn't set it for the clover portion... i think it would only cause the 
nightly job to take an eternity

 add infrastructure for longer running nightly test cases
 

 Key: LUCENE-2768
 URL: https://issues.apache.org/jira/browse/LUCENE-2768
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2768.patch


 I'm spinning this out of LUCENE-2762...
 The patch there adds initial infrastructure for tests to pull documents from 
 a line file, and adds a longish running test case using that line file to 
 test NRT.
 I'd like to see some tests run on more substantial indices based on real 
 data... so this is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: mergeinfo commit mails, possible solution

2010-11-17 Thread Robert Muir
On Wed, Nov 17, 2010 at 8:53 AM, Grant Ingersoll gsing...@apache.org wrote:

 FWIW, it was announced at ApacheCon that the ASF will be supporting 
 Read/Write Git, so maybe we just live with it until we can migrate to Git.


I didn't know there was consensus that our project would migrate to
Git. I surely hope we would vote on such a decision!

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: mergeinfo commit mails, possible solution

2010-11-17 Thread Uwe Schindler
The ASF does not use the vanilla mailer.py script, they are using 
http://opensource.perlig.de/svnmailer - and this one does fantastic work 
regarding this! We just need to change the config files of this tool and 
specify a special subtree config for /lucene project folder.

See also the attached mail!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Steven A Rowe [mailto:sar...@syr.edu]
 Sent: Wednesday, November 17, 2010 7:02 AM
 To: dev@lucene.apache.org
 Subject: RE: mergeinfo commit mails, possible solution
 
 After looking more closely at the vanilla Subversion version of the mailer.py
 script, I'm 99% sure that removing propchange from the generate_diffs list 
 will
 have zero effect, but I'd love to be proven wrong.
 
 Turns out Subversion's mailer.py once had a much larger set of property
 filtering options, but C. Mike (Pilato) thought the option set was too 
 baroque, so
 he reverted the entire set - see
 http://subversion.tigris.org/issues/show_bug.cgi?id=2944.
 
 I've asked on #svn about re-instating the ignore_props and 
 ignore_propdiffs
 regex-valued options - these would allow us to only ignore svn:mergeinfo diffs
 while still noting that files' properties have changed, without affecting 
 other
 properties or their diffs.  No responses yet, hopefully tomorrow.
 
 Steve
 
  -Original Message-
  From: Grant Ingersoll [mailto:gsing...@apache.org]
  Sent: Tuesday, November 16, 2010 9:55 AM
  To: dev@lucene.apache.org
  Subject: mergeinfo commit mails, possible solution
 
  From #lucene IRC:
  gsingers:sarowe and I were talking about the mergeinfo commit overload
  [09:43]gsingers:and the asf_mailer.conf file [09:43]gsingers:In
  looking at the file [09:44]gsingers:it appears the one thing we have
  the ability to do is to turn off the generation of diffs for
  [09:44]gsingers:events [09:44]gsingers:The default setting is:
  [09:44]gsingers:generate_diffs = add copy modify propchange
  [09:44]gsingers:sarowe and I are proposing to change our settings to
  just be add/copy/modify [09:44]gsingers:and try dropping propchange
  [09:45]gsingers:I honestly don't know whether it will work or not
  [09:45]gsingers:and it will also likely mean we will miss
  notifications of other propchanges [09:45]gsingers:We've asked on
  #asfinfra if there are other options [09:45]gsingers:and sarowe is
  looking into the mailer.py script to see if there are other things
  available [09:46]gsingers:I guess the question here is, do people want
  to try turning off propchange?
 
 
  -Grant
 
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org

---BeginMessage---
Hi Upayavira,

Thanks for the hint. Indeed, by changing the config file (which allows
special configs for specific subtrees of the svn, so we can do it only for
Lucene), we can do this very easily:

http://opensource.perlig.de/svnmailer/doc-1.0/#groups-generate-diffs

The generate_diffs option defines which actions diffs are generated for. It
takes a space or tab separated list of one or more of the following tokens:
add, modify, copy, delete, propchange and none.

If the add token is given and a new file is added to the repository, the
svnmailer generates a diff between an empty file and the newly added one. If
the modify token is given and the content of an already existing file is
changed, a diff between the old revision and the new revision of that file
is generated. The copy token only worries about files, that are copied and
modified during one commit. The delete token generates a diff between the
previous revision of the file and an empty file, if a file was deleted.

If the propchange token is given, the svnmailer also takes care of changes
in versioned properties. Whether it should actually generate diffs for the
property change action depends on the other tokens of the generate_diffs
list. The same rules as for files apply, except that the svnmailer never
generates property diffs for deleted files


If we change that config option and remove propchange, then the diffs would
not contain propchanges anymore. It would then only list them as modified files,
but we can live with that.
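
As a rough illustration, a subtree group for the Lucene folder could then look
something like this (only the generate_diffs line is taken from the docs quoted
above; the group name and the for_paths option are my assumptions about the
svnmailer config syntax, not the actual ASF config):

[lucene]
# hypothetical group for the /lucene project folder; option names other than
# generate_diffs are assumptions
for_paths = lucene/.*
generate_diffs = add copy modify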

Grant: Can you send me a copy of the current config file of that tool? I
could create a patch! (I am allowed to see it).

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Upayavira [mailto:u...@odoko.co.uk]
 Sent: Tuesday, October 19, 2010 1:07 PM
 To: dev@lucene.apache.org
 Subject: Re: possible to filter the output to commits@ list
 
 FWIW, the commit notices are just an SVN post-commit hook that uses the
svn-
 mailer tool [http://opensource.perlig.de/svnmailer/]. I believe Grant has
 commit rights to that file - it is in the infra SVN 

Re: mergeinfo commit mails, possible solution

2010-11-17 Thread Grant Ingersoll

On Nov 17, 2010, at 9:02 AM, Robert Muir wrote:

 On Wed, Nov 17, 2010 at 8:53 AM, Grant Ingersoll gsing...@apache.org wrote:
 
 FWIW, it was announced at ApacheCon that the ASF will be supporting 
 Read/Write Git, so maybe we just live with it until we can migrate to Git.
 
 
 I didn't know there was consensus that our project would migrate to
 Git. I surely hope we would vote on such a decision!
 

Yeah, sorry for the implication that we would move.  It is definitely something 
we should decide together.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2768) add infrastructure for longer running nightly test cases

2010-11-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932957#action_12932957
 ] 

Robert Muir commented on LUCENE-2768:
-

ok, i have two potential solutions, and no particular preference as to which we 
do:

# we upgrade our Junit from 4.7 to 4.8 and use the Category support.
in this case you would use @IncludeCategory(Nightly.class) to annotate your 
test.
http://kentbeck.github.com/junit/doc/ReleaseNotes4.8.html
# we add our own annotation (e.g. @Nightly) and use that.

in either case we hack our runner to respect it, so it's the same amount of work
(junit 4.8 won't actually save us anything since we won't use its 
@RunWith(Categories.class), but our own runner); it's just about syntax and
possibly whether we care about consistency with junit or envision other optional 
categories beyond nightly.
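
just to illustrate option 2, a minimal sketch of such a project-local marker
annotation could look like the following (the name and meta-annotations are
illustrative, not the actual patch):

{code}
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// marker annotation for long-running tests; the custom runner (not this class)
// would check for it and only run annotated tests when -Dtests.nightly=true is set
@Documented
@Inherited
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface Nightly {
}
{code}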


 add infrastructure for longer running nightly test cases
 

 Key: LUCENE-2768
 URL: https://issues.apache.org/jira/browse/LUCENE-2768
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2768.patch


 I'm spinning this out of LUCENE-2762...
 The patch there adds initial infrastructure for tests to pull documents from 
 a line file, and adds a longish running test case using that line file to 
 test NRT.
 I'd like to see some tests run on more substantial indices based on real 
 data... so this is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: mergeinfo commit mails, possible solution

2010-11-17 Thread Grant Ingersoll
OK, I will set it to not do propchange.

-Grant

On Nov 17, 2010, at 9:03 AM, Uwe Schindler wrote:

 The ASF does not use the vanilla mailer.py script, they are using 
 http://opensource.perlig.de/svnmailer - and this one does fantastic work 
 regarding this! We just need to change the config files of this tool and 
 specify a special subtree config for /lucene project folder.
 
 See also the attached mail!
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 -Original Message-
 From: Steven A Rowe [mailto:sar...@syr.edu]
 Sent: Wednesday, November 17, 2010 7:02 AM
 To: dev@lucene.apache.org
 Subject: RE: mergeinfo commit mails, possible solution
 
 After looking more closely at the vanilla Subversion version of the mailer.py
 script, I'm 99% sure that removing propchange from the generate_diffs list 
 will
 have zero effect, but I'd love to be proven wrong.
 
 Turns out Subversion's mailer.py once had a much larger set of property
 filtering options, but C. Mike (Pilato) thought the option set was too 
 baroque, so
 he reverted the entire set - see
 http://subversion.tigris.org/issues/show_bug.cgi?id=2944.
 
 I've asked on #svn about re-instating the ignore_props and 
 ignore_propdiffs
 regex-valued options - these would allow us to only ignore svn:mergeinfo 
 diffs
 while still noting that files' properties have changed, without affecting 
 other
 properties or their diffs.  No responses yet, hopefully tomorrow.
 
 Steve
 
 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Tuesday, November 16, 2010 9:55 AM
 To: dev@lucene.apache.org
 Subject: mergeinfo commit mails, possible solution
 
 From #lucene IRC:
 gsingers:sarowe and I were talking about the mergeinfo commit overload
 [09:43]gsingers:and the asf_mailer.conf file [09:43]gsingers:In
 looking at the file [09:44]gsingers:it appears the one thing we have
 the ability to do is to turn off the generation of diffs for
 [09:44]gsingers:events [09:44]gsingers:The default setting is:
 [09:44]gsingers:generate_diffs = add copy modify propchange
 [09:44]gsingers:sarowe and I are proposing to change our settings to
 just be add/copy/modify [09:44]gsingers:and try dropping propchange
 [09:45]gsingers:I honestly don't know whether it will work or not
 [09:45]gsingers:and it will also likely mean we will miss
 notifications of other propchanges [09:45]gsingers:We've asked on
 #asfinfra if there are other options [09:45]gsingers:and sarowe is
 looking into the mailer.py script to see if there are other things
 available [09:46]gsingers:I guess the question here is, do people want
 to try turning off propchange?
 
 
 -Grant
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
 additional commands, e-mail: dev-h...@lucene.apache.org
 
 Mail Attachment.eml
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org

--
Grant Ingersoll
http://www.lucidimagination.com


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: mergeinfo commit mails, possible solution

2010-11-17 Thread Grant Ingersoll
OK, it is now set.  Next time someone does a merge, keep an eye on the commit 
messages and let me know.

I set it to add copy modify


-Grant

On Nov 17, 2010, at 9:03 AM, Uwe Schindler wrote:

 The ASF does not use the vanilla mailer.py script, they are using 
 http://opensource.perlig.de/svnmailer - and this one does fantastic work 
 regarding this! We just need to change the config files of this tool and 
 specify a special subtree config for /lucene project folder.
 
 See also the attached mail!
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 -Original Message-
 From: Steven A Rowe [mailto:sar...@syr.edu]
 Sent: Wednesday, November 17, 2010 7:02 AM
 To: dev@lucene.apache.org
 Subject: RE: mergeinfo commit mails, possible solution
 
 After looking more closely at the vanilla Subversion version of the mailer.py
 script, I'm 99% sure that removing propchange from the generate_diffs list 
 will
 have zero effect, but I'd love to be proven wrong.
 
 Turns out Subversion's mailer.py once had a much larger set of property
 filtering options, but C. Mike (Pilato) thought the option set was too 
 baroque, so
 he reverted the entire set - see
 http://subversion.tigris.org/issues/show_bug.cgi?id=2944.
 
 I've asked on #svn about re-instating the ignore_props and 
 ignore_propdiffs
 regex-valued options - these would allow us to only ignore svn:mergeinfo 
 diffs
 while still noting that files' properties have changed, without affecting 
 other
 properties or their diffs.  No responses yet, hopefully tomorrow.
 
 Steve
 
 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Tuesday, November 16, 2010 9:55 AM
 To: dev@lucene.apache.org
 Subject: mergeinfo commit mails, possible solution
 
 From #lucene IRC:
 gsingers:sarowe and I were talking about the mergeinfo commit overload
 [09:43]gsingers:and the asf_mailer.conf file [09:43]gsingers:In
 looking at the file [09:44]gsingers:it appears the one thing we have
 the ability to do is to turn off the generation of diffs for
 [09:44]gsingers:events [09:44]gsingers:The default setting is:
 [09:44]gsingers:generate_diffs = add copy modify propchange
 [09:44]gsingers:sarowe and I are proposing to change our settings to
 just be add/copy/modify [09:44]gsingers:and try dropping propchange
 [09:45]gsingers:I honestly don't know whether it will work or not
 [09:45]gsingers:and it will also likely mean we will miss
 notifications of other propchanges [09:45]gsingers:We've asked on
 #asfinfra if there are other options [09:45]gsingers:and sarowe is
 looking into the mailer.py script to see if there are other things
 available [09:46]gsingers:I guess the question here is, do people want
 to try turning off propchange?
 
 
 -Grant
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
 additional commands, e-mail: dev-h...@lucene.apache.org
 
 Mail Attachment.eml
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org

--
Grant Ingersoll
http://www.lucidimagination.com


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2768) add infrastructure for longer running nightly test cases

2010-11-17 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932972#action_12932972
 ] 

Uwe Schindler commented on LUCENE-2768:
---

bq. in revision 1036038 i set -Dtests.nightly=1 for running tests during hudson 
nightly, but i didn't set it for the clover portion... i think it would only 
cause the nightly job to take an eternity 

+1, we also have no tests.multiplier for clover!

 add infrastructure for longer running nightly test cases
 

 Key: LUCENE-2768
 URL: https://issues.apache.org/jira/browse/LUCENE-2768
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2768.patch


 I'm spinning this out of LUCENE-2762...
 The patch there adds initial infrastructure for tests to pull documents from 
 a line file, and adds a longish running test case using that line file to 
 test NRT.
 I'd like to see some tests run on more substantial indices based on real 
 data... so this is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: mergeinfo commit mails, possible solution

2010-11-17 Thread Uwe Schindler
Who does the first merge? *g*
Thanks Grant for taking care of this, I just did not get to it the last weeks!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Wednesday, November 17, 2010 3:41 PM
 To: dev@lucene.apache.org
 Subject: Re: mergeinfo commit mails, possible solution
 
 OK, it is now set.  Next time someone does a merge, keep an eye on the
 commit messages and let me know.
 
 I set it to add copy modify
 
 
 -Grant
 
 On Nov 17, 2010, at 9:03 AM, Uwe Schindler wrote:
 
  The ASF does not use the vanilla mailer.py script, they are using
 http://opensource.perlig.de/svnmailer - and this one does fantastic work
 regarding this! We just need to change the config files of this tool and
specify a
 special subtree config for /lucene project folder.
 
  See also the attached mail!
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
  -Original Message-
  From: Steven A Rowe [mailto:sar...@syr.edu]
  Sent: Wednesday, November 17, 2010 7:02 AM
  To: dev@lucene.apache.org
  Subject: RE: mergeinfo commit mails, possible solution
 
  After looking more closely at the vanilla Subversion version of the
  mailer.py script, I'm 99% sure that removing propchange from the
  generate_diffs list will have zero effect, but I'd love to be proven
wrong.
 
  Turns out Subversion's mailer.py once had a much larger set of
  property filtering options, but C. Mike (Pilato) thought the option
  set was too baroque, so he reverted the entire set - see
  http://subversion.tigris.org/issues/show_bug.cgi?id=2944.
 
  I've asked on #svn about re-instating the ignore_props and
 ignore_propdiffs
  regex-valued options - these would allow us to only ignore
  svn:mergeinfo diffs while still noting that files' properties have
  changed, without affecting other properties or their diffs.  No
responses yet,
 hopefully tomorrow.
 
  Steve
 
  -Original Message-
  From: Grant Ingersoll [mailto:gsing...@apache.org]
  Sent: Tuesday, November 16, 2010 9:55 AM
  To: dev@lucene.apache.org
  Subject: mergeinfo commit mails, possible solution
 
  From #lucene IRC:
  gsingers:sarowe and I were talking about the mergeinfo commit
  overload [09:43]gsingers:and the asf_mailer.conf file
  [09:43]gsingers:In looking at the file [09:44]gsingers:it appears
  the one thing we have the ability to do is to turn off the
  generation of diffs for [09:44]gsingers:events [09:44]gsingers:The
default
 setting is:
  [09:44]gsingers:generate_diffs = add copy modify propchange
  [09:44]gsingers:sarowe and I are proposing to change our settings to
  just be add/copy/modify [09:44]gsingers:and try dropping propchange
  [09:45]gsingers:I honestly don't know whether it will work or not
  [09:45]gsingers:and it will also likely mean we will miss
  notifications of other propchanges [09:45]gsingers:We've asked on
  #asfinfra if there are other options [09:45]gsingers:and sarowe is
  looking into the mailer.py script to see if there are other things
  available [09:46]gsingers:I guess the question here is, do people
  want to try turning off propchange?
 
 
  -Grant
 
 
 
  
  - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org
 
  Mail Attachment.eml
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Integrate Lucene Search

2010-11-17 Thread Wyatt Barnett
Lucene can search anything you feed to it. But it doesn't know how to eat
itself. So:

1) You need to load the data from the database and feed it to lucene in the
format you want it indexed in. Along with the searchable data, you need to
provide an identifier you can use to link back to whatever it is you are
trying to index.
2) You will need to write some code to grab the data from the outside
links and index it. The identifier in this case could be the url you are
indexing.

I'd recommend reading Lucene in Action (http://www.manning.com/hatcher3/) to
get an idea of how the library thinks. The examples are in java, but the
concepts translate directly.
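
For point 1, here is a very rough Java sketch (table, column, field, and path
names are all made up; the same pattern applies with Lucene.Net):

import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class DbIndexer {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter(FSDirectory.open(new File("/tmp/index")),
        new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED);
    Connection db = DriverManager.getConnection("jdbc:mysql://localhost/app", "user", "pass");
    ResultSet rs = db.createStatement().executeQuery("SELECT id, title, body FROM articles");
    while (rs.next()) {
      Document doc = new Document();
      // store the identifier so a search hit can be linked back to the database row
      doc.add(new Field("id", rs.getString("id"), Field.Store.YES, Field.Index.NOT_ANALYZED));
      doc.add(new Field("title", rs.getString("title"), Field.Store.YES, Field.Index.ANALYZED));
      doc.add(new Field("body", rs.getString("body"), Field.Store.NO, Field.Index.ANALYZED));
      writer.addDocument(doc);
    }
    rs.close();
    db.close();
    writer.close();
  }
}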

On Wed, Nov 17, 2010 at 5:43 AM, Rahul Aneja
rahula.innovami...@gmail.com wrote:








 Hello,



 I have read a lot about Lucene/Solr search; it seems to be very
 interesting, and I want to integrate it in our application, which is built on
 ASP.NET (C#).

 Also, I got some code from the links below, but I have not been able to work
 out the particular steps or code needed from these links to
 integrate Lucene/Solr search.

 I am using the Lucene.Net library, in which the functions
 (Indexer, Parser, Search) are defined.



 http://aspcode.net/c-and-lucene-to-index-and-search

 http://www.logiclabz.com/c/search-lucene-index-in-net-c-with-sorting-options.aspx

 http://www.theplancollection.com/house-plan-related-articles/search-using-asp-net-and-Lucene

 http://www.codeproject.com/KB/library/IntroducingLucene.aspx



 Firstly, I want to clarify these points:

 1.  How to communicate with the database, so that the data and links
 corresponding to a search can be retrieved?

 2.  Does Lucene also provide search over outside (external)
 links, or does it only work within an application?



 I also want to clarify more steps, but first our main problem is the points
 that I have mentioned above.



  Please reply with a solution ASAP.





 Regards,
 Rahul Aneja

 Software Developer

 Phone (INDIA): +91-172-434-6890

  www.InnovaMinds.com





[jira] Commented: (LUCENE-2755) Some improvements to CMS

2010-11-17 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932982#action_12932982
 ] 

Shai Erera commented on LUCENE-2755:


Earwin, the way CMS currently handles the writer instance makes it entirely not 
thread-safe. If you e.g. pass different writers to merge(), the class member 
changes, and MTs will start merging other segments, and in the worst case 
attempt to merge segments of a different writer.

I myself think it's ok to have a MP and MS per writer, but I don't have too 
strong feelings for/against it - so if we want to allow this, we should fix CMS.

As for the other comments, I'll need to check more closely what IW does w/ 
those merges - as it checks all sorts of things (e.g. whether it's an optimize 
merge or not, see one of the latest bugs Mike resolved). So getting it entirely 
outside of IndexWriter and into MP/MS is risky - at least, I don't understand 
the code well enough (yet) to say whether it's doable at all and whether we 
would miss something.

 Some improvements to CMS
 

 Key: LUCENE-2755
 URL: https://issues.apache.org/jira/browse/LUCENE-2755
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1, 4.0


 While running optimize on a large index, I've noticed several things that got 
 me to read CMS code more carefully, and find these issues:
 * CMS may hold onto a merge if maxMergeCount is hit. That results in the 
 MergeThreads taking merges from the IndexWriter until they are exhausted, and 
 only then that blocked merge will run. I think it's unnecessary that that 
 merge will be blocked.
 * CMS sorts merges by segments size, doc-based and not bytes-based. Since the 
 default MP is LogByteSizeMP, and I hardly believe people care about doc-based 
 size segments anymore, I think we should switch the default impl. There are 
 two ways to make it extensible, if we want:
 ** Have an overridable member/method in CMS that you can extend and override 
 - easy.
 ** Have OneMerge be comparable and let the MP determine the order (e.g. by 
 bytes, docs, calibrate deletes etc.). Better, but will need to tap into 
 several places in the code, so more risky and complicated.
 On the go, I'd like to add some documentation to CMS - it's not very easy to 
 read and follow.
 I'll work on a patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2768) add infrastructure for longer running nightly test cases

2010-11-17 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2768:


Attachment: LUCENE-2768_nightly.patch

patch that adds support for annotating tests with @Nightly.

you can also annotate a whole class with this (in that case, import it from 
LuceneTestCase).

the only trick is that junit always requires a class to have at least one 
runnable method, or it throws an exception.

in the special case that all methods or the whole class are @Nightly, 
we add a fake @Ignored method so we get "Tests run: 0" and the NOTE instead of 
this exception.


 add infrastructure for longer running nightly test cases
 

 Key: LUCENE-2768
 URL: https://issues.apache.org/jira/browse/LUCENE-2768
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2768.patch, LUCENE-2768_nightly.patch


 I'm spinning this out of LUCENE-2762...
 The patch there adds initial infrastructure for tests to pull documents from 
 a line file, and adds a longish running test case using that line file to 
 test NRT.
 I'd like to see some tests run on more substantial indices based on real 
 data... so this is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



mergeinfo commit mails, possible solution

2010-11-17 Thread Grant Ingersoll
From #lucene IRC:
gsingers:sarowe and I were talking about the mergeinfo commit overload
[09:43]gsingers:and the asf_mailer.conf file
[09:43]gsingers:In looking at the file
[09:44]gsingers:it appears the one thing we have the ability to do is to turn 
off the generation of diffs for
[09:44]gsingers:events
[09:44]gsingers:The default setting is:
[09:44]gsingers:generate_diffs = add copy modify propchange
[09:44]gsingers:sarowe and I are proposing to change our settings to just be 
add/copy/modify
[09:44]gsingers:and try dropping propchange
[09:45]gsingers:I honestly don't know whether it will work or not
[09:45]gsingers:and it will also likely mean we will miss notifications of 
other propchanges
[09:45]gsingers:We've asked on #asfinfra if there are other options
[09:45]gsingers:and sarowe is looking into the mailer.py script to see if there 
are other things available
[09:46]gsingers:I guess the question here is, do people want to try turning off 
propchange?


-Grant



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments

2010-11-17 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932988#action_12932988
 ] 

Jason Rutherglen commented on LUCENE-2680:
--

{quote}Why do we still have deletesFlushed? And why do we still need to
remap docIDs on merge? I thought with this new approach the docIDUpto for
each buffered delete Term/Query would be a local docID to that
segment?{quote}

Deletes flushed can be removed if we store the docid-upto per segment.
Then we'll go back to having a hash map of deletes. 

{quote}The SegmentDeletes use less than BYTES_PER_DEL_TERM because it's a
simple HashSet not a HashMap? Ie we are over-counting RAM used now? (Same
for by query){quote}

Intuitively, yes, however here's the constructor of hash set:

{code} public HashSet() { map = new HashMap<E,Object>(); } {code}

bq. why are we tracking the last segment info/index?

I thought last segment was supposed to be used to mark the last segment of
a commit/flush. This way we save on the hash(set,map) space for the
segments up to the last segment when the commit occurred.

{quote}Can we store segment's deletes elsewhere?{quote}

We can, however I had to minimize places in the code that were potentially
causing errors (trying to reduce the problem set, which helped locate the
intermittent exceptions), and syncing segment infos with the per-segment
deletes was one of those places. That, and I thought it'd be worth
a try to simplify (at the expense of breaking the unstated intention of the
SI class).

{quote}Do we really need to track appliedTerms/appliedQueries? Ie is this
just an optimization so that if the caller deletes by the Term/Query again
we know to skip it? {quote}

Yes to the 2nd question. Why would we want to try deleting multiple times?
The cost is the terms dictionary lookup which you're saying is in the
noise? I think potentially cracking open a query again could be costly in
cases where the query is indeed expensive.

{quote}not iterate through the terms/queries to subtract the RAM
used?{quote}

Well, the RAM usage tracking can't be completely defined until we finish
how we're storing the terms/queries. 

 Improve how IndexWriter flushes deletes against existing segments
 -

 Key: LUCENE-2680
 URL: https://issues.apache.org/jira/browse/LUCENE-2680
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch


 IndexWriter buffers up all deletes (by Term and Query) and only
 applies them if 1) commit or NRT getReader() is called, or 2) a merge
 is about to kickoff.
 We do this because, for a large index, it's very costly to open a
 SegmentReader for every segment in the index.  So we defer as long as
 we can.  We do it just before merge so that the merge can eliminate
 the deleted docs.
 But, most merges are small, yet in a big index we apply deletes to all
 of the segments, which is really very wasteful.
 Instead, we should only apply the buffered deletes to the segments
 that are about to be merged, and keep the buffer around for the
 remaining segments.
 I think it's not so hard to do; we'd have to have generations of
 pending deletions, because the newly merged segment doesn't need the
 same buffered deletions applied again.  So every time a merge kicks
 off, we pinch off the current set of buffered deletions, open a new
 set (the next generation), and record which segment was created as of
 which generation.
 This should be a very sizable gain for large indices that mix
 deletes, though, less so in flex since opening the terms index is much
 faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2768) add infrastructure for longer running nightly test cases

2010-11-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932990#action_12932990
 ] 

Robert Muir commented on LUCENE-2768:
-

Here is the output when tests.nightly is disabled (default) and a method or 
class is @Nightly, respectively:

{noformat}
[junit] Testsuite: org.apache.lucene.TestDemo
[junit] Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.17 sec
[junit]
[junit] - Standard Error -
[junit] NOTE: Ignoring nightly-only test method 'testDemo'
[junit] -  ---
{noformat}

{noformat}
[junit] Testsuite: org.apache.lucene.TestDemo
[junit] Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.171 sec
[junit]
[junit] - Standard Error -
[junit] NOTE: Ignoring nightly-only test class 'TestDemo'
[junit] -  ---
{noformat}


 add infrastructure for longer running nightly test cases
 

 Key: LUCENE-2768
 URL: https://issues.apache.org/jira/browse/LUCENE-2768
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2768.patch, LUCENE-2768_nightly.patch


 I'm spinning this out of LUCENE-2762...
 The patch there adds initial infrastructure for tests to pull documents from 
 a line file, and adds a longish running test case using that line file to 
 test NRT.
 I'd like to see some tests run on more substantial indices based on real 
 data... so this is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2768) add infrastructure for longer running nightly test cases

2010-11-17 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932997#action_12932997
 ] 

Uwe Schindler commented on LUCENE-2768:
---

Looks good, the hack is a hack *lol* but should work and lead to no problems.

I would only change the sysprop and static var to a Boolean and add a 
RuntimeException to the empty catch block in the reflection part.

 add infrastructure for longer running nightly test cases
 

 Key: LUCENE-2768
 URL: https://issues.apache.org/jira/browse/LUCENE-2768
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2768.patch, LUCENE-2768_nightly.patch


 I'm spinning this out of LUCENE-2762...
 The patch there adds initial infrastructure for tests to pull documents from 
 a line file, and adds a longish running test case using that line file to 
 test NRT.
 I'd like to see some tests run on more substantial indices based on real 
 data... so this is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2768) add infrastructure for longer running nightly test cases

2010-11-17 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2768:


Attachment: LUCENE-2768_nightly.patch

here is an updated patch with Uwe's suggestions,
additionally i made the fake method final.

I'll commit this soon, then Mike can set up his test to use it.

 add infrastructure for longer running nightly test cases
 

 Key: LUCENE-2768
 URL: https://issues.apache.org/jira/browse/LUCENE-2768
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2768.patch, LUCENE-2768_nightly.patch, 
 LUCENE-2768_nightly.patch


 I'm spinning this out of LUCENE-2762...
 The patch there adds initial infrastructure for tests to pull documents from 
 a line file, and adds a longish running test case using that line file to 
 test NRT.
 I'd like to see some tests run on more substantial indices based on real 
 data... so this is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2768) add infrastructure for longer running nightly test cases

2010-11-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933007#action_12933007
 ] 

Robert Muir commented on LUCENE-2768:
-

ok I committed the lucenetestcase/ant support in revision 1036088, 1036094 (3x)

To make nightly-only tests, annotate the methods with @Nightly.
To run tests including nightly-only tests, use -Dtests.nightly=true.
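
For example, a minimal usage sketch (the test class and method names are made
up; @Nightly is assumed to be the nested annotation in LuceneTestCase that this
patch adds):

{code}
import org.apache.lucene.util.LuceneTestCase;

public class TestBigLineFile extends LuceneTestCase {

  // the test is skipped by default and only runs when -Dtests.nightly=true is passed
  @Nightly
  public void testIndexManyLineFileDocs() throws Exception {
    // long-running body goes here, e.g. indexing a large line file
  }
}
{code}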


 add infrastructure for longer running nightly test cases
 

 Key: LUCENE-2768
 URL: https://issues.apache.org/jira/browse/LUCENE-2768
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2768.patch, LUCENE-2768_nightly.patch, 
 LUCENE-2768_nightly.patch


 I'm spinning this out of LUCENE-2762...
 The patch there adds initial infrastructure for tests to pull documents from 
 a line file, and adds a longish running test case using that line file to 
 test NRT.
 I'd like to see some tests run on more substantial indices based on real 
 data... so this is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2240) Basic authentication for stream.url

2010-11-17 Thread Jayendra Patil (JIRA)
Basic authentication for stream.url
---

 Key: SOLR-2240
 URL: https://issues.apache.org/jira/browse/SOLR-2240
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Jayendra Patil
Priority: Minor


We intend to use stream.url for indexing documents from remote locations 
exposed through http.
However, the remote urls are secured and would need basic authentication to be 
able to access the documents.
The current implementation for stream.url in ContentStreamBase.URLStream does 
not support authentication.

The implementation with stream.file would mean downloading the files to a local 
box and would cause duplication, whereas stream.body would have indexing 
performance issues with the huge amount of data being transferred over the network.

An approach would be :-
1. Passing additional authentication parameter e.g. stream.url.auth with the 
encoded authentication value - SolrRequestParsers
2. Setting Authorization request property for the Connection - 
ContentStreamBase.URLStream
this.conn.setRequestProperty("Authorization", "Basic " + 
encodedAuthentication);

Any thoughts ??
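
For illustration, a rough sketch of step 2 against plain java.net (all names
here are made up; building the credential with commons-codec's Base64 is just
one assumed option):

{code}
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.apache.commons.codec.binary.Base64;

public class BasicAuthStreamSketch {
  // open a remote stream.url with an already base64-encoded "user:password" credential
  public static InputStream open(String streamUrl, String encodedAuthentication) throws Exception {
    URLConnection conn = new URL(streamUrl).openConnection();
    conn.setRequestProperty("Authorization", "Basic " + encodedAuthentication);
    return conn.getInputStream();
  }

  public static void main(String[] args) throws Exception {
    String encoded = new String(Base64.encodeBase64("user:password".getBytes("UTF-8")), "UTF-8");
    InputStream in = open("http://remote.example.com/docs/doc1.xml", encoded);
    in.close();
  }
}
{code}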

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2762) Don't leak deleted open file handles with pooled readers

2010-11-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933051#action_12933051
 ] 

Michael McCandless commented on LUCENE-2762:



So with this patch, we now build the CFS for a merged segment before
adding that segment to the segment infos.

This is important, to prevent an NRT reader from opening the pre-CFS
version, thus tying open the files, using up extra disk space, and
leaking deleted-but-open files even once all NRT readers are closed.

But, unfortunately, this means the worst-case temporary peak free disk
space required when using CFS has gone up... this worst case is hit if
you 1) open an existing index, 2) call optimize on it, 3) the index
needs more than 1 merge to become optimized, and 4) on the final merge
of that optimize just after it's built the CFS but hasn't yet
committed it to the segment infos.  At that point you have 1X due to
starting segments (which cannot be deleted until commit), another 1X
due to the segments created by the prior merge (now being merged),
another 1X by the newly merged single segment, and a final 1X from the
final CFS.  In this worst case that means we require 3X of your index
size in temporary space.

In other cases we use less disk space (the NRT case).

And of course if CFS is off there's no change to the temp disk space.

I've noted this in the javadocs and will add to CHANGES...

But... I think we should improve our default MP.  First, maybe we
should set a maxMergeMB by default?  Because immense merges cause all
sorts of problems, and likely are not going to impact search perf.
Second, if a newly merged segment will be more than X% of the
index, I think we should leave it in non-compound-file format even if
useCompoundFile is enabled... I think there's a separate issue open
somewhere for that 2nd one.


 Don't leak deleted open file handles with pooled readers
 

 Key: LUCENE-2762
 URL: https://issues.apache.org/jira/browse/LUCENE-2762
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.9.4, 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-2762.patch


 If you have CFS enabled today, and pooling is enabled (either directly
 or because you've pulled an NRT reader), IndexWriter will hold open
 SegmentReaders against the non-CFS format of each merged segment.
 So even if you close all NRT readers you've pulled from the writer,
 you'll still see file handles open against files that have been
 deleted.
 This count will not grow unbounded, since it's limited by the number
 of segments in the index, but it's still a serious problem since the
 app had turned off CFS in the first place presumably to avoid risk of
 too-many-open-files.  It's also bad because it ties up disk space
 since these files would otherwise be deleted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-Solr-tests-only-trunk - Build # 1530 - Failure

2010-11-17 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/1530/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
expected:<2> but was:<3>

Stack Trace:
junit.framework.AssertionFailedError: expected:<2> but was:<3>
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:923)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:861)
at 
org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:208)




Build Log (for compile errors):
[...truncated 8769 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2241) Upgrade to Tika 0.8

2010-11-17 Thread Grant Ingersoll (JIRA)
Upgrade to Tika 0.8
---

 Key: SOLR-2241
 URL: https://issues.apache.org/jira/browse/SOLR-2241
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
 Fix For: 3.1, 4.0


as the title says

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments

2010-11-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933067#action_12933067
 ] 

Michael McCandless commented on LUCENE-2680:



{quote}
Deletes flushed can be removed if we store the docid-upto per segment.
Then we'll go back to having a hash map of deletes.
{quote}

I think we should do this?

Ie, each flushed segment stores the map of del Term/Query to
docid-upto, where that docid-upto is private to the segment (no
remapping on merges needed).

When it's time to apply deletes to about-to-be-merged segments, we
must apply all future segments' deletions unconditionally to each
segment, and then conditionally (respecting the local docid-upto)
apply that segment's own deletions.
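
To make that concrete, a hedged sketch of the bookkeeping (class and field
names are illustrative, not the actual patch):

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;

// buffered deletes pinched off for one flushed segment; docIDUpto is local to
// that segment, so a delete applies only to docs with docID < docIDUpto in it
class SegmentDeletes {
  final Map<Term, Integer> terms = new HashMap<Term, Integer>();
  final Map<Query, Integer> queries = new HashMap<Query, Integer>();
}
{code}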

{quote}
Intuitively, yes, however here's the constructor of hash set:

{noformat}
public HashSet() { map = new HashMap<E,Object>(); }
{noformat}
{quote}

Ugh I forgot about that.  Is that still true?  That's awful.

{quote}
bq. why are we tracking the last segment info/index?

I thought last segment was supposed to be used to mark the last segment of
a commit/flush. This way we save on the hash(set,map) space on the
segments upto the last segment when the commit occurred.
{quote}

Hmm... I think lastSegment was needed only for the multiple DWPT
case, to record the last segment already flushed in the index as of
when that DWPT was created.  This is so we know, going back, when we
can start unconditionally applying the buffered delete term.

With the single DWPT we effectively have today, isn't last segment
always going to be what we just flushed?  (Or null if we haven't yet
done a flush in the current session.)

{quote}
bq. Do we really need to track appliedTerms/appliedQueries? Ie is this just an 
optimization so that if the caller deletes by the Term/Query again we know to 
skip it?

Yes to the 2nd question. Why would we want to try deleting multiple times?
The cost is the terms dictionary lookup which you're saying is in the
noise? I think potentially cracking open a query again could be costly in
cases where the query is indeed expensive.
{quote}

I'm saying this is unlikely to be a worthwhile way to spend RAM.

EG most apps wouldn't delete by the same term again; they'd
typically go and process a big batch of docs, deleting by an id
field and adding the new version of the doc, where a given id is seen
only once in this session, and then IW is committed/closed?


 Improve how IndexWriter flushes deletes against existing segments
 -

 Key: LUCENE-2680
 URL: https://issues.apache.org/jira/browse/LUCENE-2680
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch


 IndexWriter buffers up all deletes (by Term and Query) and only
 applies them if 1) commit or NRT getReader() is called, or 2) a merge
 is about to kickoff.
 We do this because, for a large index, it's very costly to open a
 SegmentReader for every segment in the index.  So we defer as long as
 we can.  We do it just before merge so that the merge can eliminate
 the deleted docs.
 But, most merges are small, yet in a big index we apply deletes to all
 of the segments, which is really very wasteful.
 Instead, we should only apply the buffered deletes to the segments
 that are about to be merged, and keep the buffer around for the
 remaining segments.
 I think it's not so hard to do; we'd have to have generations of
 pending deletions, because the newly merged segment doesn't need the
 same buffered deletions applied again.  So every time a merge kicks
 off, we pinch off the current set of buffered deletions, open a new
 set (the next generation), and record which segment was created as of
 which generation.
 This should be a very sizable gain for large indices that mix
 deletes, though, less so in flex since opening the terms index is much
 faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2762) Don't leak deleted open file handles with pooled readers

2010-11-17 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933070#action_12933070
 ] 

Jason Rutherglen commented on LUCENE-2762:
--

{quote}I think we should improve our default MP.  First, maybe we should set a 
maxMergeMB by default?{quote}

That's a good idea, however would we set an absolute size or a size relative to 
the aggregate size of the index?  I'm using 5 GB in production, as otherwise I'm 
not sure the merge cost is worth the potential performance improvement, ie, 
long merges adversely affect indexing performance.
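
For example, a hedged sketch of capping merge size with the existing
LogByteSizeMergePolicy (the 5 GB figure is just the number above, and the
IndexWriterConfig-based wiring is an assumption, not a recommended default):

{code}
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;

class MergeSizeCapSketch {
  // segments already larger than maxMergeMB are left out of normal merges
  static IndexWriterConfig capMergeSize(IndexWriterConfig conf, double maxMergeMB) {
    LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
    mp.setMaxMergeMB(maxMergeMB);   // e.g. 5 * 1024 for the 5 GB mentioned above
    return conf.setMergePolicy(mp);
  }
}
{code}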

 Don't leak deleted open file handles with pooled readers
 

 Key: LUCENE-2762
 URL: https://issues.apache.org/jira/browse/LUCENE-2762
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.9.4, 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-2762.patch


 If you have CFS enabled today, and pooling is enabled (either directly
 or because you've pulled an NRT reader), IndexWriter will hold open
 SegmentReaders against the non-CFS format of each merged segment.
 So even if you close all NRT readers you've pulled from the writer,
 you'll still see file handles open against files that have been
 deleted.
 This count will not grow unbounded, since it's limited by the number
 of segments in the index, but it's still a serious problem since the
 app had turned off CFS in the first place presumably to avoid risk of
 too-many-open-files.  It's also bad because it ties up disk space
 since these files would otherwise be deleted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-Solr-tests-only-trunk - Build # 1531 - Still Failing

2010-11-17 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/1531/

1 tests failed.
REGRESSION:  org.apache.solr.TestDistributedSearch.testDistribSearch

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:923)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:861)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:446)
at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
at 
org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)




Build Log (for compile errors):
[...truncated 8758 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1036080 [1/4] - in /lucene/dev/branches/docvalues: ./ lucene/ lucene/contrib/ lucene/contrib/highlighter/src/test/ lucene/contrib/instantiated/src/test/org/apache/lucene/store/instant

2010-11-17 Thread Grant Ingersoll

On Nov 17, 2010, at 10:55 AM, Uwe Schindler wrote:

 Woohoo!
 
 No prop changes (only in the intro are the affected files listed), and no 
 longer any endless pages of rev numbers!
 
 Thanks Grant!

Hey, thank you!  You guys did the work to figure it out; I just flipped the 
switch.  It's too bad we couldn't be a little more fine-grained about 
propchanges, but I'll live with it for now.
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: mergeinfo commit mails, possible solution

2010-11-17 Thread Robert Muir
big +1, we can actually review backports now... this was really bad before.

On Wed, Nov 17, 2010 at 11:57 AM, Steven A Rowe sar...@syr.edu wrote:
 Uwe, my Inbox thanks you. - Steve

 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Wednesday, November 17, 2010 9:04 AM
 To: dev@lucene.apache.org
 Subject: RE: mergeinfo commit mails, possible solution

 The ASF does not use the vanilla mailer.py script; they are using
 http://opensource.perlig.de/svnmailer - and this one does fantastic work
 regarding this! We just need to change the config files of this tool and
 specify a special subtree config for the /lucene project folder.

 See also the attached mail!

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de

  -Original Message-
  From: Steven A Rowe [mailto:sar...@syr.edu]
  Sent: Wednesday, November 17, 2010 7:02 AM
  To: dev@lucene.apache.org
  Subject: RE: mergeinfo commit mails, possible solution
 
  After looking more closely at the vanilla Subversion version of the
 mailer.py
  script, I'm 99% sure that removing propchange from the generate_diffs
 list will
  have zero effect, but I'd love to be proven wrong.
 
  Turns out Subversion's mailer.py once had a much larger set of property
  filtering options, but C. Mike (Pilato) thought the option set was too
 baroque, so
  he reverted the entire set - see
  http://subversion.tigris.org/issues/show_bug.cgi?id=2944.
 
  I've asked on #svn about re-instating the ignore_props and
 ignore_propdiffs
  regex-valued options - these would allow us to only ignore svn:mergeinfo
 diffs
  while still noting that files' properties have changed, without
 affecting other
  properties or their diffs.  No responses yet, hopefully tomorrow.
 
  Steve
 
   -Original Message-
   From: Grant Ingersoll [mailto:gsing...@apache.org]
   Sent: Tuesday, November 16, 2010 9:55 AM
   To: dev@lucene.apache.org
   Subject: mergeinfo commit mails, possible solution
  
   From #lucene IRC:
    gsingers: sarowe and I were talking about the mergeinfo commit overload
    [09:43] gsingers: and the asf_mailer.conf file
    [09:43] gsingers: In looking at the file
    [09:44] gsingers: it appears the one thing we have the ability to do is to turn off the generation of diffs for
    [09:44] gsingers: events
    [09:44] gsingers: The default setting is:
    [09:44] gsingers: generate_diffs = add copy modify propchange
    [09:44] gsingers: sarowe and I are proposing to change our settings to just be add/copy/modify
    [09:44] gsingers: and try dropping propchange
    [09:45] gsingers: I honestly don't know whether it will work or not
    [09:45] gsingers: and it will also likely mean we will miss notifications of other propchanges
    [09:45] gsingers: We've asked on #asfinfra if there are other options
    [09:45] gsingers: and sarowe is looking into the mailer.py script to see if there are other things available
    [09:46] gsingers: I guess the question here is, do people want to try turning off propchange?
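
    For clarity, the proposed change amounts to something like this in the
    mailer configuration (a sketch; the surrounding config section is not
    shown in the IRC excerpt above):

    generate_diffs = add copy modify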
  
  
   -Grant
  
  
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
   additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: ASF Public Mail Archives on Amazon S3

2010-11-17 Thread Michael McCandless
Grant, public_p_r.tar seems to be missing?  Is that intentional?
Maybe some super-secret project inside there :)

Mike

On Thu, Oct 14, 2010 at 12:05 PM, Grant Ingersoll gsing...@apache.org wrote:
 Hi ORPers,

 I put up the complete ASF public mail archives as of about 3 weeks ago on 
 Amazon's S3 and have made them public (let me know if I messed up, it is the 
 first time I've done this).  I also intend, in the coming weeks, to convert 
 them into Mahout files (if anyone wants to help let me know).

 There are 5 files:
 https://s3.amazonaws.com/asf-mail-archives/public_a_d.tar
 https://s3.amazonaws.com/asf-mail-archives/public_e_k.tar
 https://s3.amazonaws.com/asf-mail-archives/public_l_o.tar
 https://s3.amazonaws.com/asf-mail-archives/public_s_t.tar
 https://s3.amazonaws.com/asf-mail-archives/public_u_z.tar

 The tarballs are organized by Top Level Project name (i.e. Mahout is in the 
 public_l_o.tar file).  The tarballs contain GZIP files by date, I believe.  I 
 believe the total uncompressed file size is somewhere in the 80-100GB range.  
 That should be sufficient to drive some semi-interesting things in terms of 
 scale, even if it is towards the smaller end of things.

 As the ASF has very clear public mailing list archive policies, it is my 
 belief that this data set is completely unencumbered.

 From an ORP standpoint, this might make for a first data set for evaluation 
 once we have the evaluator framework in place.

 Cheers,
 Grant

 --
 Grant Ingersoll
 http://www.lucidimagination.com




Re: Basic authentication for stream.url

2010-11-17 Thread Gregor Kaczor

 Hello Jayendra,

I did not quite understand what you are aiming for. Usually you would 
pass basic authentication credentials along with the URL. In Solr+Java 
you might use the following piece of code:


import java.net.MalformedURLException;

import org.apache.commons.httpclient.Credentials;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

int port = 80;
String url = "http://localhost:" + port + "/solr/select";
String user = "someuser";
String password = "somepassword";

CommonsHttpSolrServer commonsHttpSolrServer = null;
HttpClient httpclient = new HttpClient(new MultiThreadedHttpConnectionManager());
try {
    commonsHttpSolrServer = new CommonsHttpSolrServer(url, httpclient);
    commonsHttpSolrServer.setParser(new XMLResponseParser());
} catch (MalformedURLException e) {
    e.printStackTrace();
    return;
}

if (user != null && password != null) {
    // send credentials preemptively instead of waiting for a 401 challenge
    commonsHttpSolrServer.getHttpClient().getParams().setAuthenticationPreemptive(true);
    Credentials defaultcreds = new UsernamePasswordCredentials(user, password);
    commonsHttpSolrServer.getHttpClient().getState().setCredentials(
            new AuthScope("localhost", port, AuthScope.ANY_REALM), defaultcreds);
}

Hope I could help a little.

Kind Regards,
Gregor

On 11/17/2010 02:57 AM, Jayendra Patil wrote:
We intend to use stream.url for indexing documents. However, the 
remote URLs are secured and would need basic authentication to be able 
to access the documents.


The implementation with stream.file would mean downloading the files 
and would cause duplication, whereas stream.body would have indexing 
performance issues with the huge data being transferred over the network.


The current implementation for stream.url in 
ContentStreamBase.URLStream does not support authentication.

But it can easily be supported by:
1. Passing an additional authentication parameter, e.g. stream.url.auth, 
with the encoded authentication value - SolrRequestParsers
2. Setting the Authorization request property for the Connection - 
ContentStreamBase.URLStream
this.conn.setRequestProperty("Authorization", "Basic " + 
encodedauthentication);


Any suggestions ???

Regards,
Jayendra




--
How to find files on the Internet? FindFiles.net http://findfiles.net!


[jira] Updated: (SOLR-236) Field collapsing

2010-11-17 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-236:
---

Attachment: SOLR-236-distinctFacet.patch

To do distinct facet counts.

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: Next

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, 
 field-collapse-3.patch, field-collapse-4-with-solrj.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
 quasidistributed.additional.patch, 
 SOLR-236-1_4_1-paging-totals-working.patch, SOLR-236-1_4_1.patch, 
 SOLR-236-distinctFacet.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
 SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called "Field collapsing".
 It is used in order to collapse a group of results with similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)
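 As a purely illustrative example, a request using the three parameters might 
 look like this (the field name and values are hypothetical):
 http://localhost:8983/solr/select?q=camera&collapse.field=site&collapse.type=adjacent&collapse.max=1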

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: need some help =)

2010-11-17 Thread Granroth, Neal V.
Why not digestible?  This type of question with clear short source code is most 
likely to be answered.

- Neal

-Original Message-
From: Nicholas Paldino [.NET/C# MVP] [mailto:casper...@caspershouse.com] 
Sent: Wednesday, November 17, 2010 1:33 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: need some help =)

Why are you adding the bytes as the field value?  You should add the 
fields as strings and you should be fine.

Also, note that most people won't respond to this kind of code because 
it is not easily digestible.

-Original Message-
From: asmcad [mailto:asm...@gmail.com] 
Sent: Wednesday, November 17, 2010 3:02 PM
To: lucene-net-dev
Subject: need some help =)


it's a simple index and search application but i couldn't make it work. 
it doesn't give any errors, but it doesn't give any results either.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using Lucene.Net;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using System.IO;

namespace newLucene
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void buttonIndex_Click(object sender, EventArgs e)
        {
            IndexWriter indexwrtr = new IndexWriter(@"c:\index\", new StandardAnalyzer(), true);
            Document doc = new Document();
            string filename = @"fer.txt";
            Lucene.Net.QueryParsers.QueryParser df;

            System.IO.StreamReader local_StreamReader = new System.IO.StreamReader(@"C:\z\fer.txt");
            string file_text = local_StreamReader.ReadToEnd();

            System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
            doc.Add(new Field("text", encoding.GetBytes(file_text), Field.Store.YES));
            doc.Add(new Field("path", encoding.GetBytes(@"C:\z\"), Field.Store.YES));
            doc.Add(new Field("title", encoding.GetBytes(filename), Field.Store.YES));
            indexwrtr.AddDocument(doc);

            indexwrtr.Optimize();
            indexwrtr.Close();
        }

        private void buttonSearch_Click(object sender, EventArgs e)
        {
            IndexSearcher indxsearcher = new IndexSearcher(@"C:\index\");

            QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
            Query query = parser.Parse(textBoxQuery.Text);

            //Lucene.Net.QueryParsers.QueryParser qp = new QueryParser(Lucene.Net.QueryParsers.CharStream s).Parse(textBoxQuery.Text);
            Hits hits = indxsearcher.Search(query);

            for (int i = 0; i < hits.Length(); i++)
            {
                Document doc = hits.Doc(i);

                string filename = doc.Get("title");
                string path = doc.Get("path");
                string folder = Path.GetDirectoryName(path);

                ListViewItem item = new ListViewItem(new string[] { null, filename, "asd", hits.Score(i).ToString() });
                item.Tag = path;

                this.listViewResults.Items.Add(item);
                Application.DoEvents();
            }

            indxsearcher.Close();
        }
    }
}


thanks





Re: need some help =)

2010-11-17 Thread asmcad

 =)  i was about to write an answer...

On 17.11.2010 20:51, Granroth, Neal V. wrote:

Why not digestible?  This type of question with clear short source code is most 
likely to be answered.

- Neal

-Original Message-
From: Nicholas Paldino [.NET/C# MVP] [mailto:casper...@caspershouse.com]
Sent: Wednesday, November 17, 2010 1:33 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: need some help =)

Why are you adding the bytes as the field value?  You should add the 
fields as strings and you should be fine.

Also, note that most people won't respond to this kind of code because 
it is not easily digestable.

-Original Message-
From: asmcad [mailto:asm...@gmail.com]
Sent: Wednesday, November 17, 2010 3:02 PM
To: lucene-net-dev
Subject: need some help =)


it's a simple index and search application but i couldn't make it work.
it doesn't give any error but it  doesn't give any results too.



thanks







Stemming using automata

2010-11-17 Thread karl.wright
Folks,
I had an interesting conversation with Simon a few weeks back.  It occurred to 
me that it might be possible to build an automata that handles  stemming and 
pluralization on searches.  Just a thought...
Karl



[jira] Updated: (SOLR-2240) Basic authentication for stream.url

2010-11-17 Thread Jayendra Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jayendra Patil updated SOLR-2240:
-

Attachment: SOLR-2240.patch

Attached the Patch for the changes.

 Basic authentication for stream.url
 ---

 Key: SOLR-2240
 URL: https://issues.apache.org/jira/browse/SOLR-2240
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Jayendra Patil
Priority: Minor
 Attachments: SOLR-2240.patch


 We intend to use stream.url for indexing documents from remote locations 
 exposed through http.
 However, the remote URLs are secured and would need basic authentication to 
 be able to access the documents.
 The current implementation for stream.url in ContentStreamBase.URLStream does 
 not support authentication.
 The implementation with stream.file would mean downloading the files to a 
 local box and would cause duplication, whereas stream.body would have indexing 
 performance issues with the huge data being transferred over the network.
 An approach would be:
 1. Passing an additional authentication parameter, e.g. stream.url.auth, with the 
 encoded authentication value - SolrRequestParsers
 2. Setting the Authorization request property for the Connection - 
 ContentStreamBase.URLStream
 this.conn.setRequestProperty("Authorization", "Basic " + 
 encodedauthentication);
 Any thoughts ??
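
 A rough sketch of what the proposal amounts to (Java; the stream.url.auth 
 parameter name comes from the description above, while the helper itself is 
 hypothetical and not the attached patch):

 import java.io.InputStream;
 import java.net.URL;
 import java.net.URLConnection;

 class StreamUrlAuthSketch {
   // Open stream.url with an already-Base64-encoded Basic credential
   // (e.g. the proposed stream.url.auth parameter value).
   static InputStream openAuthenticatedStream(String streamUrl, String encodedAuth) throws Exception {
     URLConnection conn = new URL(streamUrl).openConnection();
     if (encodedAuth != null) {
       // the same header the description suggests setting in ContentStreamBase.URLStream
       conn.setRequestProperty("Authorization", "Basic " + encodedAuth);
     }
     return conn.getInputStream();
   }
 }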

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Basic authentication for stream.url

2010-11-17 Thread Jayendra Patil
JIRA - https://issues.apache.org/jira/browse/SOLR-2240
Patch attached.

How does the patch make it to the trunk ??? I had submitted a couple more
patches, SOLR-2156 & SOLR-2029, and would like them to be included in the
release.

Regards,
Jayendra

On Wed, Nov 17, 2010 at 2:15 PM, Yonik Seeley yo...@lucidimagination.comwrote:

 On Tue, Nov 16, 2010 at 8:57 PM, Jayendra Patil
 jayendra.patil@gmail.com wrote:
  We intend to use schema.url for indexing documents. However, the remote
 urls
  are secured and would need basic authentication to be able access the
  document.
 
  The implementation with stream.file would mean to download the files and
  would cause duplicity, whereas stream.body would have indexing
 performance
  issues with the hugh data being transferred over the network.
 
  The current implementation for stream.url in ContentStreamBase.URLStream
  does not support authentication.
  But can be easily supported by :-
  1. Passing additional authentication parameter e.g. stream.url.auth with
 the
  encoded authentication value - SolrRequestParsers
  2. Setting Authorization request property for the Connection -
  ContentStreamBase.URLStream
  this.conn.setRequestProperty(Authorization, Basic  +
  encodedauthentication);


 Sounds like a good idea to me.
 Could you open a JIRA issue for this feature, and supply a patch if
 you get to it?

 -Yonik
 http://www.lucidimagination.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




Re: Stemming using automata

2010-11-17 Thread Robert Muir
Karl, you are right.

this is one of the ways i originally used this thing.

i've done some relevance experiments along these lines (some summary
results here http://www.slideshare.net/otisg/finite-state-queries-in-lucene).

in this case i compared 3 cases: index-time porter stemming,
index-time plural stemming, and query-time plural stemming (with
automaton).

in general you can get similar results, slower query speed, but more
flexibility. for instance, you could have a queryparser that
implements a stem() operator without indexing everything twice.

probably pretty boring for most people, but in some cases (e.g. lots
of languages) query-time starts to become more attractive...
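
For illustration, a minimal query-time sketch of that idea (Java, trunk-era
automaton APIs; the naive add-an-s plural rule is a toy assumption, not the
experiment summarized above):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.AutomatonQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.BasicAutomata;
import org.apache.lucene.util.automaton.BasicOperations;

class PluralAutomatonSketch {
  // Build one automaton that accepts the surface form and a naive plural,
  // and run it as a single query instead of indexing a stemmed field.
  static Query pluralQuery(String field, String word) {
    Automaton exact = BasicAutomata.makeString(word);
    Automaton plural = BasicAutomata.makeString(word + "s");
    return new AutomatonQuery(new Term(field, word), BasicOperations.union(exact, plural));
  }
}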

On Wed, Nov 17, 2010 at 3:18 PM,  karl.wri...@nokia.com wrote:
 Folks,

 I had an interesting conversation with Simon a few weeks back.  It occurred
 to me that it might be possible to build an automata that handles  stemming
 and pluralization on searches.  Just a thought…

 Karl



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: need some help =)

2010-11-17 Thread Digy
You can try UnaccentedWordAnalyzer in /contrib/Contrib.Net/
(you have to download the contrib code from svn).

DIGY

-Original Message-
From: asmcad [mailto:asm...@gmail.com] 
Sent: Wednesday, November 17, 2010 11:24 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: need some help =)

i need turkish analyzer. my lucene book says i need to use 
SnowballAnalyzer but i can't access to it as 
Lucene.Net.Analysis.Snowball should i install another library to use it?

On 17.11.2010 21:12, Granroth, Neal V. wrote:
 You need to pick a suitable analyzer for use during indexing and for
queries.  The StandardAnalyzer you are using will most likely break the
words apart at the non-english characters.

 You might want to consider using the Luke tool to inspect the index you've
created and see who the words in your documents were split and indexed.


 - Neal

 -Original Message-
 From: asmcad [mailto:asm...@gmail.com]
 Sent: Wednesday, November 17, 2010 3:06 PM
 To: lucene-net-...@lucene.apache.org
 Subject: Re: need some help =)


 i solved the problem . now i have non-english character problem.
 when i search like something çşğuı(i'm not sure you can see this)
 characters. i don't get any results.
 how can i solve this ?

 by the way sorry about the content messing =)

 thanks  for the  previous help  =)

 On 17.11.2010 20:16, Digy wrote:

[jira] Commented: (SOLR-236) Field collapsing

2010-11-17 Thread Stephen Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933153#action_12933153
 ] 

Stephen Weiss commented on SOLR-236:


Cheers peterwang, you're probably right.  I didn't actually use this patch, I 
made the modifications by hand after applying Martijn's patch.  I generally 
don't make my own patch files, I just let SVN do it for me, so I'm not really 
aware of the syntax...  The point is to just delete those extra lines.

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: Next

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, 
 field-collapse-3.patch, field-collapse-4-with-solrj.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
 quasidistributed.additional.patch, 
 SOLR-236-1_4_1-paging-totals-working.patch, SOLR-236-1_4_1.patch, 
 SOLR-236-distinctFacet.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
 SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called "Field collapsing".
 It is used in order to collapse a group of results with similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Basic authentication for stream.url

2010-11-17 Thread Erick Erickson
How does the patch make it to the trunk

You need to track it and prompt the dev list if you think it's forgotten.
Basically, when a committer thinks it's ready and valuable s/he will
commit it to trunk for you.

But give the committers some time before prompting, they're usually
up to their ears in other changes

Best
Erick

On Wed, Nov 17, 2010 at 3:30 PM, Jayendra Patil 
jayendra.patil@gmail.com wrote:

 JIRA - https://issues.apache.org/jira/browse/SOLR-2240
 Patch attached.

 How does the patch make it to the trunk ??? Had submitted a couple of more
 patches SOLR-2156  SOLR-2029, would like them to be included in the
 release.

 Regards,
 Jayendra


 On Wed, Nov 17, 2010 at 2:15 PM, Yonik Seeley 
 yo...@lucidimagination.comwrote:

 On Tue, Nov 16, 2010 at 8:57 PM, Jayendra Patil
 jayendra.patil@gmail.com wrote:
  We intend to use schema.url for indexing documents. However, the remote
 urls
  are secured and would need basic authentication to be able access the
  document.
 
  The implementation with stream.file would mean to download the files and
  would cause duplicity, whereas stream.body would have indexing
 performance
  issues with the hugh data being transferred over the network.
 
  The current implementation for stream.url in ContentStreamBase.URLStream
  does not support authentication.
  But can be easily supported by :-
  1. Passing additional authentication parameter e.g. stream.url.auth with
 the
  encoded authentication value - SolrRequestParsers
  2. Setting Authorization request property for the Connection -
  ContentStreamBase.URLStream
  this.conn.setRequestProperty(Authorization, Basic  +
  encodedauthentication);


 Sounds like a good idea to me.
 Could you open a JIRA issue for this feature, and supply a patch if
 you get to it?

 -Yonik
 http://www.lucidimagination.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





RE: need some help =)

2010-11-17 Thread Digy
UnaccentedWordAnalyzer doesn't make use of stemming.

If you really need it:
a) SnowballAnalyzer is not good at Turkish stemming.
b) It is better to write a custom analyzer using Zemberek or its .NET
version NZemberek.

DIGY



-Original Message-
From: asmcad [mailto:asm...@gmail.com] 
Sent: Wednesday, November 17, 2010 11:24 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: need some help =)

i need turkish analyzer. my lucene book says i need to use 
SnowballAnalyzer but i can't access to it as 
Lucene.Net.Analysis.Snowball should i install another library to use it?

On 17.11.2010 21:12, Granroth, Neal V. wrote:
 You need to pick a suitable analyzer for use during indexing and for
queries.  The StandardAnalyzer you are using will most likely break the
words apart at the non-english characters.

 You might want to consider using the Luke tool to inspect the index you've
created and see who the words in your documents were split and indexed.


 - Neal

 -Original Message-
 From: asmcad [mailto:asm...@gmail.com]
 Sent: Wednesday, November 17, 2010 3:06 PM
 To: lucene-net-...@lucene.apache.org
 Subject: Re: need some help =)


 i solved the problem . now i have non-english character problem.
 when i search like something çşğuı(i'm not sure you can see this)
 characters. i don't get any results.
 how can i solve this ?

 by the way sorry about the content messing =)

 thanks  for the  previous help  =)

 On 17.11.2010 20:16, Digy wrote:

RE: Lucene project announcement

2010-11-17 Thread Granroth, Neal V.
Is Java Lucene grown up?  Look at how much discussion it took to determine 
how to get Java out of the name :)

The discussion about advancing the algorithm in C#/.NET seems to be missing the 
point.  If you're developing at the concept level, the specific language you use 
becomes unimportant.  However, as most of the concept developers apparently find 
Java convenient, others wanting to participate at the concept level would find 
it more beneficial to join that brain-pool instead of diluting the effort by 
starting up elsewhere.
   

- Neal

-Original Message-
From: George Aroush [mailto:geo...@aroush.net] 
Sent: Tuesday, November 16, 2010 10:55 PM
To: lucene-net-...@lucene.apache.org
Cc: dev@lucene.apache.org
Subject: RE: Lucene project announcement

This topic has been coming back again and again which I have tried to
address multiple times, so let me try again.

1) Java Lucene started years before the first C# version (4+ years if I get
my history right), thus it defined and continues to define the
technology and the API.  It is the established leader, and everyone else is
just a follower.

2) Lucene.Net is nowhere near as mature as Java Lucene; it never established
itself or built a rich development community -- which is why we are here today.

3) If, and only if, the community of Lucene.Net (or Lucene over at
codeplex.com) manages to prove itself to the level of Java Lucene will
such a community have the voice to influence folks over at Java
Lucene.  Only then will you see the two communities discussing search engine
vs. port issues or the state of Lucene.Net.

If you look at my previous posts, I have pointed those out.  We must first:

1) Be on par with Java Lucene releases and keep up with a commit-per-commit
port.

2) Prove Lucene.Net is a grown-up project with followers and a healthy
community (just like Java Lucene).

If we don't achieve the above, folks over at Java Lucene will not take us
seriously, and thus we can't influence them.

-- George

-Original Message-
From: Nicholas Paldino [.NET/C# MVP] [mailto:casper...@caspershouse.com] 
Sent: Friday, November 12, 2010 10:36 AM
To: lucene-net-...@lucene.apache.org
Cc: dev@lucene.apache.org
Subject: RE: Lucene project announcement

Paul, et al,

Paul, God bless you.  This is probably the most rational, practical
perspective I've seen on the whole matter since the debacle started.

While Lucene started off as a Java project, its massive success
indicates that the concepts around it are very desirable to developers in
other technologies, and that the Java product isn't being translated well
into those technology stacks.

That's not a slight against those who have contributed to this point
to try and keep the .NET version in line with the Java one (despite me
thinking that the actual approach to doing so is horribly misguided).

That said, there should be a serious conversation with the
Java-version folk about making this happen.  How can Lucene be
abstracted/standardized in a non-technology-stack-specific way, so that other
technology stacks can create implementations against that
abstraction/standard?

Is it too much to ask of the Java folk?  Perhaps.  After all, they
haven't done it yet and it doesn't seem like they see the need for this.
This isn't an unjustified position; that project has a massive user base and
success which creates massive responsibilities to the project that must be
fulfilled.

If such a thing proceeds, this is what I'd like to see in such an
abstraction:

- Technology-agnostic concepts used, down to the class level:
- Classes might be the one exception, they are near universal.
However, this could be something like entity
- Properties - Java doesn't have properties, they have a property
convention.  .NET has the concept of a property, which translates to a named
getter and/or setter which can execute additional code on either in addition
to the assignment.
- Fields - Raw exposed data points.  Whether or not these ^should^
be used is a different story, but there are some places where they are used
so a definition is needed.
- Methods - Functions/methods, whatever you want to call them, we
all know what they are.
- In the end, the names are not important as much as the
abstractions are, I think we all have an idea on what they are.
- Right now, I don't have a problem with a class-by-class mapping, but over
time, whether or not class design was done to suit the technology should be
addressed, and ultimately abstracted out if this is the case.
- Things like ^what^ is returned from methods or internal constructs that
are used to make guarantees about behavior and the like should be abstracted
out.  For example, in Lucene.NET we have the following (in order to maintain
a line-by-line port in most cases):
- A custom implementation of ReaderWriterLock.  There's no reason
for something like this. 
- 

Re: need some help =)

2010-11-17 Thread asmcad
i don't have any ide writing custom analyzer... so i'll stick with 
SnowballAnalyzer for now.


On 17.11.2010 21:53, Digy wrote:

UnaccentedWordAnalyzer doesn't make use of stemming.

If you really need it;
a) SnowballAnalyzer is not good in turkish stemming.
b) It is better to write a custom analyzer using Zemberek or its .NET
version NZemberek.

DIGY



-Original Message-
From: asmcad [mailto:asm...@gmail.com]
Sent: Wednesday, November 17, 2010 11:24 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: need some help =)

i need turkish analyzer. my lucene book says i need to use
SnowballAnalyzer but i can't access to it as
Lucene.Net.Analysis.Snowball should i install another library to use it?

On 17.11.2010 21:12, Granroth, Neal V. wrote:

You need to pick a suitable analyzer for use during indexing and for

queries.  The StandardAnalyzer you are using will most likely break the
words apart at the non-english characters.

You might want to consider using the Luke tool to inspect the index you've

created and see who the words in your documents were split and indexed.


- Neal

-Original Message-
From: asmcad [mailto:asm...@gmail.com]
Sent: Wednesday, November 17, 2010 3:06 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: need some help =)


i solved the problem . now i have non-english character problem.
when i search like something çşğuı(i'm not sure you can see this)
characters. i don't get any results.
how can i solve this ?

by the way sorry about the content messing =)

thanks  for the  previous help  =)

On 17.11.2010 20:16, Digy wrote:


Re: need some help =)

2010-11-17 Thread Wyatt Barnett
You should be able to open any of the contrib projects with the free visual 
studio express software or with monodevelop, also free. 

On Nov 17, 2010, at 1:58 PM, asmcad asm...@gmail.com wrote:

 i don't have any ide writing custom analyzer... so i'll stick with 
 SnowballAnalyzer for now.
 
 On 17.11.2010 21:53, Digy wrote:
 UnaccentedWordAnalyzer doesn't make use of stemming.
 
 If you really need it;
 a) SnowballAnalyzer is not good in turkish stemming.
 b) It is better to write a custom analyzer using Zemberek or its .NET
 version NZemberek.
 
 DIGY
 
 
 
 -Original Message-
 From: asmcad [mailto:asm...@gmail.com]
 Sent: Wednesday, November 17, 2010 11:24 PM
 To: lucene-net-...@lucene.apache.org
 Subject: Re: need some help =)
 
 i need turkish analyzer. my lucene book says i need to use
 SnowballAnalyzer but i can't access to it as
 Lucene.Net.Analysis.Snowball should i install another library to use it?
 
 On 17.11.2010 21:12, Granroth, Neal V. wrote:
 You need to pick a suitable analyzer for use during indexing and for
 queries.  The StandardAnalyzer you are using will most likely break the
 words apart at the non-english characters.
 You might want to consider using the Luke tool to inspect the index you've
 created and see who the words in your documents were split and indexed.
 
 - Neal
 
 -Original Message-
 From: asmcad [mailto:asm...@gmail.com]
 Sent: Wednesday, November 17, 2010 3:06 PM
 To: lucene-net-...@lucene.apache.org
 Subject: Re: need some help =)
 
 
 i solved the problem . now i have non-english character problem.
 when i search like something çşğuı(i'm not sure you can see this)
 characters. i don't get any results.
 how can i solve this ?
 
 by the way sorry about the content messing =)
 
 thanks  for the  previous help  =)
 
 On 17.11.2010 20:16, Digy wrote:

Lucene-Solr-tests-only-trunk - Build # 1536 - Failure

2010-11-17 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/1536/

3 tests failed.
FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicZkTest

Error Message:
KeeperErrorCode = ConnectionLoss for /configs/conf1/protwords.txt

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /configs/conf1/protwords.txt
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
at 
org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:225)
at 
org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:389)
at 
org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:411)
at 
org.apache.solr.cloud.AbstractZkTestCase.putConfig(AbstractZkTestCase.java:97)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:87)
at 
org.apache.solr.cloud.AbstractZkTestCase.azt_beforeClass(AbstractZkTestCase.java:61)


REGRESSION:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
null

Stack Trace:
org.apache.solr.common.cloud.ZooKeeperException: 
at org.apache.solr.core.CoreContainer.register(CoreContainer.java:530)
at org.apache.solr.core.CoreContainer.register(CoreContainer.java:558)
at 
org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:156)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:923)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:861)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for 
/collections/testcore/shards/127.0.0.1:1661_solr_testcore
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:348)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:309)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:371)
at 
org.apache.solr.cloud.ZkController.addZkShardsNode(ZkController.java:155)
at org.apache.solr.cloud.ZkController.register(ZkController.java:481)
at org.apache.solr.core.CoreContainer.register(CoreContainer.java:521)


REGRESSION:  org.apache.solr.cloud.ZkSolrClientTest.testWatchChildren

Error Message:
KeeperErrorCode = ConnectionLoss for /collections/collection99

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /collections/collection99
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:348)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:309)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:291)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:256)
at 
org.apache.solr.cloud.ZkSolrClientTest.testWatchChildren(ZkSolrClientTest.java:193)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:923)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:861)




Build Log (for compile errors):
[...truncated 9283 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: need some help =)

2010-11-17 Thread asmcad

opps =) not ide. it was  idea :S

On 17.11.2010 22:02, Wyatt Barnett wrote:

You should be able to open any of the contrib projects with the free visual 
studio express software or with monodevelop, also free.

On Nov 17, 2010, at 1:58 PM, asmcadasm...@gmail.com  wrote:


i don't have any ide writing custom analyzer... so i'll stick with 
SnowballAnalyzer for now.

On 17.11.2010 21:53, Digy wrote:

UnaccentedWordAnalyzer doesn't make use of stemming.

If you really need it;
a) SnowballAnalyzer is not good in turkish stemming.
b) It is better to write a custom analyzer using Zemberek or its .NET
version NZemberek.

DIGY



-Original Message-
From: asmcad [mailto:asm...@gmail.com]
Sent: Wednesday, November 17, 2010 11:24 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: need some help =)

i need turkish analyzer. my lucene book says i need to use
SnowballAnalyzer but i can't access to it as
Lucene.Net.Analysis.Snowball should i install another library to use it?

On 17.11.2010 21:12, Granroth, Neal V. wrote:

You need to pick a suitable analyzer for use during indexing and for

queries.  The StandardAnalyzer you are using will most likely break the
words apart at the non-english characters.

You might want to consider using the Luke tool to inspect the index you've

created and see who the words in your documents were split and indexed.

- Neal

-Original Message-
From: asmcad [mailto:asm...@gmail.com]
Sent: Wednesday, November 17, 2010 3:06 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: need some help =)


i solved the problem . now i have non-english character problem.
when i search like something çşğuı(i'm not sure you can see this)
characters. i don't get any results.
how can i solve this ?

by the way sorry about the content messing =)

thanks  for the  previous help  =)

On 17.11.2010 20:16, Digy wrote:


Lucene-Solr-tests-only-3.x - Build # 1512 - Failure

2010-11-17 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/1512/

7 tests failed.
REGRESSION:  
org.apache.solr.handler.TestReplicationHandler.testIndexAndConfigReplication

Error Message:
Jetty/Solr unresponsive

Stack Trace:
java.lang.RuntimeException: Jetty/Solr unresponsive
at 
org.apache.solr.client.solrj.embedded.JettySolrRunner.waitForSolr(JettySolrRunner.java:149)
at 
org.apache.solr.client.solrj.embedded.JettySolrRunner.start(JettySolrRunner.java:111)
at 
org.apache.solr.client.solrj.embedded.JettySolrRunner.start(JettySolrRunner.java:103)
at 
org.apache.solr.handler.TestReplicationHandler.createJetty(TestReplicationHandler.java:110)
at 
org.apache.solr.handler.TestReplicationHandler.testIndexAndConfigReplication(TestReplicationHandler.java:260)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:821)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:759)
Caused by: java.io.IOException: Server returned HTTP response code: 500 for 
URL: http://localhost:10355/solr/select?q={!raw+f=junit_test_query}ping
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1269)
at java.net.URL.openStream(URL.java:1029)
at 
org.apache.solr.client.solrj.embedded.JettySolrRunner.waitForSolr(JettySolrRunner.java:137)


REGRESSION:  org.apache.solr.handler.TestReplicationHandler.testStopPoll

Error Message:
java.net.ConnectException: Operation timed out

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException: 
Operation timed out
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at 
org.apache.solr.handler.TestReplicationHandler.query(TestReplicationHandler.java:142)
at 
org.apache.solr.handler.TestReplicationHandler.clearIndexWithReplication(TestReplicationHandler.java:85)
at 
org.apache.solr.handler.TestReplicationHandler.testStopPoll(TestReplicationHandler.java:285)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:821)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:759)
Caused by: java.net.ConnectException: Operation timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:546)
at java.net.Socket.connect(Socket.java:495)
at java.net.Socket.init(Socket.java:392)
at java.net.Socket.init(Socket.java:266)
at 
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at 
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at 
org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)


REGRESSION:  
org.apache.solr.handler.TestReplicationHandler.testSnapPullWithMasterUrl

Error Message:
Jetty/Solr unresponsive

Stack Trace:
java.lang.RuntimeException: Jetty/Solr unresponsive
at 
org.apache.solr.client.solrj.embedded.JettySolrRunner.waitForSolr(JettySolrRunner.java:149)
at 
org.apache.solr.client.solrj.embedded.JettySolrRunner.start(JettySolrRunner.java:111)
at 
org.apache.solr.client.solrj.embedded.JettySolrRunner.start(JettySolrRunner.java:103)
at 
org.apache.solr.handler.TestReplicationHandler.createJetty(TestReplicationHandler.java:110)
at 
org.apache.solr.handler.TestReplicationHandler.testSnapPullWithMasterUrl(TestReplicationHandler.java:357)
at 

Re: Basic authentication for stream.url

2010-11-17 Thread Jayendra Patil
sure  thanks for the information ...

On Wed, Nov 17, 2010 at 3:47 PM, Erick Erickson erickerick...@gmail.comwrote:

 How does the patch make it to the trunk

 You need to track it and prompt the dev list if you think it's forgotten.
 Basically, when a committer thinks it's ready and valuable s/he will
 commit it to trunk for you.

 But give the committers some time before prompting, they're usually
 up to their ears in other changes

 Best
 Erick

 On Wed, Nov 17, 2010 at 3:30 PM, Jayendra Patil 
 jayendra.patil@gmail.com wrote:

 JIRA - https://issues.apache.org/jira/browse/SOLR-2240
 Patch attached.

 How does the patch make it to the trunk??? I had submitted a couple more
 patches, SOLR-2156 and SOLR-2029, and would like them to be included in the
 release.

 Regards,
 Jayendra


 On Wed, Nov 17, 2010 at 2:15 PM, Yonik Seeley yo...@lucidimagination.com
  wrote:

 On Tue, Nov 16, 2010 at 8:57 PM, Jayendra Patil
 jayendra.patil@gmail.com wrote:
  We intend to use stream.url for indexing documents. However, the remote
  URLs are secured and would need basic authentication to be able to access
  the document.

  The implementation with stream.file would mean downloading the files and
  would cause duplication, whereas stream.body would have indexing
  performance issues with the huge data being transferred over the network.
 
  The current implementation for stream.url in
 ContentStreamBase.URLStream
  does not support authentication.
  But it can be easily supported by:
  1. Passing an additional authentication parameter, e.g. stream.url.auth, with
  the encoded authentication value - SolrRequestParsers
  2. Setting the Authorization request property for the connection -
  ContentStreamBase.URLStream:
  this.conn.setRequestProperty("Authorization", "Basic " + encodedauthentication);
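
A minimal, self-contained sketch of that connection-side change (not the SOLR-2240
patch itself; the class and parameter names are assumptions, and the stream.url.auth
value is assumed to arrive already Base64-encoded):

import java.net.URL;
import java.net.URLConnection;

public class AuthenticatedUrlStream {
    // Open the remote stream.url with an HTTP Basic Authorization header attached.
    // encodedAuth is the already Base64-encoded "user:password" value, e.g. taken
    // from a hypothetical stream.url.auth request parameter.
    public static URLConnection open(String streamUrl, String encodedAuth) throws Exception {
        URLConnection conn = new URL(streamUrl).openConnection();
        if (encodedAuth != null && encodedAuth.length() > 0) {
            conn.setRequestProperty("Authorization", "Basic " + encodedAuth);
        }
        return conn;
    }
}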


 Sounds like a good idea to me.
 Could you open a JIRA issue for this feature, and supply a patch if
 you get to it?

 -Yonik
 http://www.lucidimagination.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org






[jira] Commented: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments

2010-11-17 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933182#action_12933182
 ] 

Jason Rutherglen commented on LUCENE-2680:
--

Additionally we need to decide how accounting'll work for
maxBufferedDeleteTerms. We won't have a centralized place to keep track of
the number of terms, and the unique term count in aggregate over many
segments could be a little too time consuming to calculate in a method like
doApplyDeletes. An alternative is to maintain a global unique term count,
such that when a term is added, every other per-segment deletes is checked
for that term, and if it's not already been tallied, we increment the number
of buffered terms.
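
A rough sketch of that alternative (assumed names; not actual IndexWriter code): one
pending-deletes map per segment, plus a global count of unique buffered terms that is
only bumped when no segment has buffered the term yet.

{noformat}
import org.apache.lucene.index.Term;

import java.util.HashMap;
import java.util.Map;

class BufferedDeletesAccounting {
  // pending deletes per segment: term -> docIDUpto
  private final Map<String, Map<Term, Integer>> perSegmentDeletes =
      new HashMap<String, Map<Term, Integer>>();
  private int globalUniqueTermCount;

  void bufferDeleteTerm(String segment, Term term, int docIDUpto) {
    Map<Term, Integer> deletes = perSegmentDeletes.get(segment);
    if (deletes == null) {
      deletes = new HashMap<Term, Integer>();
      perSegmentDeletes.put(segment, deletes);
    }
    // only count the term once, no matter how many segments buffer it
    if (!alreadyBuffered(term)) {
      globalUniqueTermCount++;
    }
    deletes.put(term, docIDUpto);
  }

  private boolean alreadyBuffered(Term term) {
    for (Map<Term, Integer> deletes : perSegmentDeletes.values()) {
      if (deletes.containsKey(term)) {
        return true;
      }
    }
    return false;
  }

  boolean triggerFlush(int maxBufferedDeleteTerms) {
    return maxBufferedDeleteTerms != -1 && globalUniqueTermCount >= maxBufferedDeleteTerms;
  }
}
{noformat}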

 Improve how IndexWriter flushes deletes against existing segments
 -

 Key: LUCENE-2680
 URL: https://issues.apache.org/jira/browse/LUCENE-2680
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch


 IndexWriter buffers up all deletes (by Term and Query) and only
 applies them if 1) commit or NRT getReader() is called, or 2) a merge
 is about to kickoff.
 We do this because, for a large index, it's very costly to open a
 SegmentReader for every segment in the index.  So we defer as long as
 we can.  We do it just before merge so that the merge can eliminate
 the deleted docs.
 But, most merges are small, yet in a big index we apply deletes to all
 of the segments, which is really very wasteful.
 Instead, we should only apply the buffered deletes to the segments
 that are about to be merged, and keep the buffer around for the
 remaining segments.
 I think it's not so hard to do; we'd have to have generations of
 pending deletions, because the newly merged segment doesn't need the
 same buffered deletions applied again.  So every time a merge kicks
 off, we pinch off the current set of buffered deletions, open a new
 set (the next generation), and record which segment was created as of
 which generation.
 This should be a very sizable gain for large indices that mix
 deletes, though, less so in flex since opening the terms index is much
 faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2348) DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers

2010-11-17 Thread Trejkaz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933222#action_12933222
 ] 

Trejkaz commented on LUCENE-2348:
-

Finally got around to checking this out today, and it looks good to me.  
Unfortunate how Lucene has changed so much lately that we can't backport this. 
:)  But will just await a release where it appears.


 DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment 
 readers
 -

 Key: LUCENE-2348
 URL: https://issues.apache.org/jira/browse/LUCENE-2348
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/*
Affects Versions: 2.9.2
Reporter: Trejkaz
 Attachments: LUCENE-2348.patch


 DuplicateFilter currently works by building a single doc ID set, without 
 taking into account that getDocIdSet() will be called once per segment and 
 only with each segment's local reader.
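
 An illustrative pattern (not the attached patch; the filter here is hypothetical) for
 keeping a filter correct under per-segment calls: getDocIdSet only reads from the
 segment reader it is handed, so every doc ID it sets is local to that segment.

{noformat}
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.OpenBitSet;

class PerSegmentTermFilter extends Filter {
  private final Term term;

  PerSegmentTermFilter(Term term) {
    this.term = term;
  }

  @Override
  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
    // 'reader' is the segment's own reader, so the doc IDs below are segment-local
    OpenBitSet bits = new OpenBitSet(reader.maxDoc());
    TermDocs termDocs = reader.termDocs(term);
    try {
      while (termDocs.next()) {
        bits.set(termDocs.doc());
      }
    } finally {
      termDocs.close();
    }
    return bits;
  }
}
{noformat}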

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments

2010-11-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933227#action_12933227
 ] 

Michael McCandless commented on LUCENE-2680:


{quote}
I think we may be back tracking here as I had earlier proposed we simply
store each term/query in a map per segment, however I think that was nixed
in favor of last segment + deletes per segment afterwards. We're not
worried about the cost of storing pending deletes in a map per segment
anymore?
{quote}

OK sorry now I remember.

Hmm but, my objection then was to carrying all deletes backward to all
segments?

Whereas now I think what we can do is only record the deletions that
were added when that segment was a RAM buffer, in its pending deletes
map?  This should be fine, since we aren't storing a single deletion
in multiple places (well, until DWPTs anyway).  It's just that on
applying deletes to a segment because it's about to be merged we have
to do a merge sort of the buffered deletes of all future segments.

BTW it could also be possible to not necessarily apply deletes when a
segment is merged; eg if there are few enough deletes it may not be
worthwhile.  But we can leave that to another issue.

{quote}
Additionally we need to decide how accounting'll work for
maxBufferedDeleteTerms. We won't have a centralized place to keep track of
the number of terms, and the unique term count in aggregate over many
segments could be a little too time consuming to calculate in a method like
doApplyDeletes. An alternative is to maintain a global unique term count,
such that when a term is added, every other per-segment deletes is checked
for that term, and if it's not already been tallied, we increment the number
of buffered terms.
{quote}

Maybe we should change the definition to be total number of pending
delete term/queries?  (Ie, not dedup'd across segments).  This seems
reasonable since w/ this new approach the RAM consumed is in
proportion to that total number and not to dedup'd count?
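
A very rough sketch of the generation bookkeeping described in the issue text below
(all names assumed; not actual IndexWriter code): pinch off the current buffered
deletes when a merge kicks off, and remember the generation each segment was born
under so it never sees older generations again.

{noformat}
import org.apache.lucene.index.Term;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BufferedDeleteGenerations {
  // frozen generations of buffered delete terms (term -> docIDUpto)
  private final List<Map<Term, Integer>> frozen = new ArrayList<Map<Term, Integer>>();
  // generation each segment was created under
  private final Map<String, Integer> segmentBornGen = new HashMap<String, Integer>();
  // the still-open generation that new deletes go into
  private Map<Term, Integer> current = new HashMap<Term, Integer>();

  void bufferDeleteTerm(Term term, int docIDUpto) {
    current.put(term, docIDUpto);
  }

  // called when a merge kicks off: pinch off the current generation
  int pinchOff() {
    frozen.add(current);
    current = new HashMap<Term, Integer>();
    return frozen.size();   // the generation new segments are born under
  }

  void recordNewSegment(String segmentName, int bornGeneration) {
    segmentBornGen.put(segmentName, bornGeneration);
  }

  // deletes a segment still has to see: every frozen generation at or after its
  // birth, plus the still-open current one
  List<Map<Term, Integer>> pendingFor(String segmentName) {
    int born = segmentBornGen.containsKey(segmentName) ? segmentBornGen.get(segmentName) : 0;
    List<Map<Term, Integer>> result =
        new ArrayList<Map<Term, Integer>>(frozen.subList(born, frozen.size()));
    result.add(current);
    return result;
  }
}
{noformat}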


 Improve how IndexWriter flushes deletes against existing segments
 -

 Key: LUCENE-2680
 URL: https://issues.apache.org/jira/browse/LUCENE-2680
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch


 IndexWriter buffers up all deletes (by Term and Query) and only
 applies them if 1) commit or NRT getReader() is called, or 2) a merge
 is about to kickoff.
 We do this because, for a large index, it's very costly to open a
 SegmentReader for every segment in the index.  So we defer as long as
 we can.  We do it just before merge so that the merge can eliminate
 the deleted docs.
 But, most merges are small, yet in a big index we apply deletes to all
 of the segments, which is really very wasteful.
 Instead, we should only apply the buffered deletes to the segments
 that are about to be merged, and keep the buffer around for the
 remaining segments.
 I think it's not so hard to do; we'd have to have generations of
 pending deletions, because the newly merged segment doesn't need the
 same buffered deletions applied again.  So every time a merge kicks
 off, we pinch off the current set of buffered deletions, open a new
 set (the next generation), and record which segment was created as of
 which generation.
 This should be a very sizable gain for large indices that mix
 deletes, though, less so in flex since opening the terms index is much
 faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-3.x - Build # 184 - Failure

2010-11-17 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-3.x/184/

All tests passed

Build Log (for compile errors):
[...truncated 21395 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2242) Get distinct count of names for a facet field

2010-11-17 Thread Bill Bell (JIRA)
Get distinct count of names for a facet field
-

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0


See SOLR-236.

Need ability to get count back for the unique facets for grouping (field 
collapsing) instead of returning the facets. 



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-Solr-tests-only-trunk - Build # 1545 - Failure

2010-11-17 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/1545/

1 tests failed.
REGRESSION:  org.apache.solr.TestDistributedSearch.testDistribSearch

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:923)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:861)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:446)
at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
at 
org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)




Build Log (for compile errors):
[...truncated 8749 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments

2010-11-17 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933305#action_12933305
 ] 

Jason Rutherglen commented on LUCENE-2680:
--

When flushDeletes is true, all deletes are applied; when it's false, we move the 
pending deletes into the newly flushed segment as is, with no docId-upto 
remapping.
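
As a tiny sketch of that distinction (assumed names; not the actual IndexWriter code
path):

{noformat}
import org.apache.lucene.index.Term;

import java.util.HashMap;
import java.util.Map;

class FlushDeletesSketch {
  private final Map<Term, Integer> pendingDeletes = new HashMap<Term, Integer>();

  void onFlush(boolean flushDeletes, Map<Term, Integer> newSegmentDeletes) {
    if (flushDeletes) {
      // resolve every buffered delete against the existing segments
      applyAllBufferedDeletes();
    } else {
      // move the pending deletes onto the newly flushed segment as is,
      // with no docID-upto remapping
      newSegmentDeletes.putAll(pendingDeletes);
      pendingDeletes.clear();
    }
  }

  private void applyAllBufferedDeletes() {
    // omitted: open each segment's reader and delete the matching docs
    pendingDeletes.clear();
  }
}
{noformat}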

 Improve how IndexWriter flushes deletes against existing segments
 -

 Key: LUCENE-2680
 URL: https://issues.apache.org/jira/browse/LUCENE-2680
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch


 IndexWriter buffers up all deletes (by Term and Query) and only
 applies them if 1) commit or NRT getReader() is called, or 2) a merge
 is about to kickoff.
 We do this because, for a large index, it's very costly to open a
 SegmentReader for every segment in the index.  So we defer as long as
 we can.  We do it just before merge so that the merge can eliminate
 the deleted docs.
 But, most merges are small, yet in a big index we apply deletes to all
 of the segments, which is really very wasteful.
 Instead, we should only apply the buffered deletes to the segments
 that are about to be merged, and keep the buffer around for the
 remaining segments.
 I think it's not so hard to do; we'd have to have generations of
 pending deletions, because the newly merged segment doesn't need the
 same buffered deletions applied again.  So every time a merge kicks
 off, we pinch off the current set of buffered deletions, open a new
 set (the next generation), and record which segment was created as of
 which generation.
 This should be a very sizable gain for large indices that mix
 deletes, though, less so in flex since opening the terms index is much
 faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments

2010-11-17 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933306#action_12933306
 ] 

Jason Rutherglen commented on LUCENE-2680:
--

We can upgrade to an int[] from an ArrayList<Integer> for the aborted docs.

 Improve how IndexWriter flushes deletes against existing segments
 -

 Key: LUCENE-2680
 URL: https://issues.apache.org/jira/browse/LUCENE-2680
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, 
 LUCENE-2680.patch, LUCENE-2680.patch


 IndexWriter buffers up all deletes (by Term and Query) and only
 applies them if 1) commit or NRT getReader() is called, or 2) a merge
 is about to kickoff.
 We do this because, for a large index, it's very costly to open a
 SegmentReader for every segment in the index.  So we defer as long as
 we can.  We do it just before merge so that the merge can eliminate
 the deleted docs.
 But, most merges are small, yet in a big index we apply deletes to all
 of the segments, which is really very wasteful.
 Instead, we should only apply the buffered deletes to the segments
 that are about to be merged, and keep the buffer around for the
 remaining segments.
 I think it's not so hard to do; we'd have to have generations of
 pending deletions, because the newly merged segment doesn't need the
 same buffered deletions applied again.  So every time a merge kicks
 off, we pinch off the current set of buffered deletions, open a new
 set (the next generation), and record which segment was created as of
 which generation.
 This should be a very sizable gain for large indices that mix
 deletes, though, less so in flex since opening the terms index is much
 faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2243) Group Querys maybe return docList of 0 results

2010-11-17 Thread tom liu (JIRA)
Group Querys maybe return docList of 0 results
--

 Key: SOLR-2243
 URL: https://issues.apache.org/jira/browse/SOLR-2243
 Project: Solr
  Issue Type: Wish
  Components: search
 Environment: JDK1.6/Tomcat6
Reporter: tom liu


i wish to have results like below:
{noformat}
<lst name="grouped">
  <lst name="countrycode">
    <int name="matches">1411</int>
    <arr name="groups">
      <lst>
        <str name="groupValue">unit</str>
        <result name="doclist" numFound="921" start="0"/>
      </lst>
      <lst>
        <str name="groupValue">china</str>
        <result name="doclist" numFound="139" start="0"/>
      </lst>
    </arr>
  </lst>
</lst>
{noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2243) Group Querys maybe return docList of 0 results

2010-11-17 Thread tom liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tom liu updated SOLR-2243:
--

Attachment: SolrIndexSearcher.patch

i found:
# set group.limit=0
# in SolrIndexSearcher, i give the value 1 to the Collector construction

for example:
{noformat}
Phase2GroupCollector collector = new Phase2GroupCollector(
    (TopGroupCollector) gc.collector, gc.groupBy, gc.context, collectorSort,
    gc.docsPerGroup == 0 ? 1 : groupCommand.docsPerGroup,
    needScores);
{noformat}

 Group Querys maybe return docList of 0 results
 --

 Key: SOLR-2243
 URL: https://issues.apache.org/jira/browse/SOLR-2243
 Project: Solr
  Issue Type: Wish
  Components: search
 Environment: JDK1.6/Tomcat6
Reporter: tom liu
 Attachments: SolrIndexSearcher.patch


 i wish to have results like below:
 {noformat}
 <lst name="grouped">
   <lst name="countrycode">
     <int name="matches">1411</int>
     <arr name="groups">
       <lst>
         <str name="groupValue">unit</str>
         <result name="doclist" numFound="921" start="0"/>
       </lst>
       <lst>
         <str name="groupValue">china</str>
         <result name="doclist" numFound="139" start="0"/>
       </lst>
     </arr>
   </lst>
 </lst>
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2205) Grouping performance improvements

2010-11-17 Thread tom liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1295#action_1295
 ] 

tom liu commented on SOLR-2205:
---

Currently, group search does not support distributed queries.

Has anyone else already run into this?

 Grouping performance improvements
 -

 Key: SOLR-2205
 URL: https://issues.apache.org/jira/browse/SOLR-2205
 Project: Solr
  Issue Type: Sub-task
  Components: search
Affects Versions: 4.0
Reporter: Martijn van Groningen
 Fix For: 4.0

 Attachments: SOLR-2205.patch, SOLR-2205.patch


 This issue is dedicated to the performance of the grouping functionality.
 I've noticed that the code is not really performing on large indexes. Doing a 
 search (q=*:*) with grouping on an index from around 5M documents took around 
 one second on my local development machine. We had to support grouping on an 
 index that holds around 50M documents per machine, so we made some changes 
 and were able to happily serve that amount of documents. Patch will follow 
 soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene project announcement

2010-11-17 Thread Troy Howard
Neal,

As you said, "If you're developing at the concept level the specific
language you use becomes unimportant."

This is exactly why we feel that working on this in C# is not a
problem. We feel that the language should not impede our ability to
contribute. If we develop some interesting or valuable concepts in C#
those could be ported back to Java for inclusion in the Java
implementation of Lucene.

From an implementation standpoint, we feel that the code should
perform and integrate as effectively as possible into the runtime it's
in. Unfortunately there's no known software runtime that executes
concepts. They execute code written in a specific language. The
details of how that code executes and integrates into applications
directly affects its performance and usability.

It's a disservice to the concept of Lucene to translate it literally,
if doing so makes it less performant or less usable.

Using human language as an example:

Consider the Chinese name for China:

中国 (Zhong Guo)

translated literally, it means "Middle Kingdom".

Imagine you were translating an important philosophical document from
Chinese to English. Would you translate "Zhong Guo" as "Middle
Kingdom" or as "China"?

Suppose someone had asked the original philosopher to write all his
ideas in English to start, because "English is the language of
philosophy... it's what all the eminent philosophers use." Perhaps he
would never contribute his ideas at all, since writing them down in
English is too great a barrier. Maybe he would write them down, but
write them down in a way which made them seem absurd or have less of
an impact. In other words, miss the meaning, even though he'd
translated literally.

Either way, it would be less ideal than simply writing them in Chinese
to start, as that's what would be most natural for our imaginary
philosopher. The burden of translation from Chinese to English could
then be performed by an expert in translation, who would, undoubtedly,
translate the meaning conceptually, not the words syntactically.

Thanks,
Troy


On Wed, Nov 17, 2010 at 12:16 PM, Granroth, Neal V.
neal.granr...@thermofisher.com wrote:
 Is Java Lucene "grown up"?  Look at how much discussion it took to determine 
 how to get "Java" out of the name :)

 The discussion about advancing the algorithm in C#/.NET seems to be missing 
 the point.  If you're developing at the concept level the specific language 
 you use becomes unimportant.  However as most of the concept developers 
 apparently find Java convenient; others wanting to participate at the concept 
 level would find it more beneficial to join that brain-pool instead of 
 diluting the effort by starting up elsewhere.


 - Neal

 -Original Message-
 From: George Aroush [mailto:geo...@aroush.net]
 Sent: Tuesday, November 16, 2010 10:55 PM
 To: lucene-net-...@lucene.apache.org
 Cc: dev@lucene.apache.org
 Subject: RE: Lucene project announcement

 This topic has been coming back again and again which I have tried to
 address multiple times, so let me try again.

 1) Java Lucene started years before the first C# version (4+ years if I get
 my history right), thus it defined and has been the definer of the
 technology and the API.  It is the established leader, and everyone else is
 just a follower.

 2) Lucene.Net is nowhere near as mature as Java Lucene, never got itself
 established, nor had a rich development community -- thus why we are here today.

 3) If and only if the community of Lucene.Net (or Lucene over at
 codeplex.com) manages to prove itself to the level of Java Lucene, only
 then will such a community have the voice to influence folks over at Java
 Lucene.  Only then will you see the two communities discussing search engine
 vs. port issues or the state of Lucene.Net.

 If you look in my previous posts, I have pointed those out.  We must first:

 1) Be on par with Java Lucene releases and keep up with a commit-per-commit
 port.

 2) Prove Lucene.Net is a grownup project with followers and a healthy
 community (just like Java Lucene).

 If we don't achieve the above, folks over at Java Lucene will not take us
 seriously, and thus we can't influence them.

 -- George

 -Original Message-
 From: Nicholas Paldino [.NET/C# MVP] [mailto:casper...@caspershouse.com]
 Sent: Friday, November 12, 2010 10:36 AM
 To: lucene-net-...@lucene.apache.org
 Cc: dev@lucene.apache.org
 Subject: RE: Lucene project announcement

 Paul, et al,

        Paul, God bless you.  This is probably the most rational, practical
 perspective I've seen on the whole matter since the debacle started.

        While Lucene started off as a Java project, its massive success
 indicates that the concepts around it are very desirable by developers in
 other technologies, and that the Java product isn't being translated well
 into those technology stacks.

        That's not a slight against those who have contributed to this point
 to try and keep the .NET version in line with the Java one (despite me
 thinking