RE: probable sore topic... Maven build?

2011-04-21 Thread Steven A Rowe
Hi Paul,

As Grant mentioned, there is a full complement of Maven POMs stored under the 
top-level dev-tools/ directory.  You can populate the source tree with POMs by 
running 'ant get-maven-poms' from the top level.  Before the Maven build will 
function fully, run 'mvn -N -P bootstrap install' at the top level to install 
the non-Mavenized dependencies in your local repository.  At this point the 
Maven build should work.  Please report any problems you find.
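
In other words, starting from a fresh trunk checkout, the whole sequence is
just (all commands run from the top level; this is only a recap of the steps
above):

    ant get-maven-poms
    mvn -N -P bootstrap install
    mvn install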

Maven artifacts are built nightly via 'ant generate-maven-artifacts' by 
Jenkins.  The Maven build is also exercised nightly via 'mvn install' from the 
top level.  See this wiki page for pointers to the nightly Lucene/Solr Maven 
builds, the scripts that run them, and the published Maven artifacts:

http://wiki.apache.org/solr/NightlyBuilds

Feel free to ask more questions.

Steve

> -Original Message-
> From: Paul R. Brown [mailto:p...@mult.ifario.us]
> Sent: Thursday, April 21, 2011 4:38 PM
> To: dev@lucene.apache.org
> Subject: probable sore topic... Maven build?
> 
> 
> Hi, Dev@ --
> 
> I'm sure that there have been plenty of discussions previously about the
> current build and Maven artifacts; I am not looking to restart those
> discussions. That said, I am going to make a Maven-centric build of the
> Lucene/Solr universe, mostly because I need to build Maven artifacts from
> trunk on a regular basis, and I'm happy either to share with other folks if
> they're interested or to build on existing efforts if someone cares to point
> me in the right direction.
> 
> Thanks in advance.
> 
> -- Paul
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3040) analysis consumers should use reusable tokenstreams

2011-04-21 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3040:


Attachment: LUCENE-3040.patch

> analysis consumers should use reusable tokenstreams
> ---
>
> Key: LUCENE-3040
> URL: https://issues.apache.org/jira/browse/LUCENE-3040
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3040.patch
>
>
> Some analysis consumers (highlighter, more like this, memory index, contrib 
> queryparser, ...) are using Analyzer.tokenStream but should be using 
> Analyzer.reusableTokenStream instead for better performance.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3040) analysis consumers should use reusable tokenstreams

2011-04-21 Thread Robert Muir (JIRA)
analysis consumers should use reusable tokenstreams
---

 Key: LUCENE-3040
 URL: https://issues.apache.org/jira/browse/LUCENE-3040
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 3.2, 4.0
 Attachments: LUCENE-3040.patch

Some analysis consumers (highlighter, more like this, memory index, contrib 
queryparser, ...) are using Analyzer.tokenStream but should be using 
Analyzer.reusableTokenStream instead for better performance.
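
A sketch of the substitution in question against the 3.x analysis API (the 
field name, sample text, and consuming loop below are illustrative, not taken 
from the patch):

{code}
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

class ReusableConsumerSketch {
  static void consume(Analyzer analyzer, String text) throws IOException {
    // before: analyzer.tokenStream("body", ...) allocates a new
    // TokenStream chain on every call

    // after: a thread-private chain is reset and reused across calls
    TokenStream ts = analyzer.reusableTokenStream("body", new StringReader(text));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println(term.toString()); // consume the token text
    }
    ts.end();
    ts.close();
  }
}
{code}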

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2471) Localparams not working with 2 fq parameters using qt=name

2011-04-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023108#comment-13023108
 ] 

Bill Bell commented on SOLR-2471:
-

This is not a question. It is a bug.

Here is the problem:

I want to do two FQs (filter queries) to reduce the results. Each FQ is a 
subquery. For example:

fq={!type=dismax qt=dismaxname v="bill"}
fq={!type=dismax qt=anotherfield v="doctor"}

What happens is that the QT (from solrconfig) dismaxname is used for both FQs. 
We need it to support 2 different QT names.

Is this clear? If not, I will send an example that shows the breakage.


> Localparams not working with 2 fq parameters using qt=name
> --
>
> Key: SOLR-2471
> URL: https://issues.apache.org/jira/browse/SOLR-2471
> Project: Solr
>  Issue Type: Bug
>Reporter: Bill Bell
>
> We are having a problem with the following query. If we have two localparams 
> (using fq) and use QT= it does not work.
> This does not find any results:
> http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax
>  qt=namespec v=$qspec}&fq={!type=dismax qt=dismaxname 
> v=$qname}&q=_val_:"{!type=dismax qt=namespec v=$qspec}" _val_:"{!type=dismax 
> qt=dismaxname 
> v=$qname}"&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score
>  desc&rows=1000&start=0
> This works okay. It returns a few results.
> http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax
>  qf=$qqf v=$qspec}&fq={!type=dismax qt=dismaxname 
> v=$qname}&q=_val_:"{!type=dismax qf=$qqf  v=$qspec}" _val_:"{!type=dismax 
> qt=dismaxname v=$qname}" &qqf=specialties_ngram^1.0 
> specialties_search^2.0&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score
>  desc&rows=1000&start=0
> We would like to use a QT for both terms but it seems there is some kind of 
> bug when using two localparams and dismax filters with QT.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2471) Localparams not working with 2 fq parameters using qt=name

2011-04-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023110#comment-13023110
 ] 

Bill Bell commented on SOLR-2471:
-

Hoss,

Since they are LocalParams, why can't we have TWO QTs? One for each subquery? 
Based on what you are saying, this would be a new feature request...



> Localparams not working with 2 fq parameters using qt=name
> --
>
> Key: SOLR-2471
> URL: https://issues.apache.org/jira/browse/SOLR-2471
> Project: Solr
>  Issue Type: Bug
>Reporter: Bill Bell
>
> We are having a problem with the following query. If we have two localparams 
> (using fq) and use QT= it does not work.
> This does not find any results:
> http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax
>  qt=namespec v=$qspec}&fq={!type=dismax qt=dismaxname 
> v=$qname}&q=_val_:"{!type=dismax qt=namespec v=$qspec}" _val_:"{!type=dismax 
> qt=dismaxname 
> v=$qname}"&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score
>  desc&rows=1000&start=0
> This works okay. It returns a few results.
> http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax
>  qf=$qqf v=$qspec}&fq={!type=dismax qt=dismaxname 
> v=$qname}&q=_val_:"{!type=dismax qf=$qqf  v=$qspec}" _val_:"{!type=dismax 
> qt=dismaxname v=$qname}" &qqf=specialties_ngram^1.0 
> specialties_search^2.0&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score
>  desc&rows=1000&start=0
> We would like to use a QT for both terms but it seems there is some kind of 
> bug when using two localparams and dismax filters with QT.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7329 - Failure

2011-04-21 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7329/

1 tests failed.
REGRESSION:  org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2894)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:589)
        at java.lang.StringBuffer.append(StringBuffer.java:337)
        at java.text.RuleBasedCollator.getCollationKey(RuleBasedCollator.java:617)
        at org.apache.lucene.collation.CollationKeyFilter.incrementToken(CollationKeyFilter.java:93)
        at org.apache.lucene.collation.CollationTestBase.assertThreadSafe(CollationTestBase.java:304)
        at org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe(TestCollationKeyAnalyzer.java:89)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1082)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1010)




Build Log (for compile errors):
[...truncated 5276 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-trunk - Build # 1537 - Failure

2011-04-21 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-trunk/1537/

21 tests failed.
REGRESSION:  org.apache.lucene.index.TestNRTThreads.testNRTThreads

Error Message:
/usr/home/hudson/hudson-slave/workspace/Lucene-trunk/checkout/lucene/build/test/1/nrtopenfiles.4311211294863747903/_bx.tvd (Too many open files in system)

Stack Trace:
java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-trunk/checkout/lucene/build/test/1/nrtopenfiles.4311211294863747903/_bx.tvd (Too many open files in system)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
        at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
        at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:345)
        at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:374)
        at org.apache.lucene.store.Directory.openInput(Directory.java:122)
        at org.apache.lucene.index.TermVectorsReader.<init>(TermVectorsReader.java:83)
        at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:236)
        at org.apache.lucene.index.SegmentReader.openDocStores(SegmentReader.java:515)
        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:611)
        at org.apache.lucene.index.IndexWriter$ReaderPool.getReadOnlyClone(IndexWriter.java:560)
        at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:172)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:360)
        at org.apache.lucene.index.DirectoryReader.doReopenFromWriter(DirectoryReader.java:419)
        at org.apache.lucene.index.DirectoryReader.doReopen(DirectoryReader.java:432)
        at org.apache.lucene.index.DirectoryReader.reopen(DirectoryReader.java:392)
        at org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:213)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)


REGRESSION:  org.apache.lucene.index.TestOmitNorms.testOmitNormsCombos

Error Message:
/usr/home/hudson/hudson-slave/workspace/Lucene-trunk/checkout/lucene/build/test/1/test8730544420518378026tmp/_i_0.skp (Too many open files in system)

Stack Trace:
java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-trunk/checkout/lucene/build/test/1/test8730544420518378026tmp/_i_0.skp (Too many open files in system)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:448)
        at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:312)
        at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:348)
        at org.apache.lucene.index.codecs.sep.SepPostingsWriterImpl.<init>(SepPostingsWriterImpl.java:139)
        at org.apache.lucene.index.codecs.sep.SepPostingsWriterImpl.<init>(SepPostingsWriterImpl.java:106)
        at org.apache.lucene.index.codecs.mockintblock.MockFixedIntBlockCodec.fieldsConsumer(MockFixedIntBlockCodec.java:114)
        at org.apache.lucene.index.PerFieldCodecWrapper$FieldsWriter.<init>(PerFieldCodecWrapper.java:64)
        at org.apache.lucene.index.PerFieldCodecWrapper.fieldsConsumer(PerFieldCodecWrapper.java:54)
        at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:78)
        at org.apache.lucene.index.TermsHash.flush(TermsHash.java:103)
        at org.apache.lucene.index.DocInverter.flush(DocInverter.java:65)
        at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:55)
        at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2497)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2462)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1180)
        at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:101)
        at org.apache.lucene.index.TestOmitNorms.getNorms(TestOmitNorms.java:285)
        at org.apache.lucene.index.TestOmitNorms.testOmitNormsCombos(TestOmitNorms.java:244)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)

Re: Stand-alone Index updating using EmbeddedSolrServer

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 8:15 PM, Kiko Aumond  wrote:
> Yes, this is a CSV Loader.  This looks like one of those cases where there
> are many ways to handle 90% of the requirements but none that solves 100% of
> the problem. Which is why the CSV loader also almost solves the problem, but
> not quite.
>
>  We're not using solr as a web app, just using the embedded server, which is
> why we can't use curl and hence CSVLoader.  So this is a purely command-line
> driven application that runs against an embedded Solr server, no web
> containers,  for performance reasons.

But I've already pointed out that if you were running the Solr server,
you could easily have just streamed the CSV directly from disk (even
though the time savings are normally in the 1-2% range).

Regardless, even if you're using embedded, you should still be able to
pass "stream.url=file://my_local_file" via something like
DirectSolrConnection or EmbeddedSolrServer and have the standard
CSVLoader stream directly from the file.  Of course, if the CSV files
are of any significant size, it's not going to matter whether you kick off
the stream via HTTP or embedded.
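
For example, something along these lines should work -- a hedged sketch only:
the core handle, handler path, and file name are illustrative, and
solrconfig.xml must allow remote streaming (enableRemoteStreaming="true"):

import org.apache.solr.core.SolrCore;
import org.apache.solr.servlet.DirectSolrConnection;

class CsvStreamSketch {
  // stream a local CSV through the standard CSVLoader, bypassing HTTP
  static void load(SolrCore core) throws Exception {
    DirectSolrConnection conn = new DirectSolrConnection(core);
    String rsp = conn.request(
        "/update/csv?stream.url=file:///data/docs.csv&commit=true", null);
    System.out.println(rsp);
  }
}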

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Stand-alone Index updating using EmbeddedSolrServer

2011-04-21 Thread Kiko Aumond
Yes, this is a CSV Loader.  This looks like one of those cases where there
are many ways to handle 90% of the requirements but none that solves 100% of
the problem. Which is why the CSV loader also almost solves the problem, but
not quite.

 We're not using solr as a web app, just using the embedded server, which is
why we can't use curl and hence CSVLoader.  So this is a purely command-line
driven application that runs against an embedded Solr server, no web
containers,  for performance reasons.

On Thu, Apr 21, 2011 at 4:47 PM, Yonik Seeley wrote:

> On Thu, Apr 21, 2011 at 7:27 PM, Kiko Aumond  wrote:
> > Yes, I've seen that page, but I went a bit beyond the material there, as
> > the code I wrote is able to set parameters such as separators,
> > encapsulators and the index columns, whether to split parameters,
> > auto-commit, as well as the ability to do incremental or full index
> > reloads.
>
> Is this a CSV loader?
> If so, did you know the CSV loader (and other data loaders) have the
> option to bypass HTTP also and stream directly from a local file (or
> other URL)?
>
> > Also, from what I've seen in DirectSolrConnection (version 1.4.1), you
> > have to supply the document body as a String.  We want to avoid having
> > to load the entire document into memory, which is why we load the files
> > into ContentStream objects and pass them to the embedded Solr server (I
> > am assuming ContentStream actually streams the file as its name suggests
> > instead of trying to load it into memory).  The utility I wrote gets a
> > path, a Regex expression for all the files to be loaded, as well as the
> > parameters mentioned above, and it does either a full or incremental
> > upload of multiple files with a single command.
> >
> > We run a very high load application with SOLR in the back end that
> > requires that we use the embedded Solr server to eliminate the network
> > round-trip.  Even a small incremental gain in performance is important
> > for us.
>
> Eliminating the network round-trip is certainly important for good
> bulk indexing performance.  Luckily you don't have to embed to do that.
> You can use multiple threads (say 16 for a 4-core server), which
> essentially covers up any round-trip latency (use persistent connections
> though! or use SolrJ, which does by default), or you can use
> StreamingUpdateSolrServer, which eliminates round-trip network delays by
> streaming documents over multiple already-open connections.
>
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
>


Re: Stand-alone Index updating using EmbeddedSolrServer

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 7:27 PM, Kiko Aumond  wrote:
> Yes, I've seen that page, but I went a bit beyond the material there, as the
> code I wrote is able to set parameters such as separators, encapsulators and
> the index columns,  whether to split parameters, auto-commit as well as the
> ability to do incremental or full index reloads.

Is this a CSV loader?
If so, did you know the CSV loader (and other data loaders) have the
option to bypass HTTP also and stream directly from a local file (or
other URL)?

> Also, from what I've seen in DirectSolrConnection (version 1.4.1), you have
> to supply the document body as a String.  We want to avoid having to load
> the entire document into memory, which is why we load the files into
> ContentStream objects and pass them to the embedded Solr server (I am
> assuming  ContentStream actually streams the file as its name suggests
> instead of trying to load it into memory).  The utility I wrote gets a path,
> a Regex expression for all the files to be loaded, as well as the parameters
> mentioned above and it does either a full or incremental upload of multiple
> files with a single command.
>
> We run a very high load application with SOLR in the back end that requires
> that we use the Embedded solr server to eliminate the network round-trip.
> Even a small incremental gain in performance is important for us.

Eliminating the network round-trip is certainly important for good
bulk indexing performance.  Luckily you don't have to embed to do that.
You can use multiple threads (say 16 for a 4-core server), which
essentially covers up any round-trip latency (use persistent connections
though! or use SolrJ, which does by default), or you can use
StreamingUpdateSolrServer, which eliminates round-trip network delays by
streaming documents over multiple already-open connections.
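
To make that concrete, a minimal sketch of the StreamingUpdateSolrServer
approach (the URL, queue size of 1000, and thread count of 4 are illustrative
values, not recommendations):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

class BulkIndexSketch {
  public static void main(String[] args) throws Exception {
    // buffers documents in a queue and streams them over 4 runner
    // threads, each holding an open connection to the server
    SolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8983/solr", 1000, 4);
    for (int i = 0; i < 100000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));
      server.add(doc); // returns quickly; docs stream in the background
    }
    server.commit(); // flush the queue and commit when done
  }
}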

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2268) Add support for Point in Polygon searches

2011-04-21 Thread Alexander Kanarsky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022980#comment-13022980
 ] 

Alexander Kanarsky commented on SOLR-2268:
--

Robert, 

in addition to David's work (SOLR-2155), you can also try my polygon extension 
for JTeam's (Chris Male et al.) Spatial Solr Plugin (SSP). SSP is a standalone 
Solr module that works with Solr 1.4.x; you can get it here: 

http://www.jteam.nl/products/spatialsolrplugin.html 

It allows radius search over lat,lon documents. 

The polygon/polyline extension for SSP 2.0 additionally allows polygon search. 
It is located here: 

http://sourceforge.net/projects/ssplex/files/

It has some limitations (for example, plane geometry is used), but it may work 
just fine for you, depending on your situation.

> Add support for Point in Polygon searches
> -
>
> Key: SOLR-2268
> URL: https://issues.apache.org/jira/browse/SOLR-2268
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
> Attachments: SOLR-2268.patch
>
>
> In spatial applications, it is common to ask whether a point is inside of a 
> polygon.  Solr could support two forms of this: 
> # A field contains a polygon and the user supplies a point.  If the polygon 
> contains the point, the doc is returned.  
> # A document contains a point and the user supplies a polygon.  If the point 
> is in the polygon, return the document.
> With both of these cases, it would be good to support the negative 
> assertion, too.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Stand-alone Index updating using EmbeddedSolrServer

2011-04-21 Thread Kiko Aumond
Thank you, Yonik.

Yes, I've seen that page, but I went a bit beyond the material there, as the
code I wrote is able to set parameters such as separators, encapsulators and
the index columns,  whether to split parameters, auto-commit as well as the
ability to do incremental or full index reloads.

Also, from what I've seen in DirectSolrConnection (version 1.4.1), you have
to supply the document body as a String.  We want to avoid having to load
the entire document into memory, which is why we load the files into
ContentStream objects and pass them to the embedded Solr server (I am
assuming  ContentStream actually streams the file as its name suggests
instead of trying to load it into memory).  The utility I wrote gets a path,
a Regex expression for all the files to be loaded, as well as the parameters
mentioned above and it does either a full or incremental upload of multiple
files with a single command.
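
For reference, the pattern I'm describing looks roughly like this -- a sketch
only, with an illustrative handler path and CSV parameters:

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

class CsvContentStreamSketch {
  static void upload(SolrServer server, File csv) throws Exception {
    // the file is wrapped as a ContentStream rather than read into a String
    ContentStreamUpdateRequest req =
        new ContentStreamUpdateRequest("/update/csv");
    req.addFile(csv);
    req.setParam("separator", ",");
    req.setParam("encapsulator", "\"");
    req.setParam("commit", "true");
    server.request(req); // works against EmbeddedSolrServer too
  }
}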

We run a very high load application with SOLR in the back end that requires
that we use the Embedded solr server to eliminate the network round-trip.
Even a small incremental gain in performance is important for us.

On Thu, Apr 21, 2011 at 4:02 PM, Yonik Seeley wrote:

> On Thu, Apr 21, 2011 at 6:26 PM, Kiko Aumond  wrote:
> > Hi
> >
> > I am new to the list and relatively new to SOLR.  I am working on a tool
> for
> > updating indexes directly through EmbeddedSolrServer thus eliminating the
> > need for sending potentially large documents over HTTP.
>
> http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer
> And also DirectSolrConnection
>
> It's generally discouraged as a premature optimization that normally
> gains you only a few percent increase in performance.
>
>
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
>


Re: Stand-alone Index updating using EmbeddedSolrServer

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 6:26 PM, Kiko Aumond  wrote:
> Hi
>
> I am new to the list and relatively new to SOLR.  I am working on a tool for
> updating indexes directly through EmbeddedSolrServer thus eliminating the
> need for sending potentially large documents over HTTP.

http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer
And also DirectSolrConnection

It's generally discouraged as a premature optimization that normally
gains you only a few percent increase in performance.


-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Stand-alone Index updating using EmbeddedSolrServer

2011-04-21 Thread Kiko Aumond
Hi

I am new to the list and relatively new to SOLR.  I am working on a tool for
updating indexes directly through EmbeddedSolrServer thus eliminating the
need for sending potentially large documents over HTTP. This tool could also
be easily modified to allow for command-line querying, also using the
embedded server.

My questions are:
1) Does this functionality already exist and my Google searches just didn't
find it?

2) If this functionality is not yet available for embedded servers, would
there be interest in this code as a contribution to the Solr project, and how
should I go about submitting the code?

Thanks
Kiko


Re: probable sore topic... Maven build?

2011-04-21 Thread Grant Ingersoll
Have a look at the dev-tools area under the trunk checkout.  From there, you 
can use many Maven tools if you would like.

On Apr 21, 2011, at 10:38 PM, Paul R. Brown wrote:

> 
> Hi, Dev@ --
> 
> I'm sure that there have been plenty of discussions previously about the 
> current build and Maven artifacts; I am not looking to restart those 
> discussions. That said, I am going to make a Maven-centric build of the 
> Lucene/Solr universe, mostly because I need to build Maven artifacts from 
> trunk on a regular basis, and I'm happy either to share with other folks if 
> they're interested or to build on existing efforts if someone cares to point 
> me in the right direction.
> 
> Thanks in advance.
> 
> -- Paul 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



probable sore topic... Maven build?

2011-04-21 Thread Paul R. Brown

Hi, Dev@ --

I'm sure that there have been plenty of discussions previously about the 
current build and Maven artifacts; I am not looking to restart those 
discussions. That said, I am going to make a Maven-centric build of the 
Lucene/Solr universe, mostly because I need to build Maven artifacts from trunk 
on a regular basis, and I'm happy either to share with other folks if they're 
interested or to build on existing efforts if someone cares to point me in the 
right direction.

Thanks in advance.

-- Paul 

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] Joining the Revolution

2011-04-21 Thread Wyatt Barnett
Comrade?

Seems like there is a Lucene conference:
http://www.lucenerevolution.com/ -- anyone planning on attending?


Re: Indexing data with Trade Mark Symbol

2011-04-21 Thread Paul R. Brown

I've had no issues indexing strings with non-ASCII characters, but your web 
container may be interfering with what you're trying to do; see:

http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
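
If Tomcat is your container, the usual fix that page describes is setting
URIEncoding on the HTTP connector in server.xml; a sketch, where every
attribute other than URIEncoding is illustrative:

<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"/>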

-- Paul
On Thursday, April 7, 2011 at 1:02 AM, mechravi25 wrote:
Hi, 
>  Has anyone indexed data containing the Trade Mark symbol? When I tried to
> index it, the data appears as below. I want to see the indexed data with the
> TM symbol.
> 
> Indexed Data: 
>  79797 - Siebel Research– AI Fund, 
>  79797 - Siebel Research– AI Fund,l 
> 
> 
> Original Data: 
> 79797 - Siebel Research™ AI Fund, 
> 
> 
> Please help me to resolve this 
> 
> Regards, 
> Ravi 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Indexing-data-with-Trade-Mark-Symbol-tp2789391p2789391.html
> Sent from the Solr - Dev mailing list archive at Nabble.com.
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2272) Join

2011-04-21 Thread Gerd Bremer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022805#comment-13022805
 ] 

Gerd Bremer commented on SOLR-2272:
---

Is it possible to sort the join query result?

// first class of documents with refid and pagecount fields; 
// a refid field maps to an id field in the second class of documents
// (1->100, 2->101)
doc1:

id:1
refid:100
pagecount:35

doc2:
- 
id:2
refid:101
pagecount:45

// second class of documents with text field
doc100:
--
id:100
text:hello world

doc101:
--
id:101
text: goodbye

Now I would like to select the documents from the first class sorted by the 
pagecount field in descending order, that is {doc2, doc1}, and return the 
mapped documents with text in the same order, that is {doc101, doc100}. Is 
this possible with join? I'm looking for an alternative to partial update, and 
this join looks promising if I can sort and get the mapped result in the same 
order.

> Join
> 
>
> Key: SOLR-2272
> URL: https://issues.apache.org/jira/browse/SOLR-2272
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Yonik Seeley
> Fix For: 4.0
>
> Attachments: SOLR-2272.patch, SOLR-2272.patch
>
>
> Limited join functionality for Solr, mapping one set of IDs matching a query 
> to another set of IDs, based on the indexed tokens of the fields.
> Example:
> fq={!join from=parent_ptr to=parent_id}child_doc:query

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3018) Lucene Native Directory implementation need automated build

2011-04-21 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-3018:
--

Attachment: LUCENE-3018.patch

Changed the library folder to:
{noformat} ant_lib {noformat}

Updated the overview.html file too. 

> Lucene Native Directory implementation need automated build
> ---
>
> Key: LUCENE-3018
> URL: https://issues.apache.org/jira/browse/LUCENE-3018
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: Build
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Varun Thacker
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
> LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
> cpptasks-1.0b5.jar, cpptasks-LICENSE-ASL.txt, cpptasks.jar, cpptasks.jar
>
>
> Currently the native directory impl in contrib/misc requires manual action to 
> compile the C code, (partially) documented in 
>  
> https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html
> It would be nice if we had an Ant task and documentation for all 
> platforms on how to compile it and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build

2011-04-21 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022724#comment-13022724
 ] 

Simon Willnauer commented on LUCENE-3018:
-

bq. In my opinion, the folder name of this utility lib should maybe be named 
differently, because lib/ is used for runtime libs and not compile-time 
dependencies. Any comments on this? Simon?

I agree; maybe we should call it ant_lib?

{quote}We require all libs in Lucene to be named with a version number. Of 
course the lib should contain it in its manifest, but the name should also 
contain it. Cpptasks is a utility lib, so it's not so important in this case, 
but for consistency: for all other libs that the user will copy to his 
classpath when using Lucene, this is important to make sure that the user can 
manage the dependencies if he has conflicting version numbers.
{quote}

Varun, can you change it please?

> Lucene Native Directory implementation need automated build
> ---
>
> Key: LUCENE-3018
> URL: https://issues.apache.org/jira/browse/LUCENE-3018
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: Build
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Varun Thacker
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
> LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, cpptasks-1.0b5.jar, 
> cpptasks-LICENSE-ASL.txt, cpptasks.jar, cpptasks.jar
>
>
> Currently the native directory impl in contrib/misc requires manual action to 
> compile the C code, (partially) documented in 
>  
> https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html
> It would be nice if we had an Ant task and documentation for all 
> platforms on how to compile it and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1709) Distributed Date and Range Faceting

2011-04-21 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022697#comment-13022697
 ] 

Peter Sturge commented on SOLR-1709:


Yes, the deprecation story makes sense.

Regarding SOLR-1729, I'm pretty sure this already works for 3.x (it was 
originally created on/for the 3.x branch). I guess Yonik's NOW changes were 
destined for trunk, but I've been using the current SOLR-1729 patch on the 3.x 
branch and it is working fine in production environments.

Thanks
Peter


> Distributed Date and Range Faceting
> ---
>
> Key: SOLR-1709
> URL: https://issues.apache.org/jira/browse/SOLR-1709
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Affects Versions: 1.4
>Reporter: Peter Sturge
>Assignee: Hoss Man
>Priority: Minor
> Fix For: 4.0
>
> Attachments: FacetComponent.java, FacetComponent.java, 
> ResponseBuilder.java, SOLR-1709.patch, 
> SOLR-1709_distributed_date_faceting_v3x.patch, solr-1.4.0-solr-1709.patch
>
>
> This patch is for adding support for date facets when using distributed 
> searches.
> Date faceting across multiple machines exposes some time-based issues that 
> anyone interested in this behaviour should be aware of:
> Any time and/or time-zone differences are not accounted for in the patch 
> (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
> 'instant-in-time', unless all shards are time-synced to the exact same time).
> The implementation uses the first encountered shard's facet_dates as the 
> basis for subsequent shards' data to be merged in.
> This means that if subsequent shards' facet_dates are skewed in relation to 
> the first by >1 'gap', these 'earlier' or 'later' facets will not be merged 
> in.
> There are several reasons for this:
>   * Performance: It's faster to check facet_date lists against a single map's 
> data, rather than against each other, particularly if there are many shards
>   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
> time range larger than that which was requested
> (e.g. a request for one hour's worth of facets could bring back 2, 3 
> or more hours of data)
> This could be dealt with if timezone and skew information was added, and 
> the dates were normalized.
> One possibility for adding such support is to [optionally] add 'timezone' and 
> 'now' parameters to the 'facet_dates' map. This would tell requesters what 
> time and TZ the remote server thinks it is, and so multiple shards' time data 
> can be normalized.
> The patch affects 2 files in the Solr core:
>   org.apache.solr.handler.component.FacetComponent.java
>   org.apache.solr.handler.component.ResponseBuilder.java
> The main changes are in FacetComponent - ResponseBuilder is just to hold the 
> completed SimpleOrderedMap until the finishStage.
> One possible enhancement is to perhaps make this an optional parameter, but 
> really, if facet.date parameters are specified, it is assumed they are 
> desired.
> Comments & suggestions welcome.
> As a favour to ask, if anyone could take my 2 source files and create a PATCH 
> file from them, it would be greatly appreciated, as I'm having a bit of 
> trouble with svn (don't shoot me, but my environment is a Redmond-based os 
> company).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene Jenkins slave out of disk

2011-04-21 Thread Niklas Gustavsson
On Thu, Apr 21, 2011 at 8:42 AM, Uwe Schindler  wrote:
> [root@lucene /home/hudson/hudson-slave/workspace]# df -h
> Filesystem                             Size    Used   Avail Capacity  Mounted on
> zroot/jails/lucene.zones.apache.org     69G     53G     17G    76%    /
>
> I think the master is this time out of disk space.

The disk space was cleared (or at least that's what Jenkins thinks).
The slave is now running again.

/niklas

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene Jenkins slave out of disk

2011-04-21 Thread Niklas Gustavsson
Hi

The box running the Jenkins slave for Lucene is out of disk space
again. Please check.

/niklas

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Lucene Jenkins slave out of disk

2011-04-21 Thread Uwe Schindler
OK,

I was not aware that you had taken action, too. Because of the message about
the Jenkins shutdown, I assumed that you were fixing the master.

About the Lucene slave, the problem we have is: our tests need lots of disk
space for large indexes. Before the build, the workspace is completely cleared
(at least for the nightly builds); for the half-hourly builds it is svn-upped
and then cleaned by Ant (ant clean). But at the end of the build the build
script does not call "ant clean" again, because otherwise we would lose the
test results. I am working on adding a task to our Ant build scripts that
cleans up only the test data and can be called after the build finishes. This
should free unneeded disk space after a build so that later builds can reuse
the space.

I will talk to the other committers who know better where the temporary test
files lie.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Niklas Gustavsson [mailto:nik...@protocol7.com]
> Sent: Thursday, April 21, 2011 8:57 AM
> To: bui...@apache.org
> Cc: dev@lucene.apache.org
> Subject: Re: Lucene Jenkins slave out of disk
> 
> On Thu, Apr 21, 2011 at 8:42 AM, Uwe Schindler  wrote:
> > [root@lucene /home/hudson/hudson-slave/workspace]# df -h
> > Filesystem                             Size    Used   Avail Capacity  Mounted on
> > zroot/jails/lucene.zones.apache.org     69G     53G     17G    76%    /
> >
> > I think the master is this time out of disk space.
> 
> The disk space was cleared (or at least that's what Jenkins thinks).
> The slave is now running again.
> 
> /niklas


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org