Re: solr nutch url indexing

2009-08-26 Thread last...@gmail.com

Uri Boness wrote:
Well... yes, it's a tool that Nutch ships with. It also ships with an 
example Solr schema which you can use. 


hi,
is there any documentation to understand what is going on in the schema?

<requestHandler name="/nutch" class="solr.SearchHandler">
   <lst name="defaults">
   <str name="defType">dismax</str>
   <str name="echoParams">explicit</str>
   <float name="tie">0.01</float>
   <str name="qf">content^0.5 anchor^1.0 title^5.2</str>
   <str name="pf">content^0.5 anchor^1.5 title^5.2 site^1.5</str>
   <str name="fl">url</str>
   <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
   <int name="ps">100</int>
   <bool name="hl">true</bool>
   <str name="q.alt">*:*</str>
   <str name="hl.fl">title url content</str>
   <str name="f.title.hl.fragsize">0</str>
   <str name="f.title.hl.alternateField">title</str>
   <str name="f.url.hl.fragsize">0</str>
   <str name="f.url.hl.alternateField">url</str>
   <str name="f.content.hl.fragmenter">regex</str>
   </lst>
</requestHandler>


RE: encoding problem

2009-08-26 Thread Bernadette Houghton
Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I access 
the JVM???

Regards
Bern


-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Wednesday, 26 August 2009 5:10 PM
To: solr-user@lucene.apache.org
Subject: Re: encoding problem

On Wed, Aug 26, 2009 at 10:24 AM, Bernadette Houghton 
bernadette.hough...@deakin.edu.au wrote:

 We have an encoding problem with our solr application. That is, non-ASCII
 chars display fine in SOLR, but as gobbledegook in our application.

 Our tomcat server.xml file already contains URIEncoding="UTF-8" under the
 relevant connector.

 A google search reveals that I should set the encoding for the JVM, but
 have no idea how to do this. I'm running Windows, and there is no tomcat
 process in my Windows Services.


Add the following parameter to the JVM:

-Dfile.encoding=UTF-8

-- 
Regards,
Shalin Shekhar Mangar.


Re: encoding problem

2009-08-26 Thread Shalin Shekhar Mangar
On Wed, Aug 26, 2009 at 12:42 PM, Bernadette Houghton 
bernadette.hough...@deakin.edu.au wrote:

 Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I
 access the JVM???


When you execute the java executable, just add -Dfile.encoding=UTF-8 as a
command line argument to the executable.
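For example, assuming the Jetty-based example setup from the Solr distribution (started with start.jar), the flag goes before -jar:

```
java -Dfile.encoding=UTF-8 -jar start.jar
```

If Solr runs inside a servlet container instead, the flag goes wherever that container's JVM options are set (see the JAVA_OPTS discussion later in this thread).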

How are you consuming Solr? You mentioned there is no Tomcat; is your Solr
client a desktop Java application?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Exact word search

2009-08-26 Thread Shalin Shekhar Mangar
On Tue, Aug 25, 2009 at 10:40 AM, bhaskar chandrasekar bas_s...@yahoo.co.in
 wrote:

 Hi,

 Can anyone help me with the below scenario?

 Scenario 1:

 Assume that I give Google as input string
 i am using Carrot with Solr
 Carrot is for front end display purpose


It seems like Carrot is the one making the queries to Solr? In that case,
this question may be better suited for carrot users/developers.



 the issue is
 Assuming i give BHASKAR as input string
 It should give me search results pertaining to BHASKAR only.
  Select * from MASTER where name = 'Bhaskar';
  Example: It should not display search results such as ChandarBhaskar or
  BhaskarC.
  Should display Bhaskar only.



That is easy with Solr: make a query like field-name:Bhaskar. Make sure
that the field is not tokenized, i.e. use the string type in schema.xml.
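As a sketch (the field name here is just an assumption), the schema.xml definition would look like:

```
<!-- schema.xml: the "string" type is not tokenized, so only exact values match -->
<field name="name" type="string" indexed="true" stored="true"/>
```

A query of q=name:Bhaskar then matches documents whose name is exactly Bhaskar, but not ChandarBhaskar or BhaskarC.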



 Scenario 2:
  Select * from MASTER where name like '%BHASKAR%';
  It should display records containing the word BHASKAR
  Ex: Bhaskar
 ChandarBhaskar
  BhaskarC
  Bhaskarabc


Leading wildcards are not supported. However there are alternate ways of
doing it.

Create two fields, keep one as a normal string type and use a
KeywordTokenizer and ReverseFilter on the other. Make one field a copyField
of the other. Perform a prefix search on both fields.
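A rough schema.xml sketch of that suggestion (field and type names are made up, and ReverseStringFilterFactory is assumed to be the reverse filter meant here):

```
<fieldType name="string_rev" class="solr.TextField">
  <analyzer>
    <!-- keep the whole value as one token, then reverse it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ReverseStringFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name" type="string" indexed="true" stored="true"/>
<field name="name_rev" type="string_rev" indexed="true" stored="false"/>
<copyField source="name" dest="name_rev"/>
```

A prefix query on name (name:Bhaskar*) matches values starting with Bhaskar, while a prefix query on name_rev with the reversed term (name_rev:raksahB*) matches values ending with Bhaskar.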

-- 
Regards,
Shalin Shekhar Mangar.


RE: encoding problem

2009-08-26 Thread Bernadette Houghton
Thanks for your quick reply, Shalin.

Tomcat is running on my Windows machine, but does not appear in Windows 
Services (as I was expecting it should ... am I wrong?). I'm running it from a 
startup.bat on my desktop - see below. Do I add the -Dfile.encoding line to the 
startup.bat?

SOLR is part of the repository software that we are running.

Thanks!

BERN

Startup.bat -
@echo off
if "%OS%" == "Windows_NT" setlocal
rem ---------------------------------------------------------------------------
rem Start script for the CATALINA Server
rem
rem $Id: startup.bat 302918 2004-05-27 18:25:11Z yoavs $
rem ---------------------------------------------------------------------------

rem Guess CATALINA_HOME if not defined
set CURRENT_DIR=%cd%
if not "%CATALINA_HOME%" == "" goto gotHome
set CATALINA_HOME=%CURRENT_DIR%
if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
cd ..
set CATALINA_HOME=%cd%
cd %CURRENT_DIR%
:gotHome
if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
echo The CATALINA_HOME environment variable is not defined correctly
echo This environment variable is needed to run this program
goto end
:okHome

set EXECUTABLE=%CATALINA_HOME%\bin\catalina.bat

rem Check that target executable exists
if exist "%EXECUTABLE%" goto okExec
echo Cannot find %EXECUTABLE%
echo This file is needed to run this program
goto end
:okExec

rem Get remaining unshifted command line arguments and save them in CMD_LINE_ARGS
set CMD_LINE_ARGS=
:setArgs
if ""%1""=="" goto doneSetArgs
set CMD_LINE_ARGS=%CMD_LINE_ARGS% %1
shift
goto setArgs
:doneSetArgs

call "%EXECUTABLE%" start %CMD_LINE_ARGS%

:end





Re: shingle filter

2009-08-26 Thread Shalin Shekhar Mangar
On Tue, Aug 25, 2009 at 4:24 AM, Joe Calderon calderon@gmail.com wrote:

 hello *, I'm currently faceting on a shingled field to obtain popular
 phrases and it's working well. However, I'd like to limit the number of
 shingles that get created. The solr.ShingleFilterFactory supports
 maxShingleSize; can it be made to support a minimum as well? Can
 someone point me in the right direction?


There is only maxShingleSize right now. The other configurable attribute is
outputUnigrams which controls whether or not unigrams may be added to the
index.

If you want to add support for minimum size, I think you can make the
changes in ShingleFilter.fillShingleBuffer(). Create an issue in jira and
someone who knows more about shingles can help out.
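For reference, a minimal analyzer using the factory as it exists today might look like this (the field type name is made up):

```
<fieldType name="shingled" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- emit shingles of up to 3 tokens, without the single-token unigrams -->
    <filter class="solr.ShingleFilterFactory"
            maxShingleSize="3" outputUnigrams="false"/>
  </analyzer>
</fieldType>
```

A minShingleSize attribute would presumably sit alongside maxShingleSize once support is added.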

-- 
Regards,
Shalin Shekhar Mangar.


Re: encoding problem

2009-08-26 Thread Shalin Shekhar Mangar
On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton 
bernadette.hough...@deakin.edu.au wrote:

 Thanks for your quick reply, Shalin.

 Tomcat is running on my Windows machine, but does not appear in Windows
 Services (as I was expecting it should ... am I wrong?). I'm running it from
 a startup.bat on my desktop - see below. Do I add the -Dfile.encoding line to the
 startup.bat?

 SOLR is part of the repository software that we are running.


Tomcat respects an environment variable called JAVA_OPTS through which you
can pass any jvm argument (e.g. heap size, file encoding). Set
JAVA_OPTS=-Dfile.encoding=UTF-8 either through the GUI or by adding the
following to startup.bat:

set JAVA_OPTS=-Dfile.encoding=UTF-8

-- 
Regards,
Shalin Shekhar Mangar.


Re: solr 1.4: extending StatsComponent to recognize localparm {!ex}

2009-08-26 Thread Britske

Thanks for that. 
it works now ;-) 


Erik Hatcher-4 wrote:
 
 
 On Aug 25, 2009, at 6:35 PM, Britske wrote:
 Moreover, I can't seem to find the actual code in FacetComponent or  
 anywhere
 else for that matter where the {!ex}-param case is treated. I assume  
 it's in
 FacetComponent.refineFacets but I can't seem to get a grip on it..  
 Perhaps
 it's late here..

 So, someone care to shed some light on how this might be done? (I only  
 need some
 general directions I hope..)
 
 It's in SimpleFacets, that does a call to QueryParsing.getLocalParams().
 
   Erik
 
 
 

-- 
View this message in context: 
http://www.nabble.com/solr-1.4%3A-extending-StatsComponent-to-recognize-localparm-%7B%21ex%7D-tp25143428p25148403.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Create new core from existing

2009-08-26 Thread Noble Paul നോബിള്‍ नोब्ळ्
Check this: http://wiki.apache.org/solr/CoreAdmin

When you create a core, you are allowed to use the same instance dir as
the old core; just ensure that you give a different dataDir.
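For example, a CREATE call that reuses the instance dir but points at a new data dir might look like this (core name and paths are placeholders):

```
http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=/path/to/shared/instanceDir&dataDir=/path/to/new/dataDir
```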

On Wed, Aug 26, 2009 at 3:05 PM, pavan kumar
donepudipavan.donep...@gmail.com wrote:
 Paul,
 Can you please guide me on which option I need to use to do this and, if
 possible, any sample or a wiki link.
 Thanks & Regards,
 Pavan

 2009/8/26 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 The coreadmin would not copy your data. However, it is possible to
 create another core using the same config and schema

 On Wed, Aug 26, 2009 at 1:51 PM, pavan kumar
 donepudipavan.donep...@gmail.com wrote:
  hi everyone. Is there any way to create a new solr core from the existing
  core using CoreAdminHandler? I want the instance directory to be created
  by copying the files from the existing core, and the data directory path
  can be provided through the dataDir querystring.
 
  Regards,
  Pavan
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: solr nutch url indexing

2009-08-26 Thread Uri Boness

Do you mean the schema or the solrconfig.xml?

The request handler is configured in the solrconfig.xml and you can find 
out more about this particular configuration in 
http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=(CategorySolrRequestHandler)|((CategorySolrRequestHandler)). 



To understand the schema better, you can read 
http://wiki.apache.org/solr/SchemaXml


Uri

last...@gmail.com wrote:

Uri Boness wrote:
Well... yes, it's a tool that Nutch ships with. It also ships with an 
example Solr schema which you can use. 


hi,
is there any documentation to understand what is going on in the schema?

<requestHandler name="/nutch" class="solr.SearchHandler">
   <lst name="defaults">
   <str name="defType">dismax</str>
   <str name="echoParams">explicit</str>
   <float name="tie">0.01</float>
   <str name="qf">content^0.5 anchor^1.0 title^5.2</str>
   <str name="pf">content^0.5 anchor^1.5 title^5.2 site^1.5</str>
   <str name="fl">url</str>
   <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
   <int name="ps">100</int>
   <bool name="hl">true</bool>
   <str name="q.alt">*:*</str>
   <str name="hl.fl">title url content</str>
   <str name="f.title.hl.fragsize">0</str>
   <str name="f.title.hl.alternateField">title</str>
   <str name="f.url.hl.fragsize">0</str>
   <str name="f.url.hl.alternateField">url</str>
   <str name="f.content.hl.fragmenter">regex</str>
   </lst>
</requestHandler>



HTML decoder is splitting tokens

2009-08-26 Thread Anders Melchiorsen
Hi.

When indexing the string "G&uuml;nther" with
HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens,
"Gü" and "nther".

Is this a bug, or am I doing something wrong?

(Using a Solr nightly from 2009-05-29)


Anders.




Reason to change the xml files in solr

2009-08-26 Thread Tamilselvi

For the installation of apache solr integration module in Drupal we need to
install solr. 

The mandatory step is that we need to replace the solr schema.xml and
solrconfig.xml files with the files from the apache solr integration module.

Can anybody explain the reason behind this change?
-- 
View this message in context: 
http://www.nabble.com/Reason-to-change-the-xml-files-in-solr-tp25151354p25151354.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: encoding problem

2009-08-26 Thread Fuad Efendi
If you are complaining about a Web Application (other than SOLR) (probably
behind the Apache HTTPD) having an encoding problem - try to troubleshoot it
with Mozilla Firefox + the Live Http Headers plugin.


Look at the Content-Encoding HTTP response headers, and don't forget about
the <meta http-equiv=...> tag inside HTML... 


-Fuad
http://www.tokenizer.org



-Original Message-
From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au] 
Sent: August-26-09 12:55 AM
To: 'solr-user@lucene.apache.org'
Subject: encoding problem 

We have an encoding problem with our solr application. That is, non-ASCII
chars display fine in SOLR, but as gobbledegook in our application.

Our tomcat server.xml file already contains URIEncoding="UTF-8" under the
relevant connector.
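For reference, that connector entry in Tomcat's conf/server.xml typically looks something like this (port and the other attributes are placeholders and will differ per installation):

```
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```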

A google search reveals that I should set the encoding for the JVM, but have
no idea how to do this. I'm running Windows, and there is no tomcat process
in my Windows Services.

TIA

Bernadette Houghton, Library Business Applications Developer
Deakin University Geelong Victoria 3217 Australia.
Phone: 03 5227 8230 International: +61 3 5227 8230
Fax: 03 5227 8000 International: +61 3 5227 8000
MSN: bern_hough...@hotmail.com
Email:
bernadette.hough...@deakin.edu.aumailto:bernadette.hough...@deakin.edu.au
Website: http://www.deakin.edu.au
http://www.deakin.edu.au/Deakin University CRICOS Provider Code 00113B
(Vic)

Important Notice: The contents of this email are intended solely for the
named addressee and are confidential; any unauthorised use, reproduction or
storage of the contents is expressly prohibited. If you have received this
email in error, please delete it and any attachments immediately and advise
the sender by return email or telephone.
Deakin University does not warrant that this email and any attachments are
error or virus free





What makes a function query count as a match or not?

2009-08-26 Thread Christophe Biocca
I haven't been able to find out what makes a function query count as a match
when used as part of a boolean query with Occur.MUST.
A term query is simple: if the term is not found, it doesn't count as a
match. What's the equivalent for a function query? A score of zero (or less
than zero, as implied by the source code for explain in Lucene's boolean
query)? Something else?


Re: HTML decoder is splitting tokens

2009-08-26 Thread Koji Sekiguchi

Hi Anders,

Sorry, I don't know whether this is a bug or a feature, but
I'd like to show an alternate way if you'd like.

In Solr trunk, HTMLStripWhitespaceTokenizerFactory is
marked as deprecated. Instead, you are encouraged to use
HTMLStripCharFilterFactory together with an arbitrary
TokenizerFactory. And I'd recommend you use
MappingCharFilterFactory to convert character references
to real characters. That is, you would have:

<fieldType name="textHtml" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

where the contents of mapping.txt are:

"&uuml;" => "ü"
"&auml;" => "ä"
"&iuml;" => "ï"
"&euml;" => "ë"
"&ouml;" => "ö"
    : :

Then run analysis.jsp and see the result.

Thank you,

Koji


Anders Melchiorsen wrote:

Hi.

When indexing the string "G&uuml;nther" with
HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens,
"Gü" and "nther".

Is this a bug, or am I doing something wrong?

(Using a Solr nightly from 2009-05-29)


Anders.



  




Solr admin url for example gives 404

2009-08-26 Thread Burton-West, Tom
Hello all,

When I start up Solr from the example directory using start.jar, it seems to 
start up, but when I go to the localhost admin url 
(http://localhost:8983/solr/admin) I get a 404 (See message appended below).  
Has the url for the Solr admin changed?


Tom
Tom Burton-West
---
Here is the message I get with the 404:


HTTP ERROR: 404 NOT_FOUND
RequestURI=/solr/admin
Powered by Jetty:// http://jetty.mortbay.org
Steps to reproduce the problems:

1. get the latest Solr from svn (r808058)
2. run ant clean test (all tests pass)
3. cd ./example
4. start solr
$ java -jar start.jar
2009-08-26 12:08:08.300::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2009-08-26 12:08:08.472::INFO:  jetty-6.1.3
2009-08-26 12:08:08.519::INFO:  Started SocketConnector @ 0.0.0.0:8983
5. go to browser and try to look at admin panel: 
http://localhost:8983/solr/admin



Re: Solr admin url for example gives 404

2009-08-26 Thread Rafał Kuć
Hello!

   Try running "ant example" and then run Solr.

-- 
Regards,
 Rafał Kuć


 Hello all,

 When I start up Solr from the example directory using start.jar, it
 seems to start up, but when I go to the localhost admin url
 (http://localhost:8983/solr/admin) I get a 404 (See message appended
 below).  Has the url for the Solr admin changed?


 Tom
 Tom Burton-West
 ---
 Here is the message I get with the 404:


 HTTP ERROR: 404 NOT_FOUND RequestURI=/solr/admin Powered by
 jetty://http://jetty.mortbay.org
 Steps to reproduce the problems:

 1 get the latest Solr from svn (R 808058)
 2 run ant clean test   (all tests pass)
 3 cd ./example
 4. start solr
 $ java -jar start.jar
 2009-08-26 12:08:08.300::INFO:  Logging to STDERR via 
 org.mortbay.log.StdErrLog
 2009-08-26 12:08:08.472::INFO:  jetty-6.1.3
 2009-08-26 12:08:08.519::INFO:  Started SocketConnector @ 0.0.0.0:8983
 5. go to browser and try to look at admin panel: 
 http://localhost:8983/solr/admin







JDWP Error

2009-08-26 Thread Licinio Fernández Maurelo
The servlet container (Resin) where I deploy Solr shows:

ERROR: transport error 202: bind failed: Address already in
use

ERROR: JDWP Transport dt_socket failed to initialize,
TRANSPORT_INIT(510)

JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports
initialized
[../../../src/share/back/debugInit.c:690]

FATAL ERROR in native method: JDWP No transports initialized,
jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)

ERROR: transport error 202: bind failed: Address already in
use

ERROR: JDWP Transport dt_socket failed to initialize,
TRANSPORT_INIT(510)

JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports
initialized
[../../../src/share/back/debugInit.c:690]

FATAL ERROR in native method: JDWP No transports initialized,
jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)


Then, when we want to stop Resin, it doesn't work. Any advice?

thx

-- 
Lici


SolrJ and Solr web simultaneously?

2009-08-26 Thread Paul Tomblin
Is Solr like an RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to do something explicit to see others'
updates?  Does it matter whether they're using the web interface,
SolrJ with a CommonsHttpSolrServer, or SolrJ with an EmbeddedSolrServer?


-- 
http://www.linkedin.com/in/paultomblin


Re: SolrJ and Solr web simultaneously?

2009-08-26 Thread Smiley, David W.
Once a commit occurs, all data added before it (by any & all clients) becomes 
visible to all searches henceforth.

The web interface has direct access to Solr, and SolrJ remotely accesses that 
Solr.

EmbeddedSolrServer is something that few people should actually use.  It's 
mostly for embedding Solr without running Solr as a server, which is a somewhat 
rare need.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:14 PM, Paul Tomblin ptomb...@xcski.com wrote:

Is Solr like a RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to do something explicit to see others'
updates?  Does it matter whether they're using the web interface,
SolrJ with a
CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer?


--
http://www.linkedin.com/in/paultomblin



RE: JDWP Error

2009-08-26 Thread Fuad Efendi

JDPA/JDWP are for remote debugging of SUN JVM...
It shouldn't be SOLR related... check configs of Resin...
-Fuad
http://www.tokenizer.org



-Original Message-
From: Licinio Fernández Maurelo [mailto:licinio.fernan...@gmail.com] 
Sent: August-26-09 12:49 PM
To: solr-user@lucene.apache.org
Subject: JDWP Error

The servlet container (Resin) where I deploy Solr shows:

ERROR: transport error 202: bind failed: Address already in
use

ERROR: JDWP Transport dt_socket failed to initialize,
TRANSPORT_INIT(510)

JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports
initialized
[../../../src/share/back/debugInit.c:690]

FATAL ERROR in native method: JDWP No transports initialized,
jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)

ERROR: transport error 202: bind failed: Address already in
use

ERROR: JDWP Transport dt_socket failed to initialize,
TRANSPORT_INIT(510)

JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports
initialized
[../../../src/share/back/debugInit.c:690]

FATAL ERROR in native method: JDWP No transports initialized,
jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)


Then, when we want to stop Resin, it doesn't work. Any advice?

thx

-- 
Lici




Pattern matching in Solr

2009-08-26 Thread bhaskar chandrasekar
Hi,
 
Can anyone help me with the below scenario?
 
Scenario 1:
 
Assume that I give Google as input string 
i am using Carrot with Solr 
Carrot is for front end display purpose 
the issue is 
Assuming i give BHASKAR as input string 
It should give me search results pertaining to BHASKAR only.
 Select * from MASTER where name = 'Bhaskar';
 Example: It should not display search results such as ChandarBhaskar or
 BhaskarC.
 Should display Bhaskar only.
 
Scenario 2:
 Select * from MASTER where name like '%BHASKAR%';
 It should display records containing the word BHASKAR
 Ex: Bhaskar
ChandarBhaskar
 BhaskarC
 Bhaskarabc

 How to achieve Scenario 1 in Solr?


 
Regards
Bhaskar



  

RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin
I have the same situation now.

If I don't want to use an HTTP connection, then EmbeddedSolrServer is what I 
think I need; is that correct?
We have a Master/slaves Solr setup; the applications use the slaves for search. 
The Master only takes the new index from the Database, and the slaves pull the 
new index using snappuller/snapinstaller.

I am trying not to use an HTTP connection from the Database to the Solr Master 
because of network latency (very slow).

Any suggestions?

Francis

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 26, 2009 10:23 AM
To: solr; Paul Tomblin
Subject: Re: SolrJ and Solr web simultaneously?

Once a commit occurs, all data added before it (by any & all clients) becomes 
visible to all searches henceforth.

The web interface has direct access to Solr, and SolrJ remotely accesses that 
Solr.

SolrEmbeddedSolrServer is something that few people should actually use.  It's 
mostly for embedding Solr without running Solr as a server, which is a somewhat 
rare need.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:14 PM, Paul Tomblin ptomb...@xcski.com wrote:

Is Solr like a RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to do something explicit to see others'
updates?  Does it matter whether they're using the web interface,
SolrJ with a
CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer?


--
http://www.linkedin.com/in/paultomblin



Re: SolrJ and Solr web simultaneously?

2009-08-26 Thread Smiley, David W.
See my response to Paul Tomblin.  You could use the existing DataImportHandler 
SqlEntityProcessor for DB access.  The DIH framework is fairly extensible.

BTW, I wouldn't immediately dismiss using HTTP to give data to Solr just 
because you believe it will be slow without having tried it.  Using SolrJ with 
StreamingUpdateSolrServer configured with multiple threads and using the 
default binary format is pretty darned fast.  Don't knock it till you've tried 
it.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server


On 8/26/09 1:41 PM, Francis Yakin fya...@liquid.com wrote:

I have the same situation now.

If I don't want to use http connection, so I need to use EmbeddedSolrServer 
that what I think I need correct?
We have Master/slaves solr, the applications use slaves for search. The Master 
only taking the new index from Database and slaves will pull the new index 
using snappuller/snapinstaller.

I don't want or try not to use http connection from Database to Solr Master 
because of network latency( very slow).

Any suggestions?

Francis

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 26, 2009 10:23 AM
To: solr; Paul Tomblin
Subject: Re: SolrJ and Solr web simultaneously?

Once a commit occurs, all data added before it (by any & all clients) becomes 
visible to all searches henceforth.

The web interface has direct access to Solr, and SolrJ remotely accesses that 
Solr.

SolrEmbeddedSolrServer is something that few people should actually use.  It's 
mostly for embedding Solr without running Solr as a server, which is a somewhat 
rare need.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:14 PM, Paul Tomblin ptomb...@xcski.com wrote:

Is Solr like a RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to do something explicit to see others'
updates?  Does it matter whether they're using the web interface,
SolrJ with a
CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer?


--
http://www.linkedin.com/in/paultomblin




RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Fuad Efendi
 I don't want or try not to use http connection from Database to Solr
Master because of network latency( very slow).

Network latency does not play any role here; throughput is more important.
With a separate SOLR instance on a separate box, and with a separate Java
application (a SOLR-bridge) querying the database and using SolrJ, latency may
be 1 second (for instance), but you can fine-tune performance by allocating the
necessary number of threads (depending on the latency of SOLR and Oracle,
average doc size, etc.), JDBC connections, etc. - and you can reach thousands
of docs per second throughput. DIH only simplifies some stuff for total
beginners...

In addition, you will have nice Admin screen of standalone SOLR-master.

-Fuad
http://www.tokenizer.org



-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com] 
Sent: August-26-09 1:41 PM
To: 'solr-user@lucene.apache.org'; Paul Tomblin
Subject: RE: SolrJ and Solr web simultaneously?

I have the same situation now.

If I don't want to use http connection, so I need to use EmbeddedSolrServer
that what I think I need correct?
We have Master/slaves solr, the applications use slaves for search. The
Master only taking the new index from Database and slaves will pull the new
index using snappuller/snapinstaller.

I don't want or try not to use http connection from Database to Solr Master
because of network latency( very slow).

Any suggestions?

Francis

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 26, 2009 10:23 AM
To: solr; Paul Tomblin
Subject: Re: SolrJ and Solr web simultaneously?

Once a commit occurs, all data added before it (by any & all clients)
becomes visible to all searches henceforth.

The web interface has direct access to Solr, and SolrJ remotely accesses
that Solr.

SolrEmbeddedSolrServer is something that few people should actually use.
It's mostly for embedding Solr without running Solr as a server, which is a
somewhat rare need.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:14 PM, Paul Tomblin ptomb...@xcski.com wrote:

Is Solr like a RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to do something explicit to see others'
updates?  Does it matter whether they're using the web interface,
SolrJ with a
CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer?


--
http://www.linkedin.com/in/paultomblin





Re: SolrJ and Solr web simultaneously?

2009-08-26 Thread Avlesh Singh

 Is Solr like a RDBMS in that I can have multiple programs querying and
 updating the index at once, and everybody else will see the updates after a
 commit, or do I have to something explicit to see others updates?

Yes, everyone continues searching the existing index until writes to the index
(core) are committed. None of the searches will fetch uncommitted data.

Does it matter whether they're using the web interface, SolrJ with a
 CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer?

Absolutely not. All of these are multiple ways to access the Solr server;
underlying implementation of searching the index and writing to the index
does not change in either case.

Cheers
Avlesh

On Wed, Aug 26, 2009 at 10:44 PM, Paul Tomblin ptomb...@xcski.com wrote:

 Is Solr like a RDBMS in that I can have multiple programs querying and
 updating the index at once, and everybody else will see the updates
 after a commit, or do I have to do something explicit to see others'
 updates?  Does it matter whether they're using the web interface,
 SolrJ with a
 CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer?


 --
 http://www.linkedin.com/in/paultomblin



Problem using replication in 8/25/09 nightly build of 1.4

2009-08-26 Thread Ron Ellis
Hi Everyone,

When trying to utilize the new HTTP based replication built into Solr 1.4 I
encounter a problem. When I view the replication admin page on the slave all
of the master values are null, i.e. Replicatable Index Version: null,
Generation: null | Latest Index Version: null, Generation: null. Despite
these missing values the two seem to be talking over HTTP successfully (if I
shut down the master, the slave replication page starts exploding with an NPE).

When I hit http://solr/replication?command=indexversion&wt=xml I get the
following...

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">13</int>
</lst>
<long name="indexversion">0</long>
<long name="generation">0</long>
</response>

However, in the admin/replication UI on the master I see...

Index Version: 1250525534711, Generation: 1778

Any idea what I'm doing wrong or how I could begin to diagnose? I am using
the 8/25 nightly build of Solr with the example solrconfig.xml provided. The
only modifications to the config have been to uncomment the master/slave
replication sections and remove the data directory location line so it falls
back to solr.home/data. Also, if it's relevant, this index was originally
created in Solr 1.3.

Thanks,
Ron Ellis


RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Fuad Efendi
Do you have a firewall between the DB and the possible SOLR-Master instance? Do
you have a firewall between the client application and the DB? Such a
configuration is strange... by default firewalls allow access to port 80, so try
to set port 80 for SOLR-Tomcat and/or configure an AJP mapping for the front-end
HTTPD which you might have; btw, Apache HTTPD with SOLR supports HTTP caching
for SOLR-slaves... 

1. SolrJ does not provide multithreading, but an instance of
CommonsHttpSolrServer is thread-safe. Developers need to implement a
multithreaded application. 
2. SolrJ does not use JDBC; developers need to implement that...

It requires some Java coding; it is not the out-of-the-box Data Import
Handler.

Suppose you have 2 quad-cores: why use a single thread if we can use
8 threads... or why wait 5 seconds for a response from SOLR if we can use an
additional 32 threads doing the job with the DB at the same time... and why
share I/O between SOLR and the DB? 

Diversify and lower risks; having SOLR and the DB on the same box is extremely
unsafe...

-Fuad


-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com] 
Sent: August-26-09 2:25 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

Thanks.

The issue we have is actually more likely a firewall issue than network
latency; that's why we try to avoid using an HTTP connection.
Fixing the firewall is not an option right now.
We have around 3 million docs to load from the DB to the Solr master (first
initial load only), and subsequently we will be actively adding new docs to
Solr after the initial load. We prefer to use a JDBC connection, so if SolrJ
used a JDBC connection that might be useful. I also like the multi-threading
option of SolrJ. So, since we want the Solr Master running as a server as
well, is EmbeddedSolrServer a good approach for this?

Francis





-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 10:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

 I don't want or try not to use http connection from Database to Solr
Master because of network latency( very slow).

network latency does not play any role here; throughput is more important.
With a separate SOLR instance on a separate box, and a separate Java
application (a SOLR-bridge) querying the database and using SolrJ, latency may be
1 second (for instance), but you can fine-tune performance by allocating the
necessary number of threads (depending on the latency of SOLR and Oracle, average
doc size, etc.), JDBC connections, etc. - and you can reach a throughput of
thousands of docs per second. DIH only simplifies some stuff for total beginners...

In addition, you will have nice Admin screen of standalone SOLR-master.

-Fuad
http://www.tokenizer.org



-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 1:41 PM
To: 'solr-user@lucene.apache.org'; Paul Tomblin
Subject: RE: SolrJ and Solr web simultaneously?

I have the same situation now.

If I don't want to use an http connection, I think I need to use
EmbeddedSolrServer - is that correct?
We have Master/slaves solr, the applications use slaves for search. The
Master only taking the new index from Database and slaves will pull the new
index using snappuller/snapinstaller.

I don't want to (or will try not to) use an http connection from the database to
the Solr master because of network latency (very slow).

Any suggestions?

Francis

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 26, 2009 10:23 AM
To: solr; Paul Tomblin
Subject: Re: SolrJ and Solr web simultaneously?

Once a commit occurs, all data added before it (by any & all clients)
becomes visible to all searches henceforth.

The web interface has direct access to Solr, and SolrJ remotely accesses
that Solr.

EmbeddedSolrServer is something that few people should actually use.
It's mostly for embedding Solr without running Solr as a server, which is a
somewhat rare need.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:14 PM, Paul Tomblin ptomb...@xcski.com wrote:

Is Solr like an RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to do something explicit to see others'
updates?  Does it matter whether they're using the web interface,
SolrJ with a CommonsHttpSolrServer, or SolrJ with an EmbeddedSolrServer?


--
http://www.linkedin.com/in/paultomblin







${solr.abortOnConfigurationError:false} - does it defaults to false

2009-08-26 Thread djain101

I have one quick question...

If solrconfig.xml says ...

<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

does it mean abortOnConfigurationError defaults to false if it is not set
as a system property?

Thanks,
Dharmveer
-- 
View this message in context: 
http://www.nabble.com/%24%7Bsolr.abortOnConfigurationError%3Afalse%7D---does-it-defaults-to-false-tp25155213p25155213.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: ${solr.abortOnConfigurationError:false} - does it defaults to false

2009-08-26 Thread Ryan McKinley


On Aug 26, 2009, at 3:33 PM, djain101 wrote:



I have one quick question...

If solrconfig.xml says ...

<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

does it mean abortOnConfigurationError defaults to false if it is not set
as a system property?



correct
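The `${name:default}` substitution discussed above behaves like a system-property lookup with a fallback; a tiny sketch of the equivalent logic in plain Java (the class and property names here are just illustrative):

```java
// Mimics Solr's ${solr.abortOnConfigurationError:false} substitution:
// use the system property if it is set, otherwise fall back to the default.
class PropertyDefault {
    static String resolve(String name, String fallback) {
        return System.getProperty(name, fallback);
    }
}
```

So unless the JVM was started with `-Dsolr.abortOnConfigurationError=true`, the element resolves to false.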



Searching and Displaying Different Logical Entities

2009-08-26 Thread wojtekpia

I'm trying to figure out if Solr is the right solution for a problem I'm
facing. I have 2 data entities: P(arent) & C(hild). P contains up to 100
instances of C. I need to expose an interface that searches attributes of
entity C, but displays them grouped by parent entity, P. I need to include
facet counts in the result, and the counts are based on P.

My first solution was to create 2 Solr instances: one for each entity. I
would have to execute 2 queries each time: 1) get a list of matching P's
based on a query of the C instance (facet by P ID in C instance to get
unique list of P's), then 2) get all P's by ID, including facet counts, etc.
The problem I face with this solution is that I can have many matching P's
(10,000+), so my second query will have many (10,000+) constraints. 

My second (and current) solution is to create a single instance, and flatten
all C attributes into the appropriate P record using dynamic fields. For
example, if C has an attribute CA, then I have a dynamic field in P called
CA*. I name this field incrementally based on the number of C's per P (CA1,
CA2, ...).  This works, except that each query is very long (CA1:condition
OR CA2: condition ...). 

Neither solution is ideal. I'm wondering if I'm missing something obvious,
or if I'm using the wrong solution for this problem.

Any insight is appreciated.

Wojtek
-- 
View this message in context: 
http://www.nabble.com/Searching-and-Displaying-Different-Logical-Entities-tp25156301p25156301.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Searching and Displaying Different Logical Entities

2009-08-26 Thread Fuad Efendi
then 2) get all P's by ID, including facet counts, etc.
The problem I face with this solution is that I can have many matching P's
(10,000+), so my second query will have many (10,000+) constraints.


SOLR can automatically provide you P's with Counts, and it will be
_unique_...

Even if the cardinality of P is 10,000+, SOLR is very fast now (expect a few
seconds' response time for the initial request). You need a single query with
faceting...


(!) You do not need the P's ID.

Each document will have a unique ID, and fields such as P and C (with possible
attributes). Do not think in terms of an RDBMS... Lucene does all the
'normalization' behind the scenes, and SOLR will give you Ps with Cs... 
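One way to read the denormalization advice above: index one document per C instance carrying its parent's identity, then facet on the parent field. A hedged schema sketch - the field names are illustrative, not from the thread:

```xml
<!-- One document per child C; the parent's identity travels with it. -->
<field name="id"        type="string" indexed="true" stored="true"/>
<field name="parent"    type="string" indexed="true" stored="true"/>
<field name="childAttr" type="text"   indexed="true" stored="true"/>

<!-- A single query then searches the child attribute and facets on parent:
     q=childAttr:condition&facet=true&facet.field=parent -->
```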








RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin
We already opened port 80 from Solr to the DB so that's not the issue, but
httpd (port 80) is very flaky if there is a firewall between Solr and the DB.
We have a Solr master/slaves environment; clients access search through the
slaves (the master only accepts the new index from the DB, and the slaves pull
new indexes from the Solr master).

We have someone on the Development team who knows Java and can implement JDBC.

We don't share the Solr master and DB on the same box; they are separate boxes on
separate networks, with port 80 open between them.

It looks like CommonsHttpSolrServer is a better approach than EmbeddedSolrServer,
since we want the Solr master acting as a Solr server as well.
I'm just worried that http will be a bottleneck; that's why I prefer the JDBC
connection method.

Francis


RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Fuad Efendi
With this configuration probably preferred method is to run standalone Java
application on same box as DB, or very close to DB (in same network
segment).

HTTP is not a bottleneck; main bottleneck is
indexing/committing/merging/optimizing in SOLR...

Just as a sample, if you submit a batch of large documents to SOLR, expect a
5-55 second response time (even with EmbeddedSolr or pure Lucene), but
nothing related to network latency nor to firewalling... uploading 1Mb over a
100Mbps network takes less than 0.1 seconds, but indexing it may take > 0.5
secs...

Standalone application with SolrJ is also good because you may schedule
batch updates etc; automated...


P.S.
In theory, if you are using Oracle, you may even try to implement triggers
written in Java causing SOLR update on each row update (transactional); but
I haven't heard anyone uses stored procs in Java, too risky and slow, with
specific dependencies... 
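The transfer-time figure quoted above is easy to check, assuming "1Mb" means one megabyte (8 megabits); the class name is illustrative:

```java
// Transfer time for a payload over a link: bits sent / link bits-per-second.
class TransferTime {
    static double seconds(double megabytes, double linkMbps) {
        double megabits = megabytes * 8.0;  // bytes -> bits
        return megabits / linkMbps;
    }
}
```

1 MB over a 100 Mbps link is 0.08 s, under the 0.1 s quoted in the thread, so indexing time, not the network, dominates.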





Re: What makes a function query count as a match or not?

2009-08-26 Thread Yonik Seeley
On Wed, Aug 26, 2009 at 11:27 AM, Christophe
Biocca <christo...@openplaces.org> wrote:
 I haven't been able to find what makes a function query count as a match
 when used a part of a boolean query with Occur.MUST.

A function query matches all non-deleted documents.

-Yonik
http://www.lucidimagination.com


RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Fuad Efendi
I just worried that http will be a bottleneck, that's why I prefer the JDBC
connection method.

- JDBC is a library for a Java application; it connects to a database; in most
cases it uses a proprietary protocol provided by the DB vendor, on a specific
port number
- SolrJ is a library for a Java application; it connects to SOLR; it uses the
HTTP protocol




RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin
No, we don't want to put it on the same box as the database.

Agreed, indexing/committing/merging and optimizing is the bottleneck.

I think it's worth trying the SolrJ CommonsHttpSolrServer option for now, and
let's see what happens loading 3 million docs.

Thanks

Francis


RE: Solr Replication

2009-08-26 Thread J G

Thanks for the response.

It's interesting because when I run jconsole all I can see is one
ReplicationHandler JMX MBean. It looks like it is defaulting to the first slice
it finds on its path. Is there any way to have multiple replication handlers, or
at least obtain replication information per slice/instance via JMX, the way you
can see attributes for each slice/instance via each replication admin JSP
page? 

Thanks again.

 From: noble.p...@corp.aol.com
 Date: Wed, 26 Aug 2009 11:05:34 +0530
 Subject: Re: Solr Replication
 To: solr-user@lucene.apache.org
 
 The ReplicationHandler is not enforced as a singleton , but for all
 practical purposes it is a singleton for one core.
 
 If an instance  (a slice as you say) is setup as a repeater, It can
 act as both a master and slave
 
 in the repeater the configuration should be as follows
 
 MASTER
   |_SLAVE (I am a slave of MASTER)
   |
 REPEATER (I am a slave of MASTER and master to my slaves )
  |
  |
 REPEATER_SLAVE( of REPEATER)
 
 
 the point is that REPEATER will have a slave section with a masterUrl
 which points to the master, and REPEATER_SLAVE will have a slave section
 whose masterUrl points to the repeater
 
 
 
 
 
 
 On Wed, Aug 26, 2009 at 12:40 AM, J G <skinny_joe...@hotmail.com> wrote:
 
  Hello,
 
  We are running multiple slices in our environment. I have enabled JMX and I 
  am inspecting the replication handler mbean to obtain some information 
  about the master/slave configuration for replication. Is the replication 
  handler mbean a singleton? I only see one mbean for the entire server and 
  it's picking an arbitrary slice to report on. So I'm curious if every slice 
  gets its own replication handler mbean? This is important because I have no 
  way of knowing in this specific server any information about the other 
  slices, in particular, information about the master/slave value for the 
  other slices.
 
  Reading through the Solr 1.4 replication strategy, I saw that a slice can 
  be configured to be a master and a slave, i.e. a repeater. I'm wondering 
  how repeaters work because let's say I have a slice named 'A' and the 
  master is on server 1 and the slave is on server 2 then how are these two 
  servers communicating to replicate? Looking at the jmx information I have 
  in the MBean both the isSlave and isMaster is set to true for my repeater 
  so how does this solr slice know if it's the master or slave? I'm a bit 
  confused.
 
  Thanks.
 
 
 
 
 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
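The repeater layout sketched above corresponds to a ReplicationHandler configured with both a master and a slave section. A hedged solrconfig.xml sketch for the repeater node (the host names and poll interval are placeholders):

```xml
<!-- On the repeater: slave of the real master, master to downstream slaves. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```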


SortableFloatFieldSource not accessible? (1.3)

2009-08-26 Thread Christophe Biocca
The class SortableFloatFieldSource cannot be accessed from outside its
package. So it can't be used as part of a FunctionQuery.
Is there a workaround to this, or should I roll my own? Will it be fixed in
1.4?


RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin

Thanks for the response.

I will try CommonsHttpSolrServer for now.

Francis


Re: Using Lucene's payload in Solr

2009-08-26 Thread Bill Au
While testing my code I discovered that my copyField with PatternTokenize
does not do what I want.  This is what I am indexing into Solr:

<field name="title">2.0|Solr In Action</field>

My copyField is simply:

   <copyField source="title" dest="titleRaw"/>

field titleRaw is of type title_raw:

<fieldType name="title_raw" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^#]*#(.*)" group="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

For my example input, "Solr In Action" is indexed into the titleRaw field
without the payload.  But the payload is still stored.  So when I retrieve
the field titleRaw I still get back "2.0|Solr In Action" where what I really
want is just "Solr In Action".

Is it possible to have the copyField strip off the payload while it is
copying, since doing it in the analysis phase is too late?  Or should I
start looking into using UpdateProcessors as Chris had suggested?

Bill

On Fri, Aug 21, 2009 at 12:04 PM, Bill Au bill.w...@gmail.com wrote:

 I ended up not using an XML attribute for the payload since I need to
 return the payload in query response.  So I ended up going with:

 <field name="title">2.0|Solr In Action</field>

 My payload is numeric so I can pick a non-numeric delimiter (ie '|').
 Putting the payload in front means I don't have to worry about the delimiter
 appearing in the value.  The payload is required in my case so I can simply
 look for the first occurrence of the delimiter and ignore the possibility of
 the delimiter appearing in the value.

 I ended up writing a custom Tokenizer and a copy field with a
 PatternTokenizerFactory to filter out the delimiter and payload.  That's is
 straight forward in terms of implementation.  On top of that I can still use
 the CSV loader, which I really like because of its speed.

 Bill.

 On Thu, Aug 20, 2009 at 10:36 PM, Chris Hostetter 
 hossman_luc...@fucit.org wrote:


 : of the field are correct but the delimiter and payload are stored so
 they
 : appear in the response also.  Here is an example:
 ...
 : I am thinking maybe I can do this instead when indexing:
 :
 : XML for indexing:
 : <field name="title" payload="2.0">Solr In Action</field>
 :
 : This will simplify indexing as I don't have to repeat the payload for
 each

 but now you're into a custom request handler for the updates to deal with
 the custom XML attribute so you can't use DIH, or CSV loading.

 It seems like it might be simpler to have two new (generic) UpdateProcessors:
 one that can clone fieldA into fieldB, and one that can do regex mutations
 on fieldB ... neither needs to know about payloads at all, but the first
 can make a copy of "2.0|Solr In Action" and the second can strip off the
 "2.0|" from the copy.

 then you can write a new NumericPayloadRegexTokenizer that takes in two
 regex expressions -- one that knows how to extract the payload from a
 piece of input, and one that specifies the tokenization.

 those three classes seem easier to implement, easier to maintain, and more
 generally reusable than a custom XML request handler for your updates.


 -Hoss
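The regex-mutation step Hoss describes is essentially a prefix strip. A minimal plain-Java sketch of that one step (the "payload|value" layout and the '|' delimiter are the conventions from this thread, not anything Solr-specific):

```java
public class PayloadStrip {

    // Strip a leading "payload|" prefix, e.g. "2.0|Solr In Action"
    // becomes "Solr In Action". If no delimiter is present, the input
    // is returned unchanged.
    static String stripPayload(String raw) {
        int sep = raw.indexOf('|');
        return sep < 0 ? raw : raw.substring(sep + 1);
    }

    public static void main(String[] args) {
        System.out.println(stripPayload("2.0|Solr In Action")); // prints: Solr In Action
    }
}
```

Since the payload goes in front and is required, looking for the first delimiter is enough; the delimiter may safely appear later in the value.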





Sorting by Unindexed Fields

2009-08-26 Thread Isaac Foster
Hi,

I have a situation where a particular kind of document can be categorized in
different ways, and depending on the categories it is in it will have
different fields that describe it (in practice the number of fields will be
fairly small, but whatever). These documents will each have a full-text
field that Solr is perfect for, and it seems like Solr's dynamic fields
ability makes it an even more perfect solution.

I'd like to be able to sort by any of the fields, but indexing them all
seems somewhere between unwise and impossible. Will Solr sort by fields that
are unindexed?

iSac


Re: SortableFloatFieldSource not accessible? (1.3)

2009-08-26 Thread Yonik Seeley
SortableFloatField works in function queries... it's just that
everyone goes through SortableFloatField.getValueSource() to create
them.  Will that work for you?

-Yonik
http://www.lucidimagination.com


On Wed, Aug 26, 2009 at 6:23 PM, Christophe
Bioccachristo...@openplaces.org wrote:
 The class SortableFloatFieldSource cannot be accessed from outside its
 package. So it can't be used as part of a FunctionQuery.
 Is there a workaround to this, or should I roll my own? Will it be fixed in
 1.4?



Re: Sorting by Unindexed Fields

2009-08-26 Thread Avlesh Singh

 Will Solr sort by fields that are unindexed?

Unfortunately, No.

Cheers
Avlesh
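For what it's worth, a field meant for sorting has to be declared indexed in schema.xml; a hedged sketch (the field names and types here are made up for illustration):

```xml
<!-- sorting requires indexed="true"; stored is not needed for sorting.
     Sort fields should also be single-valued and produce a single token. -->
<field name="price" type="sfloat" indexed="true" stored="false"/>

<!-- dynamic fields can be made sortable the same way -->
<dynamicField name="*_sort" type="sfloat" indexed="true" stored="false"/>
```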

On Thu, Aug 27, 2009 at 4:03 AM, Isaac Foster isaac.z.fos...@gmail.comwrote:

 Hi,

 I have a situation where a particular kind of document can be categorized
 in
 different ways, and depending on the categories it is in it will have
 different fields that describe it (in practice the number of fields will be
 fairly small, but whatever). These documents will each have a full-text
 field that Solr is perfect for, and it seems like Solr's dynamic fields
 ability makes it an even more perfect solution.

 I'd like to be able to sort by any of the fields, but indexing them all
 seems somewhere between unwise and impossible. Will Solr sort by fields
 that
 are unindexed?

 iSac



Re: Sorting by Unindexed Fields

2009-08-26 Thread Isaac Foster
Is it also the case that it will not narrow by them?

Isaac

On Wed, Aug 26, 2009 at 8:59 PM, Avlesh Singh avl...@gmail.com wrote:

 
  Will Solr sort by fields that are unindexed?
 
 Unfortunately, No.

 Cheers
 Avlesh

 On Thu, Aug 27, 2009 at 4:03 AM, Isaac Foster isaac.z.fos...@gmail.com
 wrote:

  Hi,
 
  I have a situation where a particular kind of document can be categorized
  in
  different ways, and depending on the categories it is in it will have
  different fields that describe it (in practice the number of fields will
 be
  fairly small, but whatever). These documents will each have a full-text
  field that Solr is perfect for, and it seems like Solr's dynamic fields
  ability makes it an even more perfect solution.
 
  I'd like to be able to sort by any of the fields, but indexing them all
  seems somewhere between unwise and impossible. Will Solr sort by fields
  that
  are unindexed?
 
  iSac
 



Re: Sorting by Unindexed Fields

2009-08-26 Thread Avlesh Singh

 Is it also the case that it will not narrow by them?

If narrowing means faceting, then again a no.

Cheers
Avlesh

On Thu, Aug 27, 2009 at 6:36 AM, Isaac Foster isaac.z.fos...@gmail.comwrote:

 Is it also the case that it will not narrow by them?

 Isaac

 On Wed, Aug 26, 2009 at 8:59 PM, Avlesh Singh avl...@gmail.com wrote:

  
   Will Solr sort by fields that are unindexed?
  
  Unfortunately, No.
 
  Cheers
  Avlesh
 
  On Thu, Aug 27, 2009 at 4:03 AM, Isaac Foster isaac.z.fos...@gmail.com
  wrote:
 
   Hi,
  
   I have a situation where a particular kind of document can be
 categorized
   in
   different ways, and depending on the categories it is in it will have
   different fields that describe it (in practice the number of fields
 will
  be
   fairly small, but whatever). These documents will each have a full-text
   field that Solr is perfect for, and it seems like Solr's dynamic fields
   ability makes it an even more perfect solution.
  
   I'd like to be able to sort by any of the fields, but indexing them all
   seems somewhere between unwise and impossible. Will Solr sort by fields
   that
   are unindexed?
  
   iSac
  
 



Fwd: Lucene Search Performance Analysis Workshop

2009-08-26 Thread Erik Hatcher
While Andrzej's talk will focus on things at the Lucene layer, I'm  
sure there'll be some great tips and tricks useful to Solrians too.   
Andrzej is one of the sharpest folks I've met, and he's also a very  
impressive presenter.  Tune in if you can.


Erik


Begin forwarded message:


From: Andrzej Bialecki a...@getopt.org
Date: August 26, 2009 5:44:40 PM EDT
To: java-u...@lucene.apache.org
Subject: Lucene Search Performance Analysis Workshop
Reply-To: java-u...@lucene.apache.org

Hi all,

I am giving a free talk/ workshop next week on how to analyze and  
improve Lucene search performance for native lucene apps. If you've  
ever been challenged to get your Java Lucene search apps running  
faster, I think you might find the talk of interest.


Free online workshop:
Thursday, September 3rd 2009
11:00-11:30AM PDT / 14:00-14:30 EDT

Follow this link to sign up:
http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP

About:
Lucene Performance Workshop:
Understanding Lucene Search Performance
with Andrzej Bialecki

Experienced Java developers know how to use the Apache Lucene  
library to build powerful search applications natively in Java.
LucidGaze for Lucene from Lucid Imagination, just released this  
week, provides a powerful utility for making transparent the  
underlying indexing and search operations, and analyzing their  
impact on search performance.


Agenda:
* Understanding sources of variability in Lucene search performance
* LucidGaze for Lucene APIs for performance statistics
* Applying LucidGaze for Lucene performance statistics to real-world  
performance problems


Join us for a free online workshop. Sign up via the link below:
http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP

About the Presenter:
Andrzej Bialecki, Apache Lucene PMC Member, is on the Lucid  
Imagination Technical Advisory Board; he also serves as the project  
lead for Nutch, and as committer in the Lucene-java, Nutch and  
Hadoop projects. He has broad expertise, across domains as diverse  
as information retrieval, systems architecture, embedded systems  
kernels, networking and business process/e-commerce modeling. He's  
also the author of the popular Luke index inspection utility.  
Andrzej holds a master's degree in Electronics from Warsaw Technical  
University, speaks four languages and programs in many, many more.



--
Best regards,
Andrzej Bialecki 
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





Total count of records

2009-08-26 Thread bhaskar chandrasekar
Hi,
 
When Solr retrieves records based on an input match, it gives a total count of
records.
Say, for example, it displays: 1 out of 20,000 for the particular search
string.
 
How is the total count of records fetched in Solr? Does it refer to any
schema or XML file?
 
 
Regards
Bhaskar
 


  

Re: Total count of records

2009-08-26 Thread Avlesh Singh

 How the total count of records are fetched in Solr , does it refer any
 Schema or XML file?.

Sorry, but I did not get you. What does that mean? The total count is not
stored anywhere; it is computed based on how many documents you have in your
index matching the query.

Cheers
Avlesh
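Concretely, that computed count comes back as the numFound attribute on the result element of each query response; roughly like this (values are made up):

```xml
<response>
  <result name="response" numFound="20000" start="0">
    <!-- the first page of matching documents appears here -->
  </result>
</response>
```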

On Thu, Aug 27, 2009 at 7:36 AM, bhaskar chandrasekar
bas_s...@yahoo.co.inwrote:

 Hi,

 When Solr retrives records based on a input match , it gives total count of
 records.
 Say for Ex , it displays like : 1 out of 20,000 for the particular search
 string.

 How the total count of records are fetched in Solr , does it refer any
 Schema or XML file?.


 Regards
 Bhaskar






RE: Lucene Search Performance Analysis Workshop

2009-08-26 Thread Fuad Efendi
I am wondering... are the new SOLR filtering features faster than standard
Lucene queries like
{query} AND {filter}?

Why can't we improve Lucene then?

Fuad


P.S. 
https://issues.apache.org/jira/browse/SOLR-1169
https://issues.apache.org/jira/browse/SOLR-1179





-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: August-26-09 8:50 PM
To: solr-user@lucene.apache.org
Subject: Fwd: Lucene Search Performance Analysis Workshop

While Andrzej's talk will focus on things at the Lucene layer, I'm  
sure there'll be some great tips and tricks useful to Solrians too.   
Andrzej is one of the sharpest folks I've met, and he's also a very  
impressive presenter.  Tune in if you can.

Erik


Begin forwarded message:

 From: Andrzej Bialecki a...@getopt.org
 Date: August 26, 2009 5:44:40 PM EDT
 To: java-u...@lucene.apache.org
 Subject: Lucene Search Performance Analysis Workshop
 Reply-To: java-u...@lucene.apache.org

 Hi all,

 I am giving a free talk/ workshop next week on how to analyze and  
 improve Lucene search performance for native lucene apps. If you've  
 ever been challenged to get your Java Lucene search apps running  
 faster, I think you might find the talk of interest.

 Free online workshop:
 Thursday, September 3rd 2009
 11:00-11:30AM PDT / 14:00-14:30 EDT

 Follow this link to sign up:

http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP

 About:
 Lucene Performance Workshop:
 Understanding Lucene Search Performance
 with Andrzej Bialecki

 Experienced Java developers know how to use the Apache Lucene  
 library to build powerful search applications natively in Java.
 LucidGaze for Lucene from Lucid Imagination, just released this  
 week, provides a powerful utility for making transparent the  
 underlying indexing and search operations, and analyzing their  
 impact on search performance.

 Agenda:
 * Understanding sources of variability in Lucene search performance
 * LucidGaze for Lucene APIs for performance statistics
 * Applying LucidGaze for Lucene performance statistics to real-world  
 performance problems

 Join us for a free online workshop. Sign up via the link below:

http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP

 About the Presenter:
 Andrzej Bialecki, Apache Lucene PMC Member, is on the Lucid  
 Imagination Technical Advisory Board; he also serves as the project  
 lead for Nutch, and as committer in the Lucene-java, Nutch and  
 Hadoop projects. He has broad expertise, across domains as diverse  
 as information retrieval, systems architecture, embedded systems  
 kernels, networking and business process/e-commerce modeling. He's  
 also the author of the popular Luke index inspection utility.  
 Andrzej holds a master's degree in Electronics from Warsaw Technical  
 University, speaks four languages and programs in many, many more.


 -- 
 Best regards,
 Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  ||  |  Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com


 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org






RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Fuad Efendi
Frankly, I never tried any DIH... probably it is the best option for this
specific case (they have a Java developer) - but one should be knowledgeable
enough to design the SOLR schema... And I noticed here (and also on the HBase
mailing list) that many first-time users are still thinking in terms of
relational DBMSs and are trying to index their tables as-is, with relations
(and different PKs), instead of indexing their documents... I constantly see
1000+ docs per second now, with 5%-15% CPU... small docs, 5 KB in size on
average, 7 fields... yes, correct, 3M+ docs in an hour... and it could be 10
times more!!! (5%-15% CPU currently)
Fuad

With a relational database, the approach that has been working for us  
and many customers is to first give DataImportHandler a go.  It's  
powerful and fast.  3M docs should index in about an hour or less, I'd  
speculate.  But using DIH does require making access from Solr to the  
DB server solid, of course.

   Erik
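For reference, a DIH setup is essentially a data-config.xml wired into solrconfig.xml; a hedged sketch of the data-config side (the driver, URL, table, and column names below are placeholders, not from the original mail):

```xml
<dataConfig>
  <dataSource driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@dbhost:1521:SID"
              user="user" password="password"/>
  <document>
    <entity name="doc" query="SELECT id, title, content FROM docs">
      <field column="id"      name="id"/>
      <field column="title"   name="title"/>
      <field column="content" name="content"/>
    </entity>
  </document>
</dataConfig>
```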





Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-26 Thread Aaron Aberg
Hey Guys,

Ok, I found this:

Troubleshooting Errors
It's possible that you get an error related to the following:

SEVERE: Exception starting filter SolrRequestFilter
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.solr.core.SolrConfig
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:76)
.
Caused by: java.lang.RuntimeException: XPathFactory#newInstance()
failed to create an XPathFactory for the default object model:
http://java.sun.com/jaxp/xpath/dom with the
XPathFactoryConfigurationException: javax.xml.x
path.XPathFactoryConfigurationException: No XPathFctory implementation
found for the object model: http://java.sun.com/jaxp/xpath/dom
at javax.xml.xpath.XPathFactory.newInstance(Unknown Source)

This is due to your tomcat instance not having the xalan jar file in
the classpath. It took me some digging to find this, and thought it
might be useful for others. The location varies from distribution to
distribution, but I essentially just added (via a symlink) the jar
file to the shared/lib directory under the tomcat directory.

I am a java n00b. How can I set this up?

On Tue, Aug 18, 2009 at 10:16 PM, Chris
Hostetterhossman_luc...@fucit.org wrote:

 : -Dsolr.solr.home='/some/path'
 :
 : Should I be putting that somewhere? Or is that already taken care of
 : when I edited the web.xml file in my solr.war file?

 No ... you do not need to set that system property if you already have it
 working because of modifications to the web.xml ... according to the log
 you posted earlier, Solr is seeing your solr home dir set correctly...

 Aug 17, 2009 11:16:15 PM org.apache.solr.core.SolrResourceLoader 
 locateInstanceDir
 INFO: Using JNDI solr.home: /usr/share/solr
 Aug 17, 2009 11:16:15 PM org.apache.solr.core.CoreContainer$Initializer 
 initialize
 INFO: looking for solr.xml: /usr/share/solr/solr.xml
 Aug 17, 2009 11:16:15 PM org.apache.solr.core.SolrResourceLoader init
 INFO: Solr home set to '/usr/share/solr/'

 ...that's were you want it to point, correct?

 (don't be confused by the later message of Check solr/home property ...
 that's just a hint because 9 times out of 10 an error initializing solr
 comes from solr needing to *guess* about the solr home dir)

 The crux of your error is being able to load an XPathFactory, the fact
 that it can't load an XPath factory prevents the your
 classloader from even being able to load the SolrConfig class -- note this
 also in the log you posted earlier...

 java.lang.NoClassDefFoundError: Could not initialize class 
 org.apache.solr.core.SolrConfig

 ...the root of the problem is here...

 Caused by: java.lang.RuntimeException: XPathFactory#newInstance()
 failed to create an XPathFactory for the default object model:
 http://java.sun.com/jaxp/xpath/dom with the
 XPathFactoryConfigurationException:
 javax.xml.xpath.XPathFactoryConfigurationException: No XPathFctory
 implementation found for the object model:
 http://java.sun.com/jaxp/xpath/dom
        at javax.xml.xpath.XPathFactory.newInstance(Unknown Source)
        at org.apache.solr.core.Config.clinit(Config.java:41)

 XPathFactory.newInstance() is used to construct an instance of an
 XPathFactory where the concrete type is unknown by the caller (in this
 case: solr)  There is an alternate form (XPathFactory.newInstance(String
 uri)) which allows callers to specify *which* model they want, and it can
 throw an exception if the model isn't available in the current JVM using
 reflection, but if you read the javadocs for the method being called...

 http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/xpath/XPathFactory.html#newInstance()
   Get a new XPathFactory instance using the default object model,
   DEFAULT_OBJECT_MODEL_URI, the W3C DOM.

   This method is functionally equivalent to:

      newInstance(DEFAULT_OBJECT_MODEL_URI)

   Since the implementation for the W3C DOM is always available, this
   method will never fail.

 ...except that in your case, it is in fact clearly failing, which
 suggests that your hosting provider has given you a crappy JVM.  I have no
 good suggestions for debugging this, other than this google link...

 http://www.google.com/search?q=+No+XPathFctory+implementation+found+for+the+object+model%3A+http%3A%2F%2Fjava.sun.com%2Fjaxp%2Fxpath%2Fdom

 The good news is, there isn't anything solr specific about this problem.
 Any servlet container giving you that error when you load solr, should
 cause the exact same error with a servlet as simple as this...

  public class TestServlet extends javax.servlet.http.HttpServlet {
    public static Object X = javax.xml.xpath.XPathFactory.newInstance();
    public void doGet (javax.servlet.http.HttpServletRequest req,
                       javax.servlet.http.HttpServletResponse res) {
       // NOOP
    }
  }

 ...which should provide you with a nice short bug report for your hosting
 provider.

 One last important note (because it may burn you once you get the XPath
 problem 

RE: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-26 Thread Fuad Efendi
Looks like you totally ignored my previous post... 




 Who is vendor of this openjdk-1.6.0.0? Who is vendor of JVM which this
JDK
 runs on?
... such installs of Java are a total mess; you
may have an incompatible Servlet API loaded by the bootstrap classloader
before the Tomcat classes




First of all, please, try to install standard Java from SUN  on your
development box, and run some samples...




 
This is due to your tomcat instance not having the xalan jar file in
the classpath


P.S.
Don't rely on CentOS 'approved' Java libraries.





Re: master/slave replication issue

2009-08-26 Thread Noble Paul നോബിള്‍ नोब्ळ्
The log messages are shown when you hit the admin page, so don't worry
about that. Keep a minimal configuration of replication. All you need
is masterUrl and pollInterval.
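In other words, a stripped-down slave section like this should be enough to start with (the masterUrl is the one from the original mail below):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://server1.xyz.com:8080/jdoe/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>
```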


On Thu, Aug 27, 2009 at 5:52 AM, J Gskinny_joe...@hotmail.com wrote:







 Hello,

 I'm having an issue getting the master to replicate its index to the slave. 
 Below you will find my configuration settings. Here is what is happening: I 
 can access the replication dashboard for both the slave and master and I can 
 successfully execute HTTP commands against both of these urls through my 
 browser. Now, my slave is configured to use the same URL as the one I am 
 using in my browser when I query the master, yet when I do a tail -f tomcat 
 home/logs/catalina.out on the slave server all I see is :


 Master - server1.xyz.com Aug 27, 2009 12:13:29 AM 
 org.apache.solr.core.SolrCore execute

 INFO: [] webapp=null path=null params={command=details} status=0 QTime=8

 Aug 27, 2009 12:13:32 AM org.apache.solr.core.SolrCore execute

 INFO: [] webapp=null path=null params={command=details} status=0 QTime=8

 Aug 27, 2009 12:13:34 AM org.apache.solr.core.SolrCore execute

 INFO: [] webapp=null path=null params={command=details} status=0 QTime=4

 Aug 27, 2009 12:13:36 AM org.apache.solr.core.SolrCore execute

 INFO: [] webapp=null path=null params={command=details} status=0 QTime=4

 Aug 27, 2009 12:13:39 AM org.apache.solr.core.SolrCore execute

 INFO: [] webapp=null path=null params={command=details} status=0 QTime=4

 Aug 27, 2009 12:13:42 AM org.apache.solr.core.SolrCore execute

 INFO: [] webapp=null path=null params={command=details} status=0 QTime=8

 Aug 27, 2009 12:13:44 AM org.apache.solr.core.SolrCore execute

 INFO: [] webapp=null path=null params={command=details} status=0 QTime=


 For some reason, the webapp and the path are being set to null, and I think
 this is affecting the replication. I am running Solr as the WAR file, and
 it's 1.4 from a few weeks ago.



 <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
        <!-- Replicate on 'optimize'. Other values can be 'commit', 'startup'.
             It is possible to have multiple entries of this config string -->
        <str name="replicateAfter">optimize</str>

        <!-- Create a backup after 'optimize'. Other values can be 'commit',
             'startup'. It is possible to have multiple entries of this config
             string. Note that this is just for backup. Replication does not
             require this -->
        <str name="backupAfter">optimize</str>

        <!-- If configuration files need to be replicated give the names here,
             separated by comma -->
        <!-- <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str> -->
    </lst>
 </requestHandler>
 Notice that I commented out the replication of the configuration files. I
 didn't think this was important for getting replication
 working. However, is it good to have these files replicated?


 Slave - server2.xyz.com

 <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">

        <!-- Fully qualified url for the replication handler of the master. It
             is possible to pass this as a request param for the fetchindex
             command -->
        <str name="masterUrl">http://server1.xyz.com:8080/jdoe/replication</str>

        <!-- Interval at which the slave should poll the master. Format is
             HH:mm:ss. If this is absent the slave does not poll automatically,
             but a fetchindex can be triggered from the admin or the http API -->
        <str name="pollInterval">00:00:20</str>

        <!-- THE FOLLOWING PARAMETERS ARE USUALLY NOT REQUIRED -->

        <!-- To use compression while transferring the index files. The
             possible values are internal|external.
             If the value is 'external' make sure that your master Solr has the
             settings to honour the accept-encoding header.
             See here for details: http://wiki.apache.org/solr/SolrHttpCompression
             If it is 'internal' everything will be taken care of automatically.
             USE THIS ONLY IF YOUR BANDWIDTH IS LOW. THIS CAN ACTUALLY SLOW DOWN
             REPLICATION IN A LAN -->
        <str name="compression">internal</str>

        <!-- The following values are used when the slave connects to the
             master to download the index files.
             Default values implicitly set as 5000ms and 1ms respectively. The
             user DOES NOT need to specify these unless the bandwidth is
             extremely low or if there is an extremely high latency -->
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">1</str>

        <!-- If HTTP Basic authentication is enabled on the master, then the
             slave can be configured with the following -->
        <str name="httpBasicAuthUser">username</str>
        <str name="httpBasicAuthPassword">password</str>

    </lst>
 </requestHandler>



 Thanks for your help!




 _
 Hotmail® is up to 70% faster. Now good news travels really fast.
 

Re: Max limit on number of cores?

2009-08-26 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is no hard limit. It is going to be decided by your h/w. You
will be limited by the number of files that can be kept open by your
system.

On Thu, Aug 27, 2009 at 1:06 AM, djain101dharmveer_j...@yahoo.com wrote:

 Hi,

 Is there any maximum limit on the number of cores one solr webapp can have
 without compromising on its performance? If yes, what is that limit?

 Thanks,
 Dharmveer
 --
 View this message in context: 
 http://www.nabble.com/Max-limit-on-number-of-cores--tp25155334p25155334.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com