Memory usage

2009-04-14 Thread Gargate, Siddharth
Hi all,
I am testing indexing with 2000 text documents of 2 MB
each. These documents contain words created from random characters. I
observed that the Tomcat memory usage keeps increasing slowly. I tried
removing all the cache configuration, but memory usage still
increases. Once the memory reaches the specified max heap, commit
appears to block until the memory is freed. With larger documents, I see
some OOMEs.
Below are a few properties set in solrconfig.xml:

<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>25</mergeFactor>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>2147483647</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>

  <lockType>single</lockType>
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>
<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>7000</maxTime>
</autoCommit>
<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>10</maxWarmingSearchers>

Where does the memory get used? And how to avoid it?

Thanks,
Siddharth



Re: indexing txt file

2009-04-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
What is the content of your text file?
Solr does not directly index files.
--Noble

On Tue, Apr 14, 2009 at 3:54 AM, Alex Vu alex.v...@gmail.com wrote:
 Hi all,

 Currently I wrote an xml file and schema.xml file.  What is the next step to
 index a txt file?  Where should I put my txt file I want to index?

 thank you,
 Alex V.




-- 
--Noble Paul


Re: DataImporter : Java heap space

2009-04-14 Thread Mani Kumar
Hi Shalin:
Yes, I tried with the batchSize=-1 parameter as well.

Here is the config I tried:

<dataConfig>
  <dataSource type="JdbcDataSource" batchSize="-1" name="sp"
              driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb_development"
              user="root" password="**" />
  <document name="items">
    <entity name="item" dataSource="sp" query="select * from items">
      <field column="id" name="id" />
      <field column="title" name="title" />
    </entity>
  </document>
</dataConfig>


I hope I have used the batchSize parameter in the right place.


Thanks!

Mani Kumar

On Tue, Apr 14, 2009 at 11:24 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Tue, Apr 14, 2009 at 11:18 AM, Mani Kumar manikumarchau...@gmail.com
 wrote:

  Here is the stack trace:
 
  notice in stack trace *   at
  com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1749)*
 
  It looks like it's trying to read the whole table into memory at once,
  and that's why it's getting OOM.
 
 
 Mani, the data-config.xml you posted does not have the batchSize="-1"
 attribute on your data source. Did you try that? This is a known bug in the
 MySQL JDBC driver.

 --
 Regards,
 Shalin Shekhar Mangar.
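
 For context on that driver behaviour: MySQL Connector/J buffers the entire
 ResultSet in memory by default and streams rows only when the statement is
 forward-only, read-only, and has a fetch size of Integer.MIN_VALUE, which is
 what DIH's batchSize="-1" arranges internally. A minimal standalone sketch of
 the same workaround; the URL, credentials, and table are placeholders taken
 from the config above:

     import java.sql.Connection;
     import java.sql.DriverManager;
     import java.sql.ResultSet;
     import java.sql.Statement;

     public class StreamingQuery {
         public static void main(String[] args) throws Exception {
             Class.forName("com.mysql.jdbc.Driver");
             Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost/mydb_development", "root", "**");
             // Forward-only + read-only + Integer.MIN_VALUE fetch size tells
             // Connector/J to stream rows instead of buffering the whole table.
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
             stmt.setFetchSize(Integer.MIN_VALUE);
             ResultSet rs = stmt.executeQuery("select id, title from items");
             while (rs.next()) {
                 System.out.println(rs.getLong("id") + " " + rs.getString("title"));
             }
             rs.close();
             stmt.close();
             conn.close();
         }
     }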



Re: Memory usage

2009-04-14 Thread Shalin Shekhar Mangar
On Tue, Apr 14, 2009 at 11:30 AM, Gargate, Siddharth sgarg...@ptc.comwrote:

 Hi all,
 I am testing indexing with 2000 text documents of 2 MB
 each. These documents contain words created from random characters. I
 observed that the Tomcat memory usage keeps increasing slowly. I tried
 removing all the cache configuration, but memory usage still
 increases. Once the memory reaches the specified max heap, commit
 appears to block until the memory is freed. With larger documents, I see
 some OOMEs.
 Below are a few properties set in solrconfig.xml:

 <mainIndex>
   <useCompoundFile>false</useCompoundFile>
   <ramBufferSizeMB>128</ramBufferSizeMB>
   <mergeFactor>25</mergeFactor>
   <maxMergeDocs>2147483647</maxMergeDocs>
   <maxFieldLength>2147483647</maxFieldLength>
   <writeLockTimeout>1000</writeLockTimeout>
   <commitLockTimeout>1</commitLockTimeout>

   <lockType>single</lockType>
   <unlockOnStartup>false</unlockOnStartup>
 </mainIndex>
 <autoCommit>
   <maxDocs>1</maxDocs>
   <maxTime>7000</maxTime>
 </autoCommit>
 <useColdSearcher>false</useColdSearcher>
 <maxWarmingSearchers>10</maxWarmingSearchers>

 Where does the memory get used? And how to avoid it?


What JVM parameters are you using?

Also see the following:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr#d0e105
http://www.lucidimagination.com/blog/2009/02/09/investigating-oom-and-other-jvm-issues/

-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImporter : Java heap space

2009-04-14 Thread Shalin Shekhar Mangar
On Tue, Apr 14, 2009 at 11:36 AM, Mani Kumar manikumarchau...@gmail.comwrote:

 Hi Shalin:
 Yes, I tried with the batchSize=-1 parameter as well.

 Here is the config I tried:

 <dataConfig>

   <dataSource type="JdbcDataSource" batchSize="-1" name="sp"
               driver="com.mysql.jdbc.Driver"
               url="jdbc:mysql://localhost/mydb_development"
               user="root" password="**" />


 I hope I have used the batchSize parameter in the right place.


Yes that is correct. Did it still throw OOM from the same place?

I'd suggest you increase the heap and see what works for you. Also try
-server on the JVM.

-- 
Regards,
Shalin Shekhar Mangar.
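
For reference, a typical invocation along those lines (the heap sizes are
purely illustrative; tune them to your data and container):

    java -server -Xms1024m -Xmx1024m -jar start.jar

Under Tomcat the same flags would go into CATALINA_OPTS instead.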


Re: DataImporter : Java heap space

2009-04-14 Thread Mani Kumar
Yes, it's throwing the same OOM error, and from the same place.
I will try increasing the heap size. Just curious: how does this data import
work?

Does it load the whole table into memory?

Is there any estimate of how much memory it needs to create an index for
1 GB of data?

Thanks,
Mani

On Tue, Apr 14, 2009 at 11:48 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Tue, Apr 14, 2009 at 11:36 AM, Mani Kumar manikumarchau...@gmail.com
 wrote:

  Hi Shalin:
  Yes, I tried with the batchSize=-1 parameter as well.
 
  Here is the config I tried:
 
  <dataConfig>
 
    <dataSource type="JdbcDataSource" batchSize="-1" name="sp"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/mydb_development"
                user="root" password="**" />
 
 
  I hope I have used the batchSize parameter in the right place.
 
 
 Yes, that is correct. Did it still throw OOM from the same place?

 I'd suggest you increase the heap and see what works for you. Also try
 -server on the JVM.

 --
 Regards,
 Shalin Shekhar Mangar.



Re: Can Solr have Multiple Separate Indexes?

2009-04-14 Thread Isaac Foster
Wow, that was pretty straightforward. Sorry I didn't catch that on the wiki
on my first few go-rounds; I'll look harder next time.

Thanks.

Isaac

On Sun, Apr 12, 2009 at 11:40 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Mon, Apr 13, 2009 at 5:35 AM, Isaac Foster isaac.z.fos...@gmail.com
 wrote:

  Hi, I'm new using Solr but have used the Zend Framework implementation of
  Lucene before. One thing it supports is the ability to have separate
  indexes, so that you could keep your index of (example) forum posts and
  your
  index of user profiles separate, and query them separately. Can this be
  done
  with Solr? I've looked through the docs a good bit and will continue to,
  but
  if anyone can point me in the right direction I'd greatly appreciate it.
 

 Sure. There are a couple of ways. Take a look at
 http://wiki.apache.org/solr/MultipleIndexes


 --
 Regards,
 Shalin Shekhar Mangar.
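
 One of the approaches described on that wiki page is multi-core: a single
 Solr instance hosting several independent indexes. A minimal solr.xml sketch
 along those lines (the core names and directories are made up for the forum
 posts/user profiles example):

     <solr persistent="true">
       <cores adminPath="/admin/cores">
         <core name="posts"    instanceDir="posts" />
         <core name="profiles" instanceDir="profiles" />
       </cores>
     </solr>

 Each core then gets its own schema and its own URL, e.g. /solr/posts/select
 and /solr/profiles/select, so the two indexes can be queried separately.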



Re: DataImporter : Java heap space

2009-04-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH streams one row at a time.

DIH is just a component in Solr. Solr indexing itself also takes a lot of
memory.
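
If indexing memory is the concern, the main knob on the Solr side is
ramBufferSizeMB in solrconfig.xml; Lucene flushes its in-memory buffer to
disk once it grows past this size. For instance, a smaller value than the
128 used earlier in this thread (the exact number is only an illustration):

    <ramBufferSizeMB>32</ramBufferSizeMB>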

On Tue, Apr 14, 2009 at 12:02 PM, Mani Kumar manikumarchau...@gmail.com wrote:
 Yes, it's throwing the same OOM error, and from the same place.
 I will try increasing the heap size. Just curious: how does this data import
 work?

 Does it load the whole table into memory?

 Is there any estimate of how much memory it needs to create an index for
 1 GB of data?

 Thanks,
 Mani

 On Tue, Apr 14, 2009 at 11:48 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 On Tue, Apr 14, 2009 at 11:36 AM, Mani Kumar manikumarchau...@gmail.com
 wrote:

  Hi Shalin:
  Yes, I tried with the batchSize=-1 parameter as well.
 
  Here is the config I tried:
 
  <dataConfig>
 
     <dataSource type="JdbcDataSource" batchSize="-1" name="sp"
                 driver="com.mysql.jdbc.Driver"
                 url="jdbc:mysql://localhost/mydb_development"
                 user="root" password="**" />
 
 
  I hope I have used the batchSize parameter in the right place.
 
 
 Yes, that is correct. Did it still throw OOM from the same place?

 I'd suggest you increase the heap and see what works for you. Also try
 -server on the JVM.

 --
 Regards,
 Shalin Shekhar Mangar.





-- 
--Noble Paul


Re: Question on StreamingUpdateSolrServer

2009-04-14 Thread vivek sar
The machine's ulimit is set to 9000 and the OS has an upper limit of
12000 on open files. What would explain this? Has anyone tried Solr with 25
cores on the same Solr instance?

Thanks,
-vivek
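
For anyone comparing numbers: the descriptor counts below come from commands
along these lines (the 20000 is only an example value, and the hard cap is
still bounded by the system-wide maxfiles setting):

    ulimit -n                  # show the current open-files limit
    ulimit -n 20000            # raise it for this shell and its children
    lsof | grep solr | wc -l   # count descriptors Solr currently holds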

2009/4/13 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com:
 On Tue, Apr 14, 2009 at 7:14 AM, vivek sar vivex...@gmail.com wrote:
 Some more update. As I mentioned earlier we are using multi-core Solr
 (up to 65 cores in one Solr instance with each core 10G). This was
 opening around 3000 file descriptors (lsof). I removed some cores and
 after some trial and error I found at 25 cores system seems to work
 fine (around 1400 file descriptors). Tomcat is responsive even when
 the indexing is happening at Solr (for 25 cores). But, as soon as it
 goes to 26 cores the Tomcat becomes unresponsive again. The puzzling
 thing is if I stop indexing I can search on even 65 cores, but while
 indexing is happening it seems to support only up to 25 cores.

 1) Is there a limit on number of cores a Solr instance can handle?
 2) Does Solr do anything to the existing cores while indexing? I'm
 writing to only one core at a time.
 There is no hard limit (it is Integer.MAX_VALUE). But in reality your
 mileage depends on your hardware and the number of file handles the OS can
 open.

 We are struggling to find why Tomcat stops responding on high number
 of cores while indexing is in-progress. Any help is very much
 appreciated.

 Thanks,
 -vivek

 On Mon, Apr 13, 2009 at 10:52 AM, vivek sar vivex...@gmail.com wrote:
 Here is some more information about my setup,

 Solr - v1.4 (nightly build 03/29/09)
 Servlet Container - Tomcat 6.0.18
 JVM - 1.6.0 (64 bit)
 OS -  Mac OS X Server 10.5.6

 Hardware Overview:

 Processor Name: Quad-Core Intel Xeon
 Processor Speed: 3 GHz
 Number Of Processors: 2
 Total Number Of Cores: 8
 L2 Cache (per processor): 12 MB
 Memory: 20 GB
 Bus Speed: 1.6 GHz

 JVM Parameters (for Solr):

 export CATALINA_OPTS="-server -Xms6044m -Xmx6044m -DSOLR_APP
 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log
 -Dsun.rmi.dgc.client.gcInterval=360
 -Dsun.rmi.dgc.server.gcInterval=360"

 Other:

 lsof|grep solr|wc -l
    2493

 ulimit -an
  open files                      (-n) 9000

 Tomcat
     <Connector port="8080" protocol="HTTP/1.1"
                connectionTimeout="2"
                maxThreads="100" />

 Total Solr cores on same instance - 65

 useCompoundFile - true

 The tests I ran,

 While Indexer is running
  1)  Go to http://juum19.co.com:8080/solr - returns a blank page (no
  error in catalina.out)

  2) Try telnet juum19.co.com 8080 - returns with "Connection closed
  by foreign host"

 Stop the Indexer Program (Tomcat is still running with Solr)

  3)  Go to http://juum19.co.com:8080/solr - works OK, shows the list
 of all the Solr cores

 4) Try telnet - able to Telnet fine

  5)  Now comment out all the caches in solrconfig.xml. Try the same tests,
  but Tomcat still doesn't respond.

  Is there a way to stop the auto-warmer? I commented out the caches in
  the solrconfig.xml but still see the following log,

 INFO: autowarming result for searc...@3aba3830 main
 fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

 INFO: Closing searc...@175dc1e2
 main    
 fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}


  6) Change the Indexer frequency so it runs every 2 min (instead of all
  the time). I noticed that once the commit is done, I'm able to run my
  searches. During the commit and auto-warming period I just get a blank page.

   7) Changed from Solrj to XML update - I still get the blank page
  whenever an update/commit is happening.

 Apr 13, 2009 6:46:18 PM
 org.apache.solr.update.processor.LogUpdateProcessor finish
 INFO: {add=[621094001, 621094002, 621094003, 621094004, 621094005,
 621094006, 621094007, 621094008, ...(6992 more)]} 0 1948
 Apr 13, 2009 6:46:18 PM org.apache.solr.core.SolrCore execute
 INFO: [20090413_12] webapp=/solr path=/update params={} status=0 QTime=1948


  So it looks like it's not just StreamingUpdateSolrServer; whenever
  an update/commit is happening I'm not able to search. I don't know if
  it's related to using 

Re: indexing txt file

2009-04-14 Thread Alejandro Gonzalez
You should construct an XML document containing the fields defined in your
schema.xml and give them the values from the text files. For example, if you
have a schema defining two fields, "title" and "text", you should construct
an XML document with a field "title" and its value, and another called "text"
containing the body of your doc. Then you can post it to the Solr you have
deployed and issue a commit, and it's done. It's possible to construct an XML
document defining more than just one doc:


<add>
  <doc>
    <field name="title">doc1 title</field>
    <field name="text">doc1 text</field>
  </doc>
  ...
  <doc>
    <field name="title">docn title</field>
    <field name="text">docn text</field>
  </doc>
</add>
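
Posting the file and the follow-up commit can be done with any HTTP client;
a sketch using curl against the default example port (adjust the URL to your
deployment; the file name docs.xml is a placeholder):

    curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
         --data-binary @docs.xml
    curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
         --data-binary "<commit/>"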



2009/4/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com

 What is the content of your text file?
 Solr does not directly index files.
 --Noble

 On Tue, Apr 14, 2009 at 3:54 AM, Alex Vu alex.v...@gmail.com wrote:
  Hi all,
 
  Currently I wrote an xml file and schema.xml file.  What is the next step
 to
  index a txt file?  Where should I put my txt file I want to index?
 
  thank you,
  Alex V.
 



 --
 --Noble Paul



Re: solr 1.4 memory jvm

2009-04-14 Thread sunnyfr

Do you have an idea?


sunnyfr wrote:
 
 Hi Noble,
 
 Yes, exactly that.
 I would like to know how people manage during a replication.
 Do they turn off servers and set a high autowarmCount, which takes the
 slave out of service for a while? In my case, 10 min to bring back the new
 index and then maybe 10 minutes more of autowarming.
 
 Otherwise, I tried a large mergeFactor, but I guess I have too many
 updates every 30 min (something like 2000 docs) and almost all segments get
 modified.
 
 What would you reckon? :(  :)
 
 Thanks a lot Noble 
 
 
 Noble Paul നോബിള്‍  नोब्ळ् wrote:
 
 So what I decipher from the numbers is that without queries, Solr
 replication is not performing too badly. The queries are inherently slow
 and you wish to optimize the query performance itself.
 Am I correct?
 
 On Tue, Apr 7, 2009 at 7:50 PM, sunnyfr johanna...@gmail.com wrote:

 Hi,

 So I did two tests on two servers.

 First server: with just replication every 20 min, as you can see:
 http://www.nabble.com/file/p22930179/cpu_without_request.png
 cpu_without_request.png
 http://www.nabble.com/file/p22930179/cpu2_without_request.jpg
 cpu2_without_request.jpg

 Second server: with a first replication, and a second one during the query
 test, between 15:32 and 15:41.
 During replication (checked on .../admin/replication/index.jsp) my
 query response time at the end was around 5000 ms.
 After the replication, during the commit I guess, I couldn't get an answer
 to my query for a long time; I refreshed my page a few minutes later.
 http://www.nabble.com/file/p22930179/cpu_with_request.png
 cpu_with_request.png
 http://www.nabble.com/file/p22930179/cpu2_with_request.jpg
 cpu2_with_request.jpg

 Now, without replication, I kept querying the second server, and I can't
 get better than
 1000 ms response time and 11 requests/second.
 http://www.nabble.com/file/p22930179/cpu_.jpg cpu_.jpg

 This is my request :
 select?fl=id&fq=status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_ready_web:1&json.nl=map&wt=json&start=0&version=1.2&bq=status_official:1^1.5+OR+status_creative:1^1+OR+language:en^0.5&bf=recip(rord(created),1,10,10)^3+pow(stat_views,0.1)^15+pow(stat_comments,0.1)^15&rows=100&qt=dismax&qf=title_en^0.8+title^0.2+description_en^0.3+description^0.2+tags^1+owner_login^0.5

 Do you have advice ?

 Thanks Noble


 --
 View this message in context:
 http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22930179.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 --Noble Paul
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p23035520.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: commit / new searcher delay?

2009-04-14 Thread sunnyfr

Hi Hossman,

I would also love to know how you manage this.

thanks,


Shalin Shekhar Mangar wrote:
 
 On Fri, Mar 6, 2009 at 8:47 AM, Steve Conover scono...@gmail.com wrote:
 
 That's exactly what I'm doing, but I'm explicitly replicating, and
 committing.  Even under these circumstances, what could explain the
 delay after commit before the new index becomes available?

 
 How are you explicitly replicating? I mean, how do you make sure that the
 slave has actually finished replication and the new index is available
 now?
 Are you using the script based replication or the new java based one?
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/commit---new-searcher-delay--tp22342916p23036207.html
Sent from the Solr - User mailing list archive at Nabble.com.



Boolean query in Solr

2009-04-14 Thread Sagar Khetkade

Hi,
I am using SolrJ and firing the query on Solr indexes. The index contains
three fields, viz.:
1.   Document_id (type=integer, required=true)
2.   Ticket Id (type=integer)
3.   Content (type=text)

The query formulation is such that I have a query with an “AND” clause. The
query that I am firing on the index files looks like “Content: search query
AND Ticket_id:123 Ticket_Id:789)”.
I am using the AND clause, which makes my job easy, to retrieve the documents
that have the query words in the “Content” field and that have the Ticket_id
field (123).

I know this type of query is easily fired on Lucene indexes. But when I fire
the above query I am not getting the required result. The result contains
documents which do not belong to the ticket id mentioned in the query.
Can anyone please help me out with this issue?
 
Thanks in advance.
 
Regards,
Sagar Khetkade
_
Windows Live Messenger. Multitasking at its finest.
http://www.microsoft.com/india/windows/windowslive/messenger.aspx

Re: solr 1.4 memory jvm

2009-04-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
We do not have such a high update frequency, so we never encountered
this problem. If it is possible to take the slave offline during
auto-warming, that is a good solution.
--Noble

On Thu, Apr 9, 2009 at 2:02 PM, sunnyfr johanna...@gmail.com wrote:

 Hi Noble,

 Yes, exactly that.
 I would like to know how people manage during a replication.
 Do they turn off servers and set a high autowarmCount, which takes the
 slave out of service for a while? In my case, 10 min to bring back the new
 index and then maybe 10 minutes more of autowarming.

 Otherwise, I tried a large mergeFactor, but I guess I have too many
 updates every 30 min (something like 2000 docs) and almost all segments get
 modified.

 What would you reckon? :(  :)

 Thanks a lot Noble


 Noble Paul നോബിള്‍  नोब्ळ् wrote:

 So what I decipher from the numbers is that without queries, Solr
 replication is not performing too badly. The queries are inherently slow
 and you wish to optimize the query performance itself.
 Am I correct?

 On Tue, Apr 7, 2009 at 7:50 PM, sunnyfr johanna...@gmail.com wrote:

 Hi,

 So I did two tests on two servers.

 First server: with just replication every 20 min, as you can see:
 http://www.nabble.com/file/p22930179/cpu_without_request.png
 cpu_without_request.png
 http://www.nabble.com/file/p22930179/cpu2_without_request.jpg
 cpu2_without_request.jpg

 Second server: with a first replication, and a second one during the query
 test, between 15:32 and 15:41.
 During replication (checked on .../admin/replication/index.jsp) my
 query response time at the end was around 5000 ms.
 After the replication, during the commit I guess, I couldn't get an answer
 to my query for a long time; I refreshed my page a few minutes later.
 http://www.nabble.com/file/p22930179/cpu_with_request.png
 cpu_with_request.png
 http://www.nabble.com/file/p22930179/cpu2_with_request.jpg
 cpu2_with_request.jpg

 Now, without replication, I kept querying the second server, and I can't
 get better than
 1000 ms response time and 11 requests/second.
 http://www.nabble.com/file/p22930179/cpu_.jpg cpu_.jpg

 This is my request :
 select?fl=id&fq=status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_ready_web:1&json.nl=map&wt=json&start=0&version=1.2&bq=status_official:1^1.5+OR+status_creative:1^1+OR+language:en^0.5&bf=recip(rord(created),1,10,10)^3+pow(stat_views,0.1)^15+pow(stat_comments,0.1)^15&rows=100&qt=dismax&qf=title_en^0.8+title^0.2+description_en^0.3+description^0.2+tags^1+owner_login^0.5

 Do you have advice ?

 Thanks Noble


 --
 View this message in context:
 http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22930179.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 --Noble Paul



 --
 View this message in context: 
 http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22966630.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Re: Search included in *all* fields

2009-04-14 Thread Erik Hatcher
Or in schema.xml you can set the defaultOperator to AND:
<solrQueryParser defaultOperator="AND"/>, which applies only to the
Lucene/SolrQueryParser, not dismax.


Erik
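
The same default can also be supplied per request via the standard query
parser's q.op parameter, roughly like this (host and fields as in the
example below):

    http://localhost:8983/solr/select?q=fieldA:value1+fieldB:value2&q.op=AND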

On Apr 13, 2009, at 10:49 PM, Ryan McKinley wrote:


what about:
fieldA:value1 AND fieldB:value2

this can also be written as:
+fieldA:value1 +fieldB:value2


On Apr 13, 2009, at 9:53 PM, Johnny X wrote:



I'll start a new thread to make things easier, because I've only really got
one problem now.

I've configured my Solr to search on all fields, so a query on a specific
field (e.g. q=Date:October) will only search the 'Date' field, rather than
all the others.

The issue is when you build up multiple fields to search on. Only one of
those has to match for a result to be returned, rather than all of them. Is
there a way to change this?


Cheers!
--
View this message in context: 
http://www.nabble.com/Search-included-in-*all*-fields-tp23031829p23031829.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Use more than one document tag with Dataimporthandler?

2009-04-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
Nope,

but it is possible to have multiple root entities within a document,
and you can execute them one at a time.
--Noble
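
A sketch of what that looks like; the entity names and queries here are
placeholders, and each root entity can then be run on its own by passing its
name in the dataimport command:

    <document>
      <entity name="first" query="select ...">
        ...
      </entity>
      <entity name="second" query="select ...">
        ...
      </entity>
    </document>

    http://localhost:8983/solr/dataimport?command=full-import&entity=first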


On Tue, Apr 14, 2009 at 4:15 PM, gateway0 reiterwo...@yahoo.de wrote:

 Hi,

 is it possible to use more than one document tag within my data-config.xml
 file?

 Like:

 <dataConfig>
 <dataSource type="JdbcDataSource" name="abc" driver="com.mysql.jdbc.Driver"
 url="jdbc:mysql://localhost:3306/my_zend_appz" user="root" password=""/>

  <document name="first">
    ...entities...
  </document>
  <document name="second">
    ...entities...
  </document>

 </dataConfig>

 ???

 kind regards, Sebastian
 --
 View this message in context: 
 http://www.nabble.com/Use-more-then-one-%3Cdocument%3E-tag-with-Dataimporthandler---tp23037189p23037189.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Re: Search included in *all* fields

2009-04-14 Thread Johnny X

Cheers guys, got it working!


Erik Hatcher wrote:
 
 Or in schema.xml you can set the defaultOperator to AND:
 <solrQueryParser defaultOperator="AND"/>, which applies only to the
 Lucene/SolrQueryParser, not dismax.
 
   Erik
 
 On Apr 13, 2009, at 10:49 PM, Ryan McKinley wrote:
 
 what about:
 fieldA:value1 AND fieldB:value2

 this can also be written as:
 +fieldA:value1 +fieldB:value2


 On Apr 13, 2009, at 9:53 PM, Johnny X wrote:


  I'll start a new thread to make things easier, because I've only really got
  one problem now.

  I've configured my Solr to search on all fields, so a query on a specific
  field (e.g. q=Date:October) will only search the 'Date' field, rather than
  all the others.

  The issue is when you build up multiple fields to search on. Only one of
  those has to match for a result to be returned, rather than all of them. Is
  there a way to change this?


 Cheers!
 -- 
 View this message in context:
 http://www.nabble.com/Search-included-in-*all*-fields-tp23031829p23031829.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 
 

-- 
View this message in context: 
http://www.nabble.com/Search-included-in-*all*-fields-tp23031829p23037645.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: indexing txt file

2009-04-14 Thread Erik Hatcher


On Apr 14, 2009, at 2:01 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



What is the content of your text file?
Solr does not directly index files.


Solr's ExtractingRequestHandler (aka Solr Cell) does index text (and
Word, PDF, etc.) files directly. This is a Solr 1.4/trunk feature.


Erik



Re: Using ExtractingRequestHandler to index a large PDF ~solved

2009-04-14 Thread Fergus McMenemie
On Apr 6, 2009, at 10:16 AM, Fergus McMenemie wrote:

 Hmmm,

 Not sure how this all hangs together. But editing my solrconfig.xml  
 as follows
 sorted the problem:-

    <requestParsers enableRemoteStreaming="false"
     multipartUploadLimitInKB="2048" />
 to

    <requestParsers enableRemoteStreaming="false"
     multipartUploadLimitInKB="20048" />


We should document this on the wiki or in the config, if it isn't  
already.

As best I could tell it is not documented. I stumbled across
the idea of changing multipartUploadLimitInKB after reviewing
http://wiki.apache.org/solr/UpdateRichDocuments. But this leads
me to wonder whether streaming files from a local disk is in some
way also available via enableRemoteStreaming for the solr-cell
feature. With 20:20 hindsight I see that
http://wiki.apache.org/solr/SolrConfigXml does briefly refer
to the file upload size.

I feel that the requestDispatcher section of solrconfig.xml
needs a more complete description. I get the impression it
acts as a filter on *any* URL sent to Solr? What does it do?

I will mark up the wiki when this is clarified.



 Also, my initial report of the issue was misled by the log messages.
 The mention of "oceania.pdf" refers to a previous successful Tika extract.
 There is no mention in the logs of the filename that was rejected, or any
 information that would help me identify it!

We should fix this so it at least spits out a meaningful message.  Can  
you open a JIRA?


OK SOLR-1113 raised.


 Regards Fergus.

 Sorry if this is a FAQ; I suspect it could be. But how do I work  
 around the following:-

 INFO: [] webapp=/apache-solr-1.4-dev path=/update/extract
 params={ext.def.fl=text&ext.literal.id=factbook/reference_maps/pdf/oceania.pdf}
 status=0 QTime=318
 Apr 2, 2009 11:17:46 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.commons.fileupload.FileUploadBase$SizeLimitExceededException:
 the request was rejected because its size (4585774) exceeds the configured
 maximum (2097152)
    at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.<init>(FileUploadBase.java:914)
    at org.apache.commons.fileupload.FileUploadBase.getItemIterator(FileUploadBase.java:331)
    at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:349)
    at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
    at org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:343)
    at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:396)
    at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:114)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)

 Although the PDF is big, it contains very little text; it is a map.

  "java -jar solr/lib/tika-0.3.jar -g" appears to have no bother
  with it.

 Fergus...

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Customizing solr with my lucene

2009-04-14 Thread mirage1987

Hey,
  I am trying to modify the Lucene code by adding payload functionality
to it.
Now if I want to use this Lucene with Solr, what should I do?
I have added it to the lib folder of solr.war, replacing the old Lucene. Is
this enough?
Plus, I am also using a different schema than the default schema.xml used by
Solr (added some fields and removed some of the previous ones).
The problem I am facing is that Solr is now not returning results, but
Lucene on its own is, for the same query.
Could you help me with this? Any ideas and suggestions?
-- 
View this message in context: 
http://www.nabble.com/Customizing-solr-with-my-lucene-tp23038007p23038007.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Maintaining XML Layout

2009-04-14 Thread Johnny X

Pre tag fixed it instantly! 


Thanks!



Shalin Shekhar Mangar wrote:
 
 On Tue, Apr 14, 2009 at 4:56 PM, Johnny X jonathanwel...@gmail.com
 wrote:
 

 Hey,


 One of the fields returned from my queries (Content) is essentially the
 body
 of an e-mail. However, it's returned as one long stream of text (or at
 least, that's how it appears on the web page). Viewing the source of the
 page it appears with the right layout characteristics (paragraphs, name
 at
 end of message separate from main message etc.)

  Is there any way of making it appear this way on the web page, or is this
 just a browser specific thing?


 I think you'd need to convert line break characters in the returned string
 into equivalent html tags yourself before displaying. You could also try
 displaying them in a 'pre' tag and see if it looks ok.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/Maintaining-XML-Layout-tp23037698p23038026.html
Sent from the Solr - User mailing list archive at Nabble.com.



Maintaining XML Layout

2009-04-14 Thread Johnny X

Hey,


One of the fields returned from my queries (Content) is essentially the body
of an e-mail. However, it's returned as one long stream of text (or at
least, that's how it appears on the web page). Viewing the source of the
page it appears with the right layout characteristics (paragraphs, name at
end of message separate from main message etc.)

Is there any way of making it appear this way on the web page, or is this
just a browser specific thing?



Cheers!
-- 
View this message in context: 
http://www.nabble.com/Maintaining-XML-Layout-tp23037698p23037698.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Maintaining XML Layout

2009-04-14 Thread Shalin Shekhar Mangar
On Tue, Apr 14, 2009 at 4:56 PM, Johnny X jonathanwel...@gmail.com wrote:


 Hey,


 One of the fields returned from my queries (Content) is essentially the
 body
 of an e-mail. However, it's returned as one long stream of text (or at
 least, that's how it appears on the web page). Viewing the source of the
 page it appears with the right layout characteristics (paragraphs, name at
 end of message separate from main message etc.)

  Is there any way of making it appear this way on the web page, or is this
 just a browser specific thing?


I think you'd need to convert the line break characters in the returned string
into equivalent HTML tags yourself before displaying. You could also try
displaying them in a 'pre' tag and see if it looks ok.

-- 
Regards,
Shalin Shekhar Mangar.
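
A minimal sketch of that first approach; the class and method names are made
up, and note that HTML metacharacters are escaped before the newlines are
turned into tags:

    public final class MailHtml {
        // Convert a plain-text mail body to HTML that preserves line breaks.
        static String toHtml(String body) {
            String escaped = body.replace("&", "&amp;")
                                 .replace("<", "&lt;")
                                 .replace(">", "&gt;");
            return escaped.replace("\r\n", "\n").replace("\n", "<br/>\n");
        }
    }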


Use more than one document tag with Dataimporthandler?

2009-04-14 Thread gateway0

Hi,

is it possible to use more than one document tag within my data-config.xml
file?

Like:

<dataConfig>
<dataSource type="JdbcDataSource" name="abc" driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/my_zend_appz" user="root" password=""/>

  <document name="first">
    ...entities...
  </document>
  <document name="second">
    ...entities...
  </document>

</dataConfig>

???

kind regards, Sebastian
-- 
View this message in context: 
http://www.nabble.com/Use-more-then-one-%3Cdocument%3E-tag-with-Dataimporthandler---tp23037189p23037189.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Random queries extremely slow

2009-04-14 Thread sunnyfr

Hi Oleg

Did you find a way to get past this issue?
thanks a lot,


oleg_gnatovskiy wrote:
 
 Can you expand on this? Mirroring delay on what?
 
 
 
 zayhen wrote:
 
  Use multiple boxes, with a mirroring delay from one to another, like a
  pipeline.
 
 2009/1/22 oleg_gnatovskiy oleg_gnatovs...@citysearch.com
 

 Well this probably isn't the cause of our random slow queries, but might
 be
 the cause of the slow queries after pulling a new index. Is there
 anything
 we could do to reduce the performance hit we take from this happening?



 Otis Gospodnetic wrote:
 
  Here is one example: pushing a large newly optimized index onto the
  server.
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: oleg_gnatovskiy oleg_gnatovs...@citysearch.com
  To: solr-user@lucene.apache.org
  Sent: Thursday, January 22, 2009 2:22:51 PM
  Subject: Re: Random queries extremely slow
 
 
  What are some things that could happen to force files out of the
 cache
 on
  a
  Linux machine? I don't know what kinds of events to look for...
 
 
 
 
  yonik wrote:
  
   On Thu, Jan 22, 2009 at 1:46 PM, oleg_gnatovskiy
   wrote:
   Hello. Our production servers are operating relatively smoothly
 most
  of
   the
   time running Solr with 19 million listings. However every once in
 a
  while
    the same query that used to take 100 milliseconds takes 6000.
  
   Anything else happening on the system that may have forced some of
 the
   index files out of operating system disk cache at these times?
  
   -Yonik
  
  
 
  --
  View this message in context:
 
 http://www.nabble.com/Random-queries-extremely-slow-tp21610568p21611240.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

 --
 View this message in context:
 http://www.nabble.com/Random-queries-extremely-slow-tp21610568p21611454.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 -- 
 Alexander Ramos Jardim
 
 
 -
 RPG da Ilha 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Random-queries-extremely-slow-tp21610568p23039151.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Boolean query in Solr

2009-04-14 Thread Erik Hatcher


On Apr 14, 2009, at 5:38 AM, Sagar Khetkade wrote:



Hi,
I am using SolrJ and firing the query on Solr indexes. The index
contains three fields, viz.:

1.   Document_id (type=integer, required=true)
2.   Ticket Id (type=integer)
3.   Content (type=text)

The query formulation is such that I have a query with an “AND”
clause. The query that I am firing on the index files looks like
“Content: search query AND Ticket_id:123 Ticket_Id:789)”.


That query is invalid query parser syntax, with an unopened paren first
of all. I assume that's a typo though. Be careful in how you
construct queries with field selectors. Saying:

Content:search query

does NOT necessarily mean that the term "query" is being searched in
the Content field, as that depends on your default field setting for
the query parser. This, however, does use the Content field for both
terms:

   Content:(search query)


I know this type of query is easily fired on lucene indexes. But  
when I am firing the above query I  am not getting the required  
result . The result contains the document which does not belongs to  
the ticket id mentioned in the query.

Please can anyone help me out of this issue.


What does the query parse to with debugQuery output?   That's mighty  
informative info.
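
Concretely, appending debugQuery=true to the request shows the parsed query
and per-document score explanations. One plausible reading of the intended
query, written that way (host and port assumed to be the default example
setup):

    http://localhost:8983/solr/select?q=Content:(search+query)+AND+Ticket_Id:(123+OR+789)&debugQuery=true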


Erik



Re: Customizing solr with my lucene

2009-04-14 Thread Erik Hatcher
What is the query parsed to?   Add debugQuery=true to your Solr  
request and let us know what the query parses to.


As for whether upgrading a Lucene library is sufficient... depends on  
what Solr version you're starting with (payload support is already in  
all recent versions of Solr's Lucene JARs) and what has changed in  
Lucene since, and whether you're expecting an existing index to work  
or rebuilding it from scratch.


Erik

On Apr 14, 2009, at 7:51 AM, mirage1987 wrote:



Hey,
  I am trying to modify the Lucene code by adding payload functionality
to it.

Now if I want to use this Lucene with Solr, what should I do?
I have added it to the lib folder of solr.war, replacing the old Lucene. Is
this enough?

Plus, I am also using a different schema than the default schema.xml used by
Solr (added some fields and removed some of the previous ones).

The problem I am facing is that Solr is now not returning results, but
Lucene on its own is, for the same query.
Could you help me with this? Any ideas and suggestions?
--
View this message in context: 
http://www.nabble.com/Customizing-solr-with-my-lucene-tp23038007p23038007.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: synchronizing slave indexes in distributing collections

2009-04-14 Thread sunnyfr

Hi,

I would like to know how far along you are with your script that takes the
slave out of the load balancer?
I have no choice but to do that during updates on the slave server.

Thanks,


Yu-Hui Jin wrote:
 
 Thanks, guys.
 
 Glad to know the scripts work very well in your experience. (well, indeed
 they are quite simple.) So that's how I imagine we should do it except
 that
 you guys added a very good point -- that the monitoring system can invoke
 a
 script to take the slave out of the load balancer.  I'd like to implement
 this idea.
 
 
 Cheers,
 
 -Hui
 
 On 8/17/07, Bill Au bill.w...@gmail.com wrote:

 If snapinstaller fails to install the latest snapshot, then chances are
 that it would not be able to install any earlier snapshots either. All it
 does is some very simple filesystem operations and then invoke the Solr
 server to do a commit. I agree with Chris that the best thing to do is to
 take it out of rotation and fix the underlying problem.

 Bill

 On 8/17/07, Chris Hostetter hossman_luc...@fucit.org wrote:
 
 
  : So it looks like all we can do is monitor the logs and alert people
  : to fix the issue and rerun the scripts, etc., whenever failures occur.
  : Is that the correct understanding?
 
  I have *never* seen snappuller or snapinstaller fail (except during an
  initial rollout of Solr when I forgot to set up the necessary ssh keys).

  I suppose we could add an option to snapinstaller to support explicitly
  installing a snapshot by name ... then if you detect that slave Z didn't
  load the latest snapshot, you could always tell the other slaves to
  snapinstall whatever older version slave Z is still using -- but frankly
  that seems a little silly -- not to mention that if you couldn't load the
  snapshot into Z, odds are Z isn't responding to queries either.

  a better course of action might just be to have an automated system which
  monitors the distribution status info on the master, and takes any slaves
  that don't update it properly out of your load balancer's rotation (and
  notifies people to look into it)
 
 
 
  -Hoss
 
 

 
 
 
 -- 
 Regards,
 
 -Hui
 
 

-- 
View this message in context: 
http://www.nabble.com/synchronizing-slave-indexes-in-distributing-collections-tp12194297p23039732.html
Sent from the Solr - User mailing list archive at Nabble.com.



Disable logging in SOLR

2009-04-14 Thread Kraus, Ralf | pixelhouse GmbH

Hi,

is there a way to disable all logging output in SOLR ?
I mean the output text like :

INFO: [core_de] webapp=/solr path=/update params={wt=json} status=0 
QTime=3736


greets -Ralf-
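
Solr 1.3 logs through SLF4J to JDK logging by default, so one way to silence
it is a java.util.logging properties file along the lines of the sketch
below, pointed at via -Djava.util.logging.config.file=/path/to/logging.properties
(the path is a placeholder, and a gentler variant would use WARNING instead
of OFF):

    # logging.properties -- turn JDK logging off entirely
    .level = OFF
    handlers = java.util.logging.ConsoleHandler
    java.util.logging.ConsoleHandler.level = OFF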



RE: Term Counts/Term Frequency Vector Info

2009-04-14 Thread Fink, Clayton R.
Grant,

This works:

String url = "http://localhost:8983/solr";
SolrServer server = new CommonsHttpSolrServer(url);
SolrQuery query = new SolrQuery();
query.setQueryType("/autoSuggest");
query.setParam("terms", true);
query.setParam("terms.fl", "CONTENTS");
query.setParam("terms.lower", "london");
query.setParam("terms.upper", "london");
query.setParam("terms.upper.incl", true);

For the query:

http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=london&terms.upper=london&terms.upper.incl=true

It turned out that I was missing the leading "/" in "/autoSuggest". This needs
to be explicit in the documentation.


Thanks!

Clay 

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Monday, April 13, 2009 3:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Term Counts/Term Frequency Vector Info

Sorry, should have added that you should set the qt param: 
http://wiki.apache.org/solr/CoreQueryParameters#head-2c940d42ec4f2a74c5d251f12f4077e53f2f00f4

-Grant

On Apr 13, 2009, at 1:35 PM, Fink, Clayton R. wrote:

 The query method seems to only support solr/select requests. I 
 subclassed SolrRequest and created a request class that supports 
 solr/autoSuggest - following the pattern in LukeRequest. It seems to 
 work fine for me.

 Clay

 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Tuesday, April 07, 2009 10:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Term Counts/Term Frequency Vector Info

 You can send arbitrary requests via SolrJ, just use the parameter map 
 via the query method: 
 http://lucene.apache.org/solr/api/solrj/org/apache/solr/client/solrj/S
 olrServer.html
 .

 -Grant

 On Apr 7, 2009, at 1:52 PM, Fink, Clayton R. wrote:

 These URLs give me what I want - word completion and term counts.
 What I don't see is a way to call these via SolrJ. I could call the 
 server directly using java.net classes and process the XML myself, I 
 guess. There needs to be an auto suggest request class.

 http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=Lond&terms.prefix=Lon&indent=true

 <response>
 <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">0</int>
 </lst>
 <lst name="terms">
   <lst name="CONTENTS">
     <int name="London">11</int>
     <int name="Londoners">2</int>
   </lst>
 </lst>
 </response>

 http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=London&terms.upper=London&terms.upper.incl=true&indent=true

 <response>
 <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">0</int>
 </lst>
 <lst name="terms">
   <lst name="CONTENTS">
     <int name="London">11</int>
   </lst>
 </lst>
 </response>

 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Monday, April 06, 2009 5:43 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Term Counts/Term Frequency Vector Info

 See also http://wiki.apache.org/solr/TermsComponent

 You might be able to apply these patches to 1.3 and have them work, 
 but there is no guarantee.  You also can get some termDocs like 
 capabilities through Solr's faceting capabilities, but I am not aware 
 of any way to get at the term vector capabilities.

 HTH,
 Grant

 On Apr 6, 2009, at 1:49 PM, Fink, Clayton R. wrote:

 I want the functionality that Lucene's IndexReader.termDocs gives me.
 That, or access at the document level to the term vector. This
 (http://wiki.apache.org/solr/TermVectorComponent?highlight=(term)|(vector))
 seems to suggest that this will be available in 1.4. Is
 there any way to do this in 1.3?

 Thanks,

 Clay


 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
 http://www.lucidimagination.com/search

 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
 http://www.lucidimagination.com/search

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search 


Embedded Solr weird behaviour

2009-04-14 Thread Adrian Ivan

Hello,

I am using both Solr server and Solr embedded versions in the same context.

I am using the Solr Server for indexing data which can be accessed at 
enterprise level, and the embedded version in a desktop application.


The idea is that both index the same data, have the same schema.xml and 
config.


My problem: when querying both versions I get different results for this 
case:


query=adventure AND category:Publishing Industry

Please note that 'Publishing Industry' is actually composed of two words.

For the server version it works very well; for the embedded version, I
get no results.


In this case:
query=adventure AND category:Book - I get correct results with both
versions.


category is a field type in my schema.

I noticed that when I have something like: AND category:'composed 
words', the Embedded version fails.


In the schema I tried making the category fieldType text, string,
etc., but got no results.


Any suggestion would be much appreciated.

Thanks,
Adrian



Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Walter Underwood
Dang, had another server do this.

Syncing and committing a new index does not fix it. The two servers
show the same bad results.

wunder

On 4/11/09 9:12 AM, Walter Underwood wunderw...@netflix.com wrote:

 Restarting Solr fixes it. If I remember correctly, a sync and commit
 does not fix it. I have disabled snappuller this time, so I can study
 the broken instance.
 
 wunder
 
 On 4/11/09 5:03 AM, Grant Ingersoll gsing...@apache.org wrote:
 
 
 On Apr 10, 2009, at 5:50 PM, Walter Underwood wrote:
 
 Normally, both "changeling" and "the changeling" work fine. This one
 server is misbehaving like this for all multi-term queries.
 
 Yes, it is VERY weird that the term "changeling" does not show up in
 the explain.
 
 A server will occasionally go bad and stay in that state. In one
 case,
 two servers went bad and both gave the same wrong results.
 
 
 What's the solution for when they go bad?  Do you have to restart Solr
 or reboot or what?
 
 
 Here is the dismax config. groups means movies. The title* fields
 are stemmed and stopped, the exact* fields are not.
 
  <!-- groups and people  -->

  <requestHandler name="groups_people" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">none</str>
      <float name="tie">0.01</float>
      <str name="qf">
        exact^6.0 exact_alt^6.0 exact_base~jw_0.7_1^8.0 exact_alias^8.0
        title^3.0 title_alt^3.0 title_base^4.0
      </str>
      <str name="pf">
        exact^9.0 exact_alt^9.0 exact_base^12.0 exact_alias^12.0 title^3.0
        title_alt^4.0 title_base^6.0
      </str>
      <str name="bf">
        search_popularity^100.0
      </str>
      <str name="mm">1</str>
      <int name="ps">100</int>
      <str name="fl">id,type,movieid,personid,genreid</str>
    </lst>
    <lst name="appends">
      <str name="fq">type:group OR type:person</str>
    </lst>
  </requestHandler>
 
 
 wunder
 
 On 4/10/09 12:51 PM, Grant Ingersoll gsing...@apache.org wrote:
 
 
 On Apr 10, 2009, at 1:56 PM, Walter Underwood wrote:
 
 We have a rare, hard-to-reproduce problem with our Solr 1.3 servers,
 and
 I would appreciate any ideas.
 
 Occasionally, a server will start returning results with really poor
 relevance. Single-term queries work fine, but multi-term queries are
 scored based on the most common term (lowest IDF).
 
 I don't see anything in the logs when this happens. We have a
 monitor
 doing a search for the 100 most popular movies once per minute to
 catch this, so we know when it was first detected.
 
 I'm attaching two explain outputs, one for the query "changeling" and
 one for "the changeling".
 
 
 I'm not sure what exactly  you are asking, so bear with me...
 
 Are you saying that the changeling normally returns results just
 fine and then periodically it will go bad or are you saying you
 don't understand why the changeling scores differently from
 changeling?  In looking at the explains, it is weird that in the
 the changeling case, the term changeling doesn't even show up as a
 term.
 
 Can you share your dismax configuration?  That will be easier to
 parse
 than trying to make sense of the debug query parsing.
 
 -Grant
 
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
 http://www.lucidimagination.com/search
 
 



Re: Memory usage

2009-04-14 Thread Mark Miller

Could you give us a dump of  http://localhost:port/solr/admin/luke ?

A huge max field length and random terms in 2000 2 MB files is going to 
be a bit of a resource hog :)


Can you explain why you are doing that? You will have *so* many unique 
terms...


I can't remember if you can set it in Solr, but there is a way to lessen 
how much RAM terms take in Lucene though (term interval I believe?).


- Mark
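
The Lucene knob in question is the term index interval: how many terms are
skipped between entries in the in-memory term index, so a larger value means
less RAM for the term dictionary at some lookup cost. Newer solrconfig.xml
files expose it roughly like this, inside the index settings; verify your
build supports it before relying on it, and the value is only illustrative:

    <termIndexInterval>256</termIndexInterval>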

Gargate, Siddharth wrote:

Hi all,
I am testing indexing with 2000 text documents of 2 MB
each. These documents contain words created from random characters. I
observed that the Tomcat memory usage keeps increasing slowly. I tried
removing all the cache configuration, but memory usage still
increases. Once the memory reaches the specified max heap, commit
appears to block until the memory is freed. With larger documents, I see
some OOMEs.
Below are a few properties set in solrconfig.xml:

<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>25</mergeFactor>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>2147483647</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>

  <lockType>single</lockType>
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>
<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>7000</maxTime>
</autoCommit>

<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>10</maxWarmingSearchers>

Where does the memory get used? And how to avoid it?

Thanks,
Siddharth

  



--
- Mark

http://www.lucidimagination.com





Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Yonik Seeley
It just occurred to me that a query cache issue could potentially
cause this... if it's caching it would most likely be a query.equals()
implementation incorrectly returning true.
Perhaps check the JaroWinkler.equals() first?

Also, when one server starts to return bad results, have you tried
using explainOther=id:id_of_other_doc_that_should_score_higher?

-Yonik
http://www.lucidimagination.com
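
For anyone chasing a similar bug in a custom Query class: Solr's
queryResultCache keys on the Query object, so a broken equals() makes two
different queries collide on one cached result, and hashCode() must stay
consistent with it. A sketch of a defensible override; the class and fields
below are hypothetical stand-ins for the custom query's state, not the
actual JaroWinkler code:

    // Hypothetical value object standing in for a custom query's state.
    final class FuzzyTermKey {
        final String term;
        final float threshold;

        FuzzyTermKey(String term, float threshold) {
            this.term = term;
            this.threshold = threshold;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FuzzyTermKey)) return false;
            FuzzyTermKey other = (FuzzyTermKey) o;
            // Compare every field that affects matching or scoring.
            return term.equals(other.term)
                && Float.floatToIntBits(threshold) == Float.floatToIntBits(other.threshold);
        }

        @Override
        public int hashCode() {
            // Must be consistent with equals() for cache lookups to behave.
            return 31 * term.hashCode() + Float.floatToIntBits(threshold);
        }
    }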


On Tue, Apr 14, 2009 at 11:43 AM, Walter Underwood
wunderw...@netflix.com wrote:
 Dang, had another server do this.

 Syncing and committing a new index does not fix it. The two servers
 show the same bad results.

 wunder

 On 4/11/09 9:12 AM, Walter Underwood wunderw...@netflix.com wrote:

 Restarting Solr fixes it. If I remember correctly, a sync and commit
 does not fix it. I have disabled snappuller this time, so I can study
 the broken instance.

 wunder


Re: indexing txt file

2009-04-14 Thread Alex Vu
Hi all,
I'm trying to use Solr 1.3 to index a text file. I wrote a
schema.xsd and an XML file.

*The content of my text file is *
#src             dst             proto  ok  sport  dport  pkts  bytes  flows  first                 latest
192.168.220.135  26.147.238.146  6      1   32839  80     6     463    1      1237333861.465764000  1237333861.664701000

*schema file is *
<?xml version="1.0" encoding="UTF-8"?>
<!--W3C Schema generated by XMLSpy v2009 sp1 (http://www.altova.com)-->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="networkTraffic">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="packet" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:attribute name="terminationTimestamp" type="xs:string" use="required"/>
                        <xs:attribute name="sourcePort" type="xs:string" use="required"/>
                        <xs:attribute name="sourceIp" type="xs:string" use="required"/>
                        <xs:attribute name="protocolPortNumber" type="xs:string" use="required"/>
                        <xs:attribute name="packets" type="xs:string" use="required"/>
                        <xs:attribute name="ok" type="xs:string" use="required"/>
                        <xs:attribute name="initialTimestamp" type="xs:string" use="required"/>
                        <xs:attribute name="flows" type="xs:string" use="required"/>
                        <xs:attribute name="destinatoinIp" type="xs:string" use="required"/>
                        <xs:attribute name="destinationPort" type="xs:string" use="required"/>
                        <xs:attribute name="bytes" type="xs:string" use="required"/>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>


*and my xml file is *

<?xml version="1.0" encoding="UTF-8"?>
<networkTraffic xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="C:\DOCUME~1\tpham\Desktop\networkTraffic.xsd">
<packet sourceIp="192.168.54.23" destinatoinIp="192.168.0.1"
protocolPortNumber="6" ok="1" sourcePort="32439" destinationPort="80"
packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000"
terminationTimestamp="1237963861.664701000"/>
<packet sourceIp="192.168.56.23" destinatoinIp="192.168.0.1"
protocolPortNumber="17" ok="1" sourcePort="32439" destinationPort="80"
packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000"
terminationTimestamp="1237963861.664701000"/>
<packet sourceIp="192.168.74.23" destinatoinIp="192.168.0.1"
protocolPortNumber="6" ok="1" sourcePort="32139" destinationPort="80"
packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000"
terminationTimestamp="1237963861.664701000"/>
<packet sourceIp="192.168.54.123" destinatoinIp="192.168.0.1"
protocolPortNumber="6" ok="1" sourcePort="32839" destinationPort="80"
packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000"
terminationTimestamp="1237963861.664701000"/>
<packet sourceIp="192.168.14.23" destinatoinIp="192.168.0.1"
protocolPortNumber="17" ok="1" sourcePort="32839" destinationPort="80"
packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000"
terminationTimestamp="1237963861.664701000"/>
<packet sourceIp="192.168.5.23" destinatoinIp="192.168.0.1"
protocolPortNumber="17" ok="1" sourcePort="32439" destinationPort="80"
packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000"
terminationTimestamp="1237963861.664701000"/>
<packet sourceIp="192.168.15.23" destinatoinIp="192.168.0.1"
protocolPortNumber="6" ok="1" sourcePort="36839" destinationPort="80"
packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000"
terminationTimestamp="1237963861.664701000"/>
<packet sourceIp="192.168.24.23" destinatoinIp="192.168.0.1"
protocolPortNumber="6" ok="1" sourcePort="32839" destinationPort="80"
packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000"
terminationTimestamp="1237963861.664701000"/>
</networkTraffic>



Can someone please show me where to put these files? I'm aware that the
schema.xsd file goes into the conf directory. What about my XML file and
txt file?

Thank you,
Alex


On Tue, Apr 14, 2009 at 12:37 AM, Alejandro Gonzalez 
alejandrogonzalezd...@gmail.com wrote:

 You should construct an XML document containing the fields defined in your
 schema.xml and give them the values from the text files. For example, if you
 have a schema defining two fields, "title" and "text", you should construct
 an XML document with a field "title" and its value, and another called "text"
 containing the body of your doc. Then you can post it to the Solr you have
 deployed and issue a commit, and it's done. It's possible to construct an XML
 document defining more than just one doc:


 <add>
   <doc>
     <field name="title">doc1 title</field>
     <field name="text">doc1 text</field>
   </doc>
   ...
   <doc>
     <field name="title">docn title</field>
     <field name="text">docn text</field>
   </doc>
 </add>



 2009/4/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com

  what is the content of your text file?
  Solr does not index files directly
  --Noble
 
  On Tue, Apr 14, 2009 at 3:54 AM, 

DIH uniqueKey

2009-04-14 Thread ashokc

Hi,

I have separate JDBC datasources (DS1 & DS2) that I want to index with DIH
in a single SOLR instance. The unique keys for the two sources are
different. Do I have to synthesize a uniqueKey that spans both
datasources? Something like this? That is, the uniqueKey values will be like
(+ indicating concatenation):

DS1 + primary key for DS1

DS2 + primary key for DS2

Thanks
- ashok



Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Walter Underwood
The JaroWinkler equals was broken, but I fixed that a month ago.

Query cache sounds possible, but those are cleared on a commit,
right?

I could run with a cache size of 0, since our middle tier HTTP
cache is leaving almost nothing for the caches to do.

I'll try that explainOther. The stored fields for the correct doc
are fine, because I can see them when I use a single-term query.
The indexed fields seem OK, because that query works.

wunder

On 4/14/09 9:11 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 It just occurred to me that a query cache issue could potentially
 cause this... if it's caching it would most likely be a query.equals()
 implementation incorrectly returning true.
 Perhaps check the JaroWinkler.equals() first?
 
 Also, when one server starts to return bad results, have you tried
 using explainOther=id:id_of_other_doc_that_should_score_higher?
 
 -Yonik
 http://www.lucidimagination.com
 
 
 On Tue, Apr 14, 2009 at 11:43 AM, Walter Underwood
 wunderw...@netflix.com wrote:
 Dang, had another server do this.
 
 Syncing and committing a new index does not fix it. The two servers
 show the same bad results.
 
 wunder
 
 On 4/11/09 9:12 AM, Walter Underwood wunderw...@netflix.com wrote:
 
 Restarting Solr fixes it. If I remember correctly, a sync and commit
 does not fix it. I have disabled snappuller this time, so I can study
 the broken instance.
 
 wunder



Re: indexing txt file

2009-04-14 Thread Alejandro Gonzalez
now you should post (HTTP POST) your xml file (the schema must be in the conf
folder) to the url where you have deployed Solr. Don't forget
to post a commit command after that or you won't see the results:

The commit command is just an xml like this:

<commit></commit>
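
For example, assuming Solr is running at the example port and the add
document is saved as docs.xml (host, port and file name are placeholders):

curl http://localhost:8983/solr/update --data-binary @docs.xml -H 'Content-type:text/xml'
curl http://localhost:8983/solr/update --data-binary '<commit/>' -H 'Content-type:text/xml'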

On Tue, Apr 14, 2009 at 6:14 PM, Alex Vu alex.v...@gmail.com wrote:

 Hi all,
  I'm trying to use Solr 1.3 to index a text file.  I wrote a
  schema.xsd and an xml file.

 *The content of my text file is *
 #src             dst             proto  ok  sport  dport  pkts  bytes  flows  first                 latest
 192.168.220.135  26.147.238.146  6      1   32839  80     6     463    1      1237333861.465764000  1237333861.664701000

 [schema.xsd and xml file snipped -- quoted in full earlier in the thread]



 Can someone please show me where do I put these files?  I'm aware that the
 schema.xsd file goes into the directory conf. What about my xml file, and
 txt file?

 Thank you,
 Alex


 On Tue, Apr 14, 2009 at 12:37 AM, Alejandro Gonzalez 
 alejandrogonzalezd...@gmail.com wrote:

  you should construct the xml containing the fields defined in your
  schema.xml and give them the values from the text files. for example if
 you
  have an schema defining two fields title and text you should
 construct
  an xml with a field title and its value and another called text
  containing the body of your doc. then you can post it to Solr you have
  deployed and make a commit an it's done. it's possible 

Re: indexing txt file

2009-04-14 Thread Alex Vu
what about the text file?

On Tue, Apr 14, 2009 at 9:23 AM, Alejandro Gonzalez 
alejandrogonzalezd...@gmail.com wrote:

 now you should post (HTTP POST) your xml file (the schema must be in the conf
 folder) to the url where you have deployed Solr. Don't forget
 to post a commit command after that or you won't see the results:

 The commit command is just an xml like this:

 <commit></commit>

 On Tue, Apr 14, 2009 at 6:14 PM, Alex Vu alex.v...@gmail.com wrote:

  [original message snipped]

Re: indexing txt file

2009-04-14 Thread Shalin Shekhar Mangar
On Tue, Apr 14, 2009 at 9:44 PM, Alex Vu alex.v...@gmail.com wrote:


 [schema snipped]


 Can someone please show me where do I put these files?  I'm aware that the
 schema.xsd file goes into the directory conf. What about my xml file, and
 txt file?


Alex, the Solr schema is not the usual XML Schema (xsd). It is an xml file
which describes the fields, their analyzers, tokenizers, copyFields,
default search field, etc.

Look at the example schema supplied by Solr (inside the example/solr/conf
directory) and modify it according to your needs.
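
As a rough sketch only (field names mirror the columns in your text file;
the types are the sortable types defined in the example schema), the
fields section might look like:

  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="src" type="string" indexed="true" stored="true"/>
    <field name="dst" type="string" indexed="true" stored="true"/>
    <field name="proto" type="sint" indexed="true" stored="true"/>
    <field name="ok" type="sint" indexed="true" stored="true"/>
    <field name="sport" type="sint" indexed="true" stored="true"/>
    <field name="dport" type="sint" indexed="true" stored="true"/>
    <field name="pkts" type="slong" indexed="true" stored="true"/>
    <field name="bytes" type="slong" indexed="true" stored="true"/>
    <field name="flows" type="sint" indexed="true" stored="true"/>
    <field name="first" type="sdouble" indexed="true" stored="true"/>
    <field name="latest" type="sdouble" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>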

-- 
Regards,
Shalin Shekhar Mangar.


Re: indexing txt file

2009-04-14 Thread Alejandro Gonzalez
and I'm not sure I understand what you are trying to do, but maybe you
should define a text field and fill it with the text of each file to
index the text in them, or maybe a path to that file if that's what you
want.

On Tue, Apr 14, 2009 at 6:28 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Tue, Apr 14, 2009 at 9:44 PM, Alex Vu alex.v...@gmail.com wrote:

 
   [schema snipped]
 
 
  Can someone please show me where do I put these files?  I'm aware that
 the
  schema.xsd file goes into the directory conf. What about my xml file, and
  txt file?
 

 Alex, the Solr schema is not the usual XML Schema (xsd). It is an xml file
 which describes the fields, their analyzers, tokenizers, copyFields,
 default
 search field etc.

 Look into the example schema supplied by Solr (inside example/solr/conf)
 directory and modify it according to your needs.

 --
 Regards,
 Shalin Shekhar Mangar.



Re: indexing txt file

2009-04-14 Thread Alex Vu
I also wrote another schema file that is supplied by Solr, I do have some
questions.
*The content of my text file is *
#src             dst             proto  ok  sport  dport  pkts  bytes  flows  first                 latest
192.168.220.135  26.147.238.146  6      1   32839  80     6     463    1      1237333861.465764000  1237333861.664701000

I chose my:
*1. fieldType to be*: tint, tfloat, tlong, tdouble
*2. tokenizer class*: solr.WhitespaceTokenizerFactory,
solr.StandardTokenizerFactory, solr.HTMLStripWhitespaceTokenizerFactory
*3. filter class: *solr.LengthFilterFactory, solr.TrimFilterFactory
*4. field names:* src, dst, proto, ok, sport, dport, pkts, bytes, flows,
first, and latest
*5. uniqueKey:* src, dst

Are these modifications valid for my text file?
Also, if I put this schema.xml file in conf, what do I do with my text
file?

Thank you,
Nga P.






On Tue, Apr 14, 2009 at 9:28 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Tue, Apr 14, 2009 at 9:44 PM, Alex Vu alex.v...@gmail.com wrote:

 
   [schema snipped]
 
 
  Can someone please show me where do I put these files?  I'm aware that
 the
  schema.xsd file goes into the directory conf. What about my xml file, and
  txt file?
 

 Alex, the Solr schema is not the usual XML Schema (xsd). It is an xml file
 which describes the fields, their analyzers, tokenizers, copyFields,
 default
 search field etc.

 Look into the example schema supplied by Solr (inside example/solr/conf)
 directory and modify it according to your needs.

 --
 Regards,
 Shalin Shekhar Mangar.



Re: Random queries extremely slow

2009-04-14 Thread oleg_gnatovskiy

It was actually our use of the field collapse patch. Once we disabled it,
the random slow queries went away.

We also added *:* as a warmup query in order to speed up performance after
indexing.
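
For reference, such a warmup query can be registered in solrconfig.xml via
a QuerySenderListener; a minimal sketch (the query parameters are
illustrative):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>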



sunnyfr wrote:
 
 Hi Oleg
 
 Did you find a way to pass over this issue ?? 
 thanks a lot,
 
 
 oleg_gnatovskiy wrote:
 
 Can you expand on this? Mirroring delay on what?
 
 
 
 zayhen wrote:
 
 Use multiple boxes, with a mirroring delaay from one to another, like a
 pipeline.
 
 2009/1/22 oleg_gnatovskiy oleg_gnatovs...@citysearch.com
 

 Well this probably isn't the cause of our random slow queries, but
 might be
 the cause of the slow queries after pulling a new index. Is there
 anything
 we could do to reduce the performance hit we take from this happening?



 Otis Gospodnetic wrote:
 
  Here is one example: pushing a large newly optimized index onto the
  server.
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: oleg_gnatovskiy oleg_gnatovs...@citysearch.com
  To: solr-user@lucene.apache.org
  Sent: Thursday, January 22, 2009 2:22:51 PM
  Subject: Re: Random queries extremely slow
 
 
  What are some things that could happen to force files out of the
 cache
 on
  a
  Linux machine? I don't know what kinds of events to look for...
 
 
 
 
  yonik wrote:
  
   On Thu, Jan 22, 2009 at 1:46 PM, oleg_gnatovskiy
   wrote:
   Hello. Our production servers are operating relatively smoothly
 most
  of
   the
   time running Solr with 19 million listings. However every once in
 a
  while
   the same query that used to take 100 miliseconds takes 6000.
  
   Anything else happening on the system that may have forced some of
 the
   index files out of operating system disk cache at these times?
  
   -Yonik
  
  
 
 
 
 



 
 
 -- 
 Alexander Ramos Jardim
 
 
 -
 RPG da Ilha 
 
 
 
 
 




Re: DIH uniqueKey

2009-04-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
use TemplateTransformer to create a key
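
For example, a minimal sketch of a DIH entity that prefixes the source key
(entity and column names are illustrative; assumes the schema's uniqueKey
field is called "id"):

<entity name="ds1_item" dataSource="ds1" transformer="TemplateTransformer"
        query="select pk, title from items">
  <field column="id" template="DS1-${ds1_item.pk}"/>
  <field column="title" name="title"/>
</entity>

A second entity against DS2 would use a different prefix, e.g.
template="DS2-${ds2_item.pk}", so the synthesized keys never collide.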

On Tue, Apr 14, 2009 at 9:49 PM, ashokc ash...@qualcomm.com wrote:

 Hi,

 I have separate JDBC datasources (DS1  DS2) that I want to index with DIH
 in a single SOLR instance. The unique record for the two sources are
 different. Do I have to synthesize a uniqueKey that spans both the
 datasources? Something like this? That is, the uniqueKey values will be like
 (+ indicating concatenation):

 DS1 + primary key for DS1

 DS2 + primary key for DS2

 Thanks
 - ashok





-- 
--Noble Paul


Re: indexing txt file

2009-04-14 Thread Alex Vu
I just want to be able to index my text file, and other files that carry
the same format but with different IP addresses, ports, etc.

 I will have the traffic flow running in real time.  Do you think Solr will
be able to index a bunch of my text files in real time?

On Tue, Apr 14, 2009 at 9:35 AM, Alejandro Gonzalez 
alejandrogonzalezd...@gmail.com wrote:

 and i'm not sure of understanding what are u trying to do, but maybe you
 should define a text field and fill it with the text in each file for
 indexing the text in them, or maybe a path to that file if that's what u
 want.

 On Tue, Apr 14, 2009 at 6:28 PM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

  On Tue, Apr 14, 2009 at 9:44 PM, Alex Vu alex.v...@gmail.com wrote:
 
  
    [schema snipped]
  
  
   Can someone please show me where do I put these files?  I'm aware that
  the
   schema.xsd file goes into the directory conf. What about my xml file,
 and
   txt file?
  
 
  Alex, the Solr schema is not the usual XML Schema (xsd). It is an xml
 file
  which describes the fields, their analyzers, tokenizers, copyFields,
  default
  search field etc.
 
  Look into the example schema supplied by Solr (inside example/solr/conf)
  directory and modify it according to your needs.
 
  --
  Regards,
  Shalin Shekhar Mangar.
 



Using Solr from AppEngine application via SolrJ: any problematic issues?

2009-04-14 Thread Glen Newton
I was wondering if those more up on SolrJ internals could take a look
if there were any serious gotchas with the AppEngine's Java urlfetch
with respect to SolrJ.

http://code.google.com/appengine/docs/java/urlfetch/overview.html
The URL must use the standard ports for HTTP (80) and HTTPS (443).
The port is implied by the scheme, but may also be mentioned in the
URL as long as the port is standard for the scheme (https://...:443/).
An app cannot connect to an arbitrary port of a remote host, nor can
it use a non-standard port for a scheme.

This is an annoyance for those running Solr on non-80/443. To some,
this may be a fatal limitation.

There is a 1M upload/download limit, which would impact large adds to
the index and large results sets back from the index.
There are also other quotas:
http://code.google.com/appengine/docs/java/urlfetch/overview.html#Quotas_and_Limits

Otherwise, my eyes see no other major issues. Others?

thanks,

Glen

-- 

-


Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Yonik Seeley
On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
wunderw...@netflix.com wrote:
 The JaroWinkler equals was broken, but I fixed that a month ago.

 Query cache sounds possible, but those are cleared on a commit,
 right?

Yes, but if you use autowarming, those items are regenerated and if
there is a problem with equals() then it could re-appear (the cache
items are correct, it's just the lookup that returns the wrong one).
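
For anyone writing a custom Query, a minimal sketch of a cache-safe
equals()/hashCode() pair (the class and its fields here are hypothetical,
not necessarily the actual code in question):

import org.apache.lucene.search.Query;

/** Hypothetical query; the point is that equals() must compare every
 *  field that affects matching/scoring, and hashCode() must agree with
 *  it, or a cache lookup can return results for the wrong query. */
public class ExampleQuery extends Query {
  private final String field;
  private final String text;

  public ExampleQuery(String field, String text) {
    this.field = field;
    this.text = text;
  }

  public String toString(String defaultField) {
    return field + ":example(" + text + ")";
  }

  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof ExampleQuery)) return false;
    ExampleQuery other = (ExampleQuery) o;
    return getBoost() == other.getBoost()
        && field.equals(other.field)
        && text.equals(other.text);
  }

  public int hashCode() {
    return field.hashCode() ^ text.hashCode()
        ^ Float.floatToIntBits(getBoost());
  }
}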

-Yonik
http://www.lucidimagination.com


Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Walter Underwood
But why would it work for a few days, then go bad and stay bad?

It fails for every multi-term query, even those not in cache.
I ran a test with more queries than the cache size.

We do use autowarming.

wunder

On 4/14/09 10:55 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
 wunderw...@netflix.com wrote:
 The JaroWinkler equals was broken, but I fixed that a month ago.
 
 Query cache sounds possible, but those are cleared on a commit,
 right?
 
 Yes, but if you use autowarming, those items are regenerated and if
 there is a problem with equals() then it could re-appear (the cache
 items are correct, it's just the lookup that returns the wrong one).
 
 -Yonik
 http://www.lucidimagination.com



solr 1.3 + tomcat 5.5

2009-04-14 Thread andrysha nihuhoid
Hi, I have a problem setting up Solr + Tomcat:
Tomcat 5.5 + Apache Solr 1.3.0 + CentOS 5.3
I'm not familiar with Java at all, so sorry if it's a dumb question.
Here is what I did:
placed solr.war in the webapps folder
changed solr home to /etc/solr
copied the contents of the solr distribution example folder to /etc/solr

Tomcat starts successfully and I can even access the admin interface, but
the following errors appear in catalina.out every 10 seconds:
SEVERE: Error deploying configuration descriptor
var#lib#tomcat5#webapps#solr.xml
Apr 14, 2009 1:30:14 PM org.apache.catalina.startup.HostConfig
deployDescriptor
SEVERE: Error deploying configuration descriptor etc#solr#.xml
[...the same pair of errors repeats every 10 seconds...]


Googled for about 3 hours.

tried setting write permissions for all on /etc, /etc/solr and
/var/lib/tomcat5/webapps
tried creating an empty file named solr.xml in /etc and /etc/solr
tried copying solrconfig.xml to /etc and /etc/solr
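
For what it's worth, the usual Tomcat 5.5 deployment uses a context
descriptor rather than copying solr.war into webapps. A rough sketch,
assuming solr home is /etc/solr and the war is kept outside webapps (all
paths are illustrative):

<!-- conf/Catalina/localhost/solr.xml under the Tomcat base directory -->
<Context docBase="/usr/local/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/etc/solr" override="true"/>
</Context>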


Re: Analyzers and stemmer

2009-04-14 Thread Grant Ingersoll
I would say a language is supported if there is a Tokenizer available  
for it.  Everything else after that is generally seen as an improvement.



On Apr 9, 2009, at 5:26 AM, revas wrote:


Hi,

With respect to language support in Solr, we have analyzers for some
languages and stemmers for certain languages. Do we say that Solr
supports a particular language only if we have both an analyzer and a
stemmer for the language, or also when we have an analyzer but not a
stemmer?

Regards
Sujatha


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Multi-language support

2009-04-14 Thread Grant Ingersoll


On Apr 9, 2009, at 7:09 AM, revas wrote:


Hi,

To reframe my earlier question:

Some languages have analyzers only but no stemmer from Snowball/Porter;
does the analyzer take care of stemming as well?

Some languages only have the stemmer from Snowball but no analyzer.

Some have both.

Can we say then that Solr supports all the above languages? Will search be
the same across all the above cases?


I just responded to the earlier question, but it didn't contain this  
question.  No, I wouldn't say that search would be the same.  Stemmed  
vs. non-stemmed may result in different results, just as one stemmer  
implementation results will differ from a different stemming approach.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Grant Ingersoll

Are there changes occurring when it goes bad that maybe aren't committed?

On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote:


But why would it work for a few days, then go bad and stay bad?

It fails for every multi-term query, even those not in cache.
I ran a test with more queries than the cache size.

We do use autowarming.

wunder

On 4/14/09 10:55 AM, Yonik Seeley yo...@lucidimagination.com  
wrote:



On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
wunderw...@netflix.com wrote:

The JaroWinkler equals was broken, but I fixed that a month ago.

Query cache sounds possible, but those are cleared on a commit,
right?


Yes, but if you use autowarming, those items are regenerated and if
there is a problem with equals() then it could re-appear (the cache
items are correct, it's just the lookup that returns the wrong one).

-Yonik
http://www.lucidimagination.com




--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Walter Underwood
Nope. This is a slave, so no indexing happens, just a sync. The
sync happens once per day. It went bad at a different time.

wunder

On 4/14/09 11:42 AM, Grant Ingersoll gsing...@apache.org wrote:

 Are there changes occuring when it goes bad that maybe aren't committed?
 
 On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote:
 
 But why would it work for a few days, then go bad and stay bad?
 
 It fails for every multi-term query, even those not in cache.
 I ran a test with more queries than the cache size.
 
 We do use autowarming.
 
 wunder
 
 On 4/14/09 10:55 AM, Yonik Seeley yo...@lucidimagination.com
 wrote:
 
 On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
 wunderw...@netflix.com wrote:
 The JaroWinkler equals was broken, but I fixed that a month ago.
 
 Query cache sounds possible, but those are cleared on a commit,
 right?
 
 Yes, but if you use autowarming, those items are regenerated and if
 there is a problem with equals() then it could re-appear (the cache
 items are correct, it's just the lookup that returns the wrong one).
 
 -Yonik
 http://www.lucidimagination.com
 
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
 http://www.lucidimagination.com/search
 



Re: Using Solr from AppEngine application via SolrJ: any problematic issues?

2009-04-14 Thread Smiley, David W.
SolrJ would require some modification.  SolrJ internally uses Jakarta HTTP 
Client via Solr's CommonsHttpSolrServer class.  It would need to be ported to 
a different implementation of SolrServer (the base class), one that uses 
java.net.URL. I suggest JavaNetUrlHttpSolrServer.
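
An untested sketch of what that might look like (GET-only, XML responses;
the class name follows the suggestion above, everything else here is an
assumption):

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.solr.client.solrj.ResponseParser;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

/** SolrServer backed by java.net.URL instead of commons-httpclient.
 *  GET-only, so it covers queries but not multipart updates. */
public class JavaNetUrlHttpSolrServer extends SolrServer {
  private final String baseUrl;  // e.g. "http://example.com:80/solr"
  private final ResponseParser parser = new XMLResponseParser();

  public JavaNetUrlHttpSolrServer(String baseUrl) {
    this.baseUrl = baseUrl;
  }

  public NamedList<Object> request(SolrRequest request)
      throws SolrServerException, IOException {
    // Ask for the response format our parser understands.
    ModifiableSolrParams params = new ModifiableSolrParams(request.getParams());
    params.set("wt", parser.getWriterType());
    URL url = new URL(baseUrl + request.getPath()
        + ClientUtils.toQueryString(params, false));
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    InputStream in = conn.getInputStream();
    try {
      return parser.processResponse(in, "UTF-8");
    } finally {
      in.close();
    }
  }
}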

~ David Smiley


On 4/14/09 1:13 PM, Glen Newton glen.new...@gmail.com wrote:

I was wondering if those more up on SolrJ internals could take a look
if there were any serious gotchas with the AppEngine's Java urlfetch
with respect to SolrJ.

http://code.google.com/appengine/docs/java/urlfetch/overview.html
The URL must use the standard ports for HTTP (80) and HTTPS (443).
The port is implied by the scheme, but may also be mentioned in the
URL as long as the port is standard for the scheme (https://...:443/).
An app cannot connect to an arbitrary port of a remote host, nor can
it use a non-standard port for a scheme.

This is an annoyance for those running Solr on non-80/443. To some,
this may be a fatal limitation.

There is a 1M upload/download limit, which would impact large adds to
the index and large results sets back from the index.
There are also other quotas:
http://code.google.com/appengine/docs/java/urlfetch/overview.html#Quotas_and_Limits

Otherwise, my eyes see no other major issues. Others?

thanks,

Glen

--

-



How to manage real-time (presence) data in a large index?

2009-04-14 Thread Development Team
Hi everybody,
   I have a relatively large index (it will eventually contain ~4M
documents and be about 3G in size, I think) that indexes user data,
settings, and the like. The documents represent a community of users
whereupon a subset of them may be online at any time. Also, we want
searches that span the whole index to score results by their online
(i.e. presence) status.
   Right now the list of online members is kept in a database table,
however we very often need to search on these users. The problem is, we're
using Solr for our searches and we don't know how to approach setting up a
search system for a large amount of highly volatile data.
   How do people typically go about this? Do they do one of the
following:
 1) Set up a second core and index only the online
members in there? (Then we could not score normal search results by online
status.)
 2) Index the online status in our regular solr index and not
worry about it? (If it's fast to update docs in a large index, then why not
maintain real-time data in the main index?)
 3) Just use a database for the presence data and forget about
using Solr for the presence-related searches?
   Is there anything in Solr that I should be looking into to help with
this problem? I'd appreciate any help.

Sincerely,

   Daryl.


Re: Using Solr from AppEngine application via SolrJ: any problematic issues?

2009-04-14 Thread Glen Newton
I see. So this is a show stopper for those wanting to use SolrJ with AppEngine.

Any chance this could be added as a Solr issue?

-glen

2009/4/14 Smiley, David W. dsmi...@mitre.org:
 SolrJ would require some modification.  SolrJ internally uses Jakarta HTTP
 Client via Solr’s “CommonsHttpSolrServer” class.  It would need to be ported
 to a different implementation of SolrServer (the base class), one that uses
 java.net.URL. I suggest “JavaNetUrlHttpSolrServer”.

 ~ David Smiley


 On 4/14/09 1:13 PM, Glen Newton glen.new...@gmail.com wrote:

 I was wondering if those more up on SolrJ internals could take a look
 if there were any serious gotchas with the AppEngine's Java urlfetch
 with respect to SolrJ.

 http://code.google.com/appengine/docs/java/urlfetch/overview.html
 The URL must use the standard ports for HTTP (80) and HTTPS (443).
 The port is implied by the scheme, but may also be mentioned in the
 URL as long as the port is standard for the scheme (https://...:443/).
 An app cannot connect to an arbitrary port of a remote host, nor can
 it use a non-standard port for a scheme.

 This is an annoyance for those running Solr on non-80/443. To some,
 this may be a fatal limitation.

 There is a 1M upload/download limit, which would impact large adds to
 the index and large results sets back from the index.
 There are also other quotas:
 http://code.google.com/appengine/docs/java/urlfetch/overview.html#Quotas_and_Limits

 Otherwise, my eyes see no other major issues. Others?

 thanks,

 Glen

 --

 -





-- 

-


Re: Using Solr from AppEngine application via SolrJ: any problematic issues?

2009-04-14 Thread Shalin Shekhar Mangar
On Wed, Apr 15, 2009 at 12:47 AM, Glen Newton glen.new...@gmail.com wrote:

 I see. So this is a show stopper for those wanting to use SolrJ with
 AppEngine.

 Any chance this could be added as a Solr issue?


Yes, commons-httpclient tries to use Socket directly. So it may not work.

It was mentioned here -
http://briccetti.blogspot.com/2009/04/my-first-scala-web-app-on-google-app.html

There is an issue I opened some time back which we could use -
https://issues.apache.org/jira/browse/SOLR-599

-- 
Regards,
Shalin Shekhar Mangar.


Distinct terms in facet field

2009-04-14 Thread Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]
How could I get a count of distinct terms for a given query?  For example:
The Wiki page
http://wiki.apache.org/solr/SimpleFacetParameters
has a section Facet Fields with No Zeros
which shows the query:
http://localhost:8983/solr/select?q=ipodrows=0facet=truefacet.limit=-1facet.field=catfacet.mincount=1facet.field=inStock
and returns results where the inStock field has two facet counts (false is 3, 
and true is 1)

But what I would want to know is how many distinct values were found ( in this 
case it would be 2 / true and false ).  I realize I could count the number of 
terms returned, but if the set were large that would be non-performant.  Is 
there a better way?

Thanks,
Tim


Hierarchal Faceting Field Type

2009-04-14 Thread Nasseam Elkarra

Background:
We set up a system for hierarchical categories using the following scheme:
level one#
level one#level two#
level one#level two#level three#

We're trying to find the right combination of field type and query to get
the desired results. Previous posts about hierarchical facets helped in
generating the right query, but we're having an issue: the built-in text
field ignores our delimiter, and the string field prevents us from doing
a starts-with search. Does anyone have any insight into the field
declaration?


Any help is appreciated. Thank you.
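
One approach that may work with a plain string field (field name and
values illustrative): keep the category as an untokenized string type so
the # delimiter survives,

<field name="category" type="string" indexed="true" stored="true" multiValued="true"/>

then drill down one level at a time with facet.prefix, e.g.

...&facet=true&facet.field=category&facet.prefix=level+one%23

which gives a starts-with match at facet time without tokenizing the field.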


Re: Search on all fields and know in which field was the match

2009-04-14 Thread Chris Hostetter


: With this structure i think (correct me if i am wrong) i cant search for all
: attachBody_* and know where the match was (attachBody_1, _2, _3, etc).

correct

: I really don't know if this is the best approach so any help would be
: appreciated.

one option is to index each attachment as its own document *in addition* 
to indexing each email with all of the attachment text in a single 
attachments field.  that way you can search for all emails where Bob is 
mentioned in an attachment -- but if you want to know which specific 
attachments mention Bob you can do that search as well.



-Hoss



Re: How to send a parsed Query to shards?

2009-04-14 Thread Chris Hostetter

: reference some large in-memory lookup tables.  After the search components
: get done processing the orignal query, the query may contain SpanNearQueries
: and DisjunctionMaxQueries.  I'd like to send that query to the shards, not
: the original query.  
: 
: I've come up with the following idea for doing this.  Would people please
: comment on this idea or suggest a better alternative?
: 
: * Subclass QueryComponent to base64 encode the serialized form of the query
: and send that in place of the original query.
: 
: * set the queryParser on the shard servers to a custom class that unencodes
: and deserializes the encoded query and returns it.

those are essentially the same idea ... a query string is just a 
simple form of Query serialization.  a Component on your master could 
modify the query string to be anything you want (base64 encoded native 
serialization, xml based serialization, json, etc...) as long as the 
QParser on the slave machines knows how to make sense of it.



-Hoss



Re: Custom sort based on arbitrary order

2009-04-14 Thread Chris Hostetter

: custom order that is fairly simple: there is a list of venues and some of
: them are more relevant than others (there is no logic, it's arbitrary, it's
: not an alphabetic order), it'd be something like this:
: 
: Orange venue = 1
: Red venu = 2
: Blue venue = 3
: 
: So results where venue is orange should go first, then red and finally
: blue. 
: Could you advice on the easiest way to have this example working?

use your rules to add values to all the docs at index time ... then sort 
on that value (ie: for each doc you actually index the value of 1, 2, or 3 
in a field no one ever looks at, but you sort on it.)
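
a minimal sketch, assuming a sortable int field (names illustrative): in
schema.xml,

<field name="venueRank" type="sint" indexed="true" stored="false"/>

index 1, 2 or 3 into venueRank per document, then at query time add

...&sort=venueRank asc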



-Hoss



Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Grant Ingersoll
Is bad memory a possibility?  i.e. is it the same machine all the  
time?  Is there any recognizable pattern for when it happens?


-Grant (grasping at straws)


On Apr 14, 2009, at 2:51 PM, Walter Underwood wrote:


Nope. This is a slave, so no indexing happens, just a sync. The
sync happens once per day. It went bad at a different time.

wunder

On 4/14/09 11:42 AM, Grant Ingersoll gsing...@apache.org wrote:

Are there changes occuring when it goes bad that maybe aren't  
committed?


On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote:


But why would it work for a few days, then go bad and stay bad?

It fails for every multi-term query, even those not in cache.
I ran a test with more queries than the cache size.

We do use autowarming.

wunder

On 4/14/09 10:55 AM, Yonik Seeley yo...@lucidimagination.com
wrote:


On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
wunderw...@netflix.com wrote:

The JaroWinkler equals was broken, but I fixed that a month ago.

Query cache sounds possible, but those are cleared on a commit,
right?


Yes, but if you use autowarming, those items are regenerated and if
there is a problem with equals() then it could re-appear (the cache
items are correct, it's just the lookup that returns the wrong  
one).


-Yonik
http://www.lucidimagination.com




--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search





--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: using NGramTokenizerFactory for partial matching

2009-04-14 Thread Chris Hostetter


: I want it to match lor lorem and lorem i. However I am finding it
: matches the first two but not the third - the white space is causing
: problems. Here are the relevant parts of my config: 
: 
: fieldType name=text_substring class=solr.TextField
: positionIncrementGap=100
: analyzer type=index
: tokenizer class=solr.NGramTokenizerFactory
: minGramSize=3 maxGramSize=15 /  

NGramTokenizer doesn't do anything special with whitespace -- but the 
QueryParser does ... what does your query for "lorem i" look like?

if you're using the example query parser and request handler configs then 
this won't work like you want...

   http://localhost:8983/solr/select?q=lorem+i

...because the query parser will split on the whitespace.

try quoting your string, or using the FieldQParserPlugin.
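
for example, either of these should keep "lorem i" together as one token
(the field name is illustrative; remember to url-encode):

   http://localhost:8983/solr/select?q=text_substring:"lorem i"
   http://localhost:8983/solr/select?q={!field f=text_substring}lorem i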



-Hoss



Sort by distance from location?

2009-04-14 Thread Development Team
Hi everybody,
 My index has latitude/longitude values for locations. I am required to
do a search based on a set of criteria, and order the results based on how
far the lat/long location is to the current user's location. Currently we
are emulating such a search by adding criteria of ever-widening bounding
boxes, and the more of those boxes match the document, the higher the score
and thus the closer ones appear at the start of the results. The query looks
something like this (newlines between each search term):

+criteraOne:1
+criteriaTwo:true
+latitude:[-90.0 TO 90.0] +longitude:[-180.0 TO 180.0]
(latitude:[40.52 TO 40.81] longitude:[-74.17 TO -73.79])
(latitude:[40.30 TO 41.02] longitude:[-74.45 TO -73.51])
(latitude:[39.94 TO 41.38] longitude:[-74.93 TO -73.03])
[[...etc...about 10 times...]]

 Naturally this is quite slow (query is approximately 6x slower than
normal), and... I can't help but feel that there's a more elegant way of
sorting by distance.
 Does anybody know how to do this or have any suggestions?

Sincerely,

 Daryl.


Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Walter Underwood
I already ruled out cosmic rays. It has happened on different
hardware and at different times of day, including low load.

The only thing associated with it is load from a new faceted
browse thing we turned on.

wunder

On 4/14/09 2:23 PM, Grant Ingersoll gsing...@apache.org wrote:

 Is bad memory a possibility?  i.e. is it the same machine all the
 time?  Is there any recognizable pattern for when it happens?
 
 -Grant (grasping at straws)
 
 
 On Apr 14, 2009, at 2:51 PM, Walter Underwood wrote:
 
 Nope. This is a slave, so no indexing happens, just a sync. The
 sync happens once per day. It went bad at a different time.
 
 wunder
 
 On 4/14/09 11:42 AM, Grant Ingersoll gsing...@apache.org wrote:
 
 Are there changes occuring when it goes bad that maybe aren't
 committed?
 
 On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote:
 
 But why would it work for a few days, then go bad and stay bad?
 
 It fails for every multi-term query, even those not in cache.
 I ran a test with more queries than the cache size.
 
 We do use autowarming.
 
 wunder
 
 On 4/14/09 10:55 AM, Yonik Seeley yo...@lucidimagination.com
 wrote:
 
 On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
 wunderw...@netflix.com wrote:
 The JaroWinkler equals was broken, but I fixed that a month ago.
 
 Query cache sounds possible, but those are cleared on a commit,
 right?
 
 Yes, but if you use autowarming, those items are regenerated and if
 there is a problem with equals() then it could re-appear (the cache
 items are correct, it's just the lookup that returns the wrong
 one).
 
 -Yonik
 http://www.lucidimagination.com
 
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
 http://www.lucidimagination.com/search
 
 
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
 http://www.lucidimagination.com/search
 



Re: Sort by distance from location?

2009-04-14 Thread Smiley, David W.
Have you tried LocalSolr?
http://www.gissearch.com/localsolr
(I haven't but looks cool)


On 4/14/09 5:31 PM, Development Team dev.and...@gmail.com wrote:

[quoted message snipped]



Re: Sort by distance from location?

2009-04-14 Thread Development Team
Ah, good question:  Yes, we've tried it... and it was slower. To give some
avg times:
Regular non-distance Searches: 100ms
Our expanding-criteria solution:  600ms
LocalSolr:  800ms

(We also had problems with LocalSolr in that the results didn't seem to be
cached in Solr upon doing a search. So each page of results meant another
800ms.)

- Daryl.


On Tue, Apr 14, 2009 at 5:34 PM, Smiley, David W. dsmi...@mitre.org wrote:

  Have you tried LocalSolr?
 http://www.gissearch.com/localsolr
 (I haven’t but looks cool)



Re: Disable logging in SOLR

2009-04-14 Thread Bill Au
Have you tried setting logging level to OFF from Solr's admin GUI:
http://wiki.apache.org/solr/SolrAdminGUI

Bill

On Tue, Apr 14, 2009 at 9:56 AM, Kraus, Ralf | pixelhouse GmbH 
r...@pixelhouse.de wrote:

 Hi,

 is there a way to disable all logging output in SOLR ?
 I mean the output text like :

 INFO: [core_de] webapp=/solr path=/update params={wt=json} status=0
 QTime=3736

 greets -Ralf-




Re: _val:ord(field) (from wiki LargeIndexes)

2009-04-14 Thread Chris Hostetter

: I see this interesting line in the wiki page LargeIndexes 
: http://wiki.apache.org/solr/LargeIndexes (sorting section towards the 
: bottom)
: 
: Using _val:ord(field) as a search term will sort the results without 
: incurring the memory cost.
: 
: I'd like to know what this means, but I'm having a bit of trouble 
: parsing it  What is _val:ord(field) exactly?  Does this just mean 

that's referring to using function queries with the _val_ hack that is 
supported by the LuceneQParserPlugin...
http://wiki.apache.org/solr/SolrQuerySyntax

...it *seems* to be suggesting that if you use a function query based on 
the ordinal value of a field, you won't need the same amount of memory as 
if you just sorted on that field ... but that is incorrect, so i removed 
that line from the page.  (for string fields, the same FieldCache is 
initialized either way; for non-string fields, following that advice could 
result in 2 or 3 times as much memory being needed for both the numeric 
FieldCache and the String FieldCache entries)


-Hoss



Re: More than one language in the same document

2009-04-14 Thread Chris Hostetter

:  A related question. What does 'copyField' actually do? Does it 'append'
:  content from the source field to the 'target' field? Or does it
:  replace/overwrite it? Thank you.
:  
:
: It appends the content of the source field to the target.

strictly speaking, it adds the content to the target field as if it were 
another multi-valued field value.
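
for example, with

  <copyField source="title" dest="text"/>
  <copyField source="body" dest="text"/>

each document's title and body both get added to "text" as separate 
values, just as if a multi-valued "text" field had received two values.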



-Hoss



Re: Hierarchal Faceting Field Type

2009-04-14 Thread Koji Sekiguchi

Nasseam Elkarra wrote:

Background:
Set up a system for hierarchal categories using the following scheme:
level one#
level one#level two#
level one#level two#level three#

Trying to find the right combination of field type and query to get 
the desired results. Saw some previous posts about hierarchal facets 
which helped in the generating the right query but having an issue 
using the built in text field which ignores our delimiter and the 
string field which prevents us from doing a start with search. Does 
anyone have any insight into the field declaration?


Any help is appreciated. Thank you.



Out of need in my own project, I'll get started working on SOLR-64 any 
day now.

I'm thinking of introducing a field type for hierarchical facets.

Koji




Re: using multisearcher

2009-04-14 Thread Chris Hostetter

: As for the second part, I was thinking of trying to replace the standard
: SolrIndexSearcher with one that employs a MultiSearcher.  But I'm not very
: familiar with the workings of Solr, especially with respect to the caching
: that goes on.  I thought that maybe people who are more familiar with it might
: have some tips on how to go about it.  Or perhaps there are reasons that make
: this a bad idea. 

If your indexes are all local, then using a MultiReader would be simpler 
than trying to shoehorn MultiSearcher-type logic into SolrIndexSearcher.

https://issues.apache.org/jira/browse/SOLR-243


-Hoss



Re: Access HTTP headers from custom request handler

2009-04-14 Thread Chris Hostetter

: Solr cannot assume that the request would always come from http (think
: of EmbeddedSolrServer). So it assumes that there are only parameters

exactly.

: Your best bet is to modify SolrDispatchFilter and read the params and
: set them in the SolrRequest object

SolrDispatchFilter is designed to be subclassed to make this easy by 
overriding the execute method...

  protected void execute(HttpServletRequest req, SolrRequestHandler handler,
                         SolrQueryRequest sreq, SolrQueryResponse rsp) {
    // stash the raw servlet request in the request context so handlers
    // and components can read HTTP headers from it
    sreq.getContext().put("HttpServletRequest", req);
    super.execute(req, handler, sreq, rsp);
  }

-Hoss



Index Replication or Distributed Search ?

2009-04-14 Thread ramanathan

Hi,

Can someone provide practical advice on how large a Solr search index can
be while still performing well for a consumer-facing media website?

Is it good or bad to think about Distributed Search and dividing the index
at an early stage of development?

Thanks
Ram



Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Grant Ingersoll
OK, I guess details on the new faceting stuff would be in order.
Which faceting are you using?  Are you sure that it never occurred before
(i.e. it slipped under the radar)?


Obviously, the key is reproducibility here, but this has all the  
earmarks of some weird threading issue, it seems, at least IMO.



On Apr 14, 2009, at 5:32 PM, Walter Underwood wrote:


I already ruled out cosmic rays. It has happened on different
hardware and at different times of day, including low load.

The only thing associated with it is load from a new faceted
browse thing we turned on.

wunder

On 4/14/09 2:23 PM, Grant Ingersoll gsing...@apache.org wrote:


Is bad memory a possibility?  i.e. is it the same machine all the
time?  Is there any recognizable pattern for when it happens?

-Grant (grasping at straws)


On Apr 14, 2009, at 2:51 PM, Walter Underwood wrote:


Nope. This is a slave, so no indexing happens, just a sync. The
sync happens once per day. It went bad at a different time.

wunder

On 4/14/09 11:42 AM, Grant Ingersoll gsing...@apache.org wrote:


Are there changes occuring when it goes bad that maybe aren't
committed?

On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote:


But why would it work for a few days, then go bad and stay bad?

It fails for every multi-term query, even those not in cache.
I ran a test with more queries than the cache size.

We do use autowarming.

wunder

On 4/14/09 10:55 AM, Yonik Seeley yo...@lucidimagination.com
wrote:


On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
wunderw...@netflix.com wrote:

The JaroWinkler equals was broken, but I fixed that a month ago.

Query cache sounds possible, but those are cleared on a commit,
right?


Yes, but if you use autowarming, those items are regenerated  
and if
there is a problem with equals() then it could re-appear (the  
cache

items are correct, it's just the lookup that returns the wrong
one).

-Yonik
http://www.lucidimagination.com




--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search





--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search





--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: How to manage real-time (presence) data in a large index?

2009-04-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Apr 15, 2009 at 12:39 AM, Development Team dev.and...@gmail.com wrote:
 Hi everybody,
       I have a relatively large index (it will eventually contain ~4M
 documents and be about 3G in size, I think) that indexes user data,
 settings, and the like. The documents represent a community of users
 whereupon a subset of them may be online at any time. Also, we want to
 score our search results across searches that span the whole index by the
 online (i.e. presence) status.
       Right now the list of online members is kept in a database table,
 however we very often need to search on these users. The problem is, we're
 using Solr for our searches and we don't know how to approach setting up a
 search system for a large amount of highly volatile data.
       How do people typically go about this? Do they do one of the
 following:
             1) Set up a second core and keep only index the online
 members in there? (Then we could not score normal search results by online
 status.)
This will not work because creating an index is quite expensive
             2) Index the online status in our regular solr index and not
 worry about it? (If it's fast to update docs in a large index, then why not
 maintain real-time data in the main index?)
Do you wish to have the data almost in real time? That means you will
have to commit too often, which may result in very poor performance.

             3) Just use a database for the presence data and forget about
 using Solr for the presence-related searches?

If the number of users is low enough to be held in a HashSet in memory,
you can think of implementing a special field akin to
org.apache.solr.schema.ExternalFileField. But do not hope to make it
realtime; try to make it close to realtime (say, a 1-minute update of
the HashSet, i.e. fetch the data from the DB once a minute).
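
For reference, a sketch of the ExternalFileField route (field and file
names are illustrative): in schema.xml,

<fieldType name="presenceFile" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"/>
<field name="online" type="presenceFile" indexed="false" stored="false"/>

then rewrite a file named external_online in the index data directory
once a minute, with one line per online user, e.g.

user1234=1

and use the value in a function query (e.g. _val_:"online") to boost
online users; the file is re-read when a new searcher is opened.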

       Is there anything in Solr that I should be looking into to help with
 this problem? I'd appreciate any help.

 Sincerely,

       Daryl.




-- 
--Noble Paul


Re: Using Solr from AppEngine application via SolrJ: any problematic issues?

2009-04-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess SOLR-599 can be easily fixed if we do not implement
Multipart-support (which is non-essential)
--Noble

On Wed, Apr 15, 2009 at 1:12 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Wed, Apr 15, 2009 at 12:47 AM, Glen Newton glen.new...@gmail.com wrote:

 I see. So this is a show stopper for those wanting to use SolrJ with
 AppEngine.

 Any chance this could be added as a Solr issue?


 Yes, commons-httpclient tries to use Socket directly. So it may not work.

 It was mentioned here -
 http://briccetti.blogspot.com/2009/04/my-first-scala-web-app-on-google-app.html

 There is an issue I opened some time back which we could use -
 https://issues.apache.org/jira/browse/SOLR-599

 --
 Regards,
 Shalin Shekhar Mangar.




-- 
--Noble Paul