Re: Solr 4.2.1 + Distribution scripts (rsync) Issue

2013-06-05 Thread Sandeep Mestry
Hi Hoss,

Thanks for your reply. Please find answers to your questions below.

*Well, for starters -- have you considered at least looking into using the java
based ReplicationHandler instead of the rsync scripts?*
- There was an attempt to implement java-based replication, but it was
very slow, so that option was discarded and rsync was used instead. This
was done a couple of years ago, and until February of this year we were using Solr
1.4. I upgraded Solr to 4.0 with rsync; however, due to time and resource
constraints the rsync alternative was not re-evaluated, and it can't be done even
today - only in the next release will we be doing SolrCloud.
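
(For reference, a minimal java-based replication setup in solrconfig.xml would
look roughly like the sketch below. This is only an illustration; the host name,
core name and poll interval are placeholders, not values from this thread.)

On the indexer (master):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

On each search box (slave):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://indexer-host:8983/solr/collection1</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>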

My setup looks like below - this was working correctly with Solr 1.4, Solr
4.0 versions.

1) Index feeder applications feed indexes to the indexer boxes.
2) A cron job that runs every minute on the indexer boxes (the committer) commits
the indexes (commit) and invokes snapshooter to create a snapshot. The rsync
daemon runs on the indexer boxes.
3) Another cron job runs on the search boxes every minute, which pulls the
snapshot (snappuller) and installs it on the search boxes (snapinstaller),
which also notifies the searcher to open a new searcher (commit).

Additionally, there is a cron job that runs every morning at 4 am on the
indexer boxes which optimises the index (optimize) and removes snapshots
older than a day (snapcleaner).
This is as per http://wiki.apache.org/solr/SolrCollectionDistributionScripts

*Which config is this, your indexer or your searcher? (i'm assuming it's the
searcher since i don't see any postCommit commands to exec snapshooter but
i wanted to sanity check that wasn't a simple explanation for your problem)*
- Because of this setup, I do not have any postCommit listener in
solrconfig.xml.
- This solrconfig.xml is used for both the indexer and the searcher boxes.
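
(For context, if a postCommit hook were wanted, the classic listener that runs
snapshooter from solrconfig.xml looks roughly like the sketch below; the exe and
dir values are illustrative and depend on the install layout.)

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- sketch: run snapshooter after every commit -->
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">solr/bin/snapshooter</str>
    <str name="dir">.</str>
    <bool name="wait">true</bool>
  </listener>
</updateHandler>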

I can see that after my upgrade to Solr 4.2.1, all these scripts behave
normally; it is just that I do not see the updates getting refreshed on the
search boxes unless I restart.

*What exactly does your manual commit command look like?  *
- This is done using the commit script under the bin directory (commit -h localhost
-p 8983).
- I have also tried a URL-based commit as you mentioned, but no luck.

*Are you doing this on the indexer box or the searcher boxes? *
- I executed the manual commit on the searcher boxes; the indexer boxes do show the
commits and updates correctly.

*what is the HTTP response from this command? what do the logs show when
you do this?*
- I have attached the logs; please note that I have enabled openSearcher
for testing.

Thanks, please let me know if I'm missing something. I remembered people
not getting their deletes, where the workaround was to add the _version_ field to
the schema, which I have done - but no luck. I know it might be unrelated, but I am
just trying all my options.

Thanks again,
Sandeep


On 5 June 2013 00:41, Chris Hostetter hossman_luc...@fucit.org wrote:


 : However, we haven't yet implemented SolrCloud and still relying on
 : distribution scripts - rsync, indexpuller mechanism.

 Well, for starters -- have you considered at least looking into using the
 java based ReplicationHandler instead of the rsync scripts?

 Script based replication has not been actively maintained since java
 replication was added back in Solr 1.4!

 : I see that the indexes are getting created on indexer boxes, snapshots
 : being created and then pulled across to search boxes. The snapshots are
 : getting installed on search boxes as well. There are no errors in the
 : scripts logs and this process works well.
 : However, when I check the update in solr console (on search boxes), I do
 : not see the updated result. The updates do not appear in search boxes
 even
 : after manual commit. Only after a *restart* of the search application
 : (deployed in tomcat) I can see the updated results.

 What exactly does your manual commit command look like?  Are you
 doing this on the indexer box or the searcher boxes?  what is the HTTP
 response from this command? what do the logs show when you do this?

 It's possible that some internal changes in Solr relating to NRT
 improvements may have optimized away re-opening on commit if solr doesn't
 think the index has changed -- but i doubt it.  because I just tried a
 simple test using the 4.3.0 example where i manually simulated
 snapinstaller replacing the index files with a newer index and issued
 "http://localhost:8983/solr/update?commit=true" and solr loaded up that
 new index and started searching it -- so i suspect the devil is in the
 details of your setup.

 you're sure each of the snapshooter, snappuller, snapinstaller scripts are
 executing properly?

 : I have done minimal changes for the upgrade in solrconfig.xml and is
 pasted
 : below. Please can someone take a look and let me know what the issue is.
 : The same config was working fine on Solr 4.0 (as well as Solr 1.4.1).

 which config is this, your indexer or your searcher? (i'm assuming it's
 the searcher since i don't see any postCommit commands 

Re: Setting up Solr

2013-06-05 Thread Shawn Heisey
On 6/4/2013 11:48 PM, Aaron Greenspan wrote:
 I thought I'd document my process of getting set up with Solr 4.3.0 on a 
 Linux server in case it's of use to anyone. I'm a moderately experienced 
 Linux system administrator, so without passing judgment (at least for now), 
 let me just say that I found getting Solr to work to be extremely 
 difficult--more difficult than just about any other package I've ever dealt 
 with, including ones I've built from source.

Thank you for your feedback.  Solr has always had a high learning curve,
and you've pointed out a lot of places we can improve things.

We have a number of Jira issues that specifically deal with something
called Developer Curb Appeal.  I think it's pretty clear that we need
to tackle a bunch of things we could call Newcomer Curb Appeal.  I can
work on filing some issues, some of which will address code, some of
which will address the docs included with Solr and the wiki pages
referenced there.

I realize that the software is at version 4.3, but the UI isn't - it is
brand new.  The old UI in 3.x and earlier versions was a place to go for
information, but you couldn't actually DO anything that would make
changes.  Historically, this is the reason for the admin UI - making
test queries, watching statistics, and gathering information.  The
ability to make changes is very recent.

The UI you've seen first appeared in 4.0.0, released last October.  It
was a complete rewrite in an entirely new language.  The old one was
JSP, the new one is javascript.

On requiring a username/password: Solr doesn't include any security
mechanisms.  We leave that to other software written by people who do
security really well.  It can be handled by the servlet container, or a
proxy.
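
(As an illustration of the servlet-container approach, a web.xml fragment that puts
HTTP Basic auth in front of Solr could look roughly like this sketch; the role and
realm names are made up, and the details vary by container.)

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-user</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>SolrRealm</realm-name>
</login-config>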

Solr should not be directly reachable by users.  The intended usage is
to have your website process user-entered text to turn it into a query
and make sure it's clean before sending it to your Solr server(s), which
should be only reachable from behind the firewall.

Even if you use the servlet container's security features to really lock
things down - blocking access to the admin UI, the update handler, and
anything else that might get you into trouble - if someone can get
directly to the query interface, it's relatively easy to send denial-of-service
queries.  Most attempts to detect and block DoS would also block
legitimate queries that just happen to be slow.

Solr is a java servlet.  Servlet containers have historically used XML
config files, so as a natural consequence, Solr uses XML config files.
XML does allow for very precise and multi-layered configurations, but it
can be very confusing.

Version 4.4 will take the first tentative steps towards moving away from
XML.  The central config is still XML, but the individual cores won't be:

http://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29
http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond

There is only one specific problem that I will attempt to address in
this reply.  At this point, any advice I might give is probably too
little too late.  If I'm wrong and you do want some additional specific
help, let me know.

When you duplicate collection1 to make a new core, it is enough to
simply duplicate the main directory and the conf subdirectory.  I am
aware as I write this that there is probably no documentation that
states this clearly.

Thanks,
Shawn



Re: Two instances of solr - the same datadir?

2013-06-05 Thread Peter Sturge
Hi,
We use this very same scenario to great effect - 2 instances using the same
dataDir with many cores - 1 is a writer (no caching), the other is a
searcher (lots of caching).
To get the searcher to see the index changes from the writer, you need the
searcher to do an empty commit - i.e. you invoke a commit with 0 documents.
This will refresh the caches (including autowarming), [re]build the
relevant searchers etc. and make any index changes visible to the RO
instance.
Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
ensure the two instances don't try to commit at the same time.
There are several ways to trigger a commit:
1. Call commit() periodically within your own code.
2. Use autoCommit in solrconfig.xml.
3. Use an RPC/IPC mechanism between the 2 instance processes to tell the
   searcher the index has changed, then call commit when notified (more complex
   coding, but good if the index changes on an ad-hoc basis).
(A sketch of the lockType and autoCommit settings follows below.)
Note, doing things this way isn't really suitable for an NRT environment.
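
(A sketch of the lockType and autoCommit settings mentioned above; the interval
value is only an example, not a recommendation.)

<indexConfig>
  <lockType>native</lockType>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- example: commit at most once a minute and open a new searcher -->
    <maxTime>60000</maxTime>
    <openSearcher>true</openSearcher>
  </autoCommit>
</updateHandler>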

HTH,
Peter



On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote:

 Replication is fine, I am going to use it, but I wanted it for instances
 *distributed* across several (physical) machines - but here I have one
 physical machine, it has many cores. I want to run 2 instances of solr
 because I think it has these benefits:

 1) I can give less RAM to the writer (4GB), and use more RAM for the
 searcher (28GB)
 2) I can deactivate warming for the writer and keep it for the searcher
 (this considerably speeds up indexing - each time we commit, the server is
 rebuilding a citation network of 80M edges)
 3) saving disk space and better OS caching (OS should be able to use more
 RAM for the caching, which should result in faster operations - the two
 processes are accessing the same index)

 Maybe I should just forget it and go with the replication, but it doesn't
 'feel right' IFF it is on the same physical machine. And Lucene
 specifically has a method for discovering changes and re-opening the index
 (DirectoryReader.openIfChanged)

 Am I not seeing something?

 roman



 On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman 
 jhell...@innoventsolutions.com wrote:

  Roman,
 
  Could you be more specific as to why replication doesn't meet your
  requirements?  It was geared explicitly for this purpose, including the
  automatic discovery of changes to the data on the index master.
 
  Jason
 
  On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote:
 
   OK, so I have verified the two instances can run alongside, sharing the
   same datadir
  
   All update handlers are inaccessible in the read-only master:
  
   <updateHandler class="solr.DirectUpdateHandler2"
                  enable="${solr.can.write:true}">
  
   java -Dsolr.can.write=false .
  
   And I can reload the index manually:
  
   curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
   
  
   But this is not an ideal solution; I'd like for the read-only server to
   discover index changes on its own. Any pointers?
  
   Thanks,
  
roman
  
  
   On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla roman.ch...@gmail.com
  wrote:
  
   Hello,
  
   I need your expert advice. I am thinking about running two instances
 of
   solr that share the same datadirectory. The *reason* being: indexing
   instance is constantly building cache after every commit (we have a
 big
   cache) and this slows it down. But indexing doesn't need much RAM,
 only
  the
   search does (and server has lots of CPUs)
  
   So, it is like having two solr instances
  
   1. solr-indexing-master
   2. solr-read-only-master
  
   In the solrconfig.xml I can disable update components, It should be
  fine.
   However, I don't know how to 'trigger' index re-opening on (2) after
 the
   commit happens on (1).
  
   Ideally, the second instance could monitor the disk and re-open the index
  after
   new files appear there. Do I have to implement custom
  IndexReaderFactory?
   Or something else?
  
   Please note: I know about the replication, this usecase is IMHO
 slightly
   different - in fact, write-only-master (1) is also a replication
 master
  
   Googling turned out only this
   http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 -
  no
   pointers there.
  
   But If I am approaching the problem wrongly, please don't hesitate to
   're-educate' me :)
  
   Thanks!
  
roman
  
 
 



Indexing Heavy dataset

2013-06-05 Thread Raheel Hasan
Hi,

I am trying to index a heavy dataset with 1 particular field really too
heavy...

However, as I start, I get a memory warning and a rollback (OutOfMemoryError).
So I have learned that we can use the -Xmx1024m option with the java command to
start Solr and allocate more memory to the heap.

My question is: since this could also become insufficient later, is the issue
related to caching?

here is my cache block in solrconfig:

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>

<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>

<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>

I am thinking maybe I need to turn off the documentCache.
Anyone got a better idea? Or perhaps there is another issue here?

Just to let you know, until I added that very heavy db field for indexing,
everything was just fine...


-- 
Regards,
Raheel Hasan


Heap space problem with mlt query

2013-06-05 Thread Varsha Rani
Hi ,

I have a Solr index of 80GB with 1 million documents, each document
approximately 500KB. I have a machine with 16GB RAM.

I am running MLT queries on 3-5 fields of these documents.

I am getting a Solr out-of-memory problem.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

My Solr config is :

  <ramBufferSizeMB>128</ramBufferSizeMB>
  <maxMergeDocs>100</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>


I also checked with ramBuffer size of 256MB.

Please provide me suggestion regarding this.

Thanks
Varsha





Re: Heap space problem with mlt query

2013-06-05 Thread Raheel Hasan
and I just asked a similar question just 1 sec ago


On Wed, Jun 5, 2013 at 2:07 PM, Varsha Rani varsha.ya...@orkash.com wrote:

 Hi ,

 I am having solr index of 80GB  with 1 million documents .Each document of
 aprx. 500KB . I have a machine with 16GB ram.

 I am running mlt query on 3-5 fields of theses document .

 I am getting solr out of memory problem .

 Exception in thread main java.lang.OutOfMemoryError: Java heap space

 My Solr config is :

   <ramBufferSizeMB>128</ramBufferSizeMB>
 <maxMergeDocs>100</maxMergeDocs>
 <maxFieldLength>1</maxFieldLength>
 <writeLockTimeout>1000</writeLockTimeout>
 <commitLockTimeout>1</commitLockTimeout>


 I also checked with ramBuffer size of 256MB.

 Please provide me suggestion regarding this.

 Thanks
 Varsha







-- 
Regards,
Raheel Hasan


Re: Heap space problem with mlt query

2013-06-05 Thread Yago Riveiro
Varsha, 

Unless I'm mistaken, the ramBufferSizeMB param is used to buffer
documents before writing them to disk.

Can you post the cache config that you have in solrconfig.xml? And what version
are you using?

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Wednesday, June 5, 2013 at 10:09 AM, Raheel Hasan wrote:

 and I just asked a similar question just 1 sec ago
 
 
 On Wed, Jun 5, 2013 at 2:07 PM, Varsha Rani varsha.ya...@orkash.com 
 (mailto:varsha.ya...@orkash.com) wrote:
 
  Hi ,
  
  I am having solr index of 80GB with 1 million documents .Each document of
  aprx. 500KB . I have a machine with 16GB ram.
  
  I am running mlt query on 3-5 fields of theses document .
  
  I am getting solr out of memory problem .
  
  Exception in thread main java.lang.OutOfMemoryError: Java heap space
  
  My Solr config is :
  
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <maxMergeDocs>100</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  
  
  I also checked with ramBuffer size of 256MB.
  
  Please provide me suggestion regarding this.
  
  Thanks
  Varsha
  
  
  
  
 
 
 
 
 -- 
 Regards,
 Raheel Hasan
 
 




Re: Heap space problem with mlt query

2013-06-05 Thread Varsha Rani
Hi yriveiro,

I am using Solr version 3.6.

My cache config is below :
  <filterCache class="solr.FastLRUCache"
               size="131072"
               initialSize="4096"
               autowarmCount="2048"
               cleanupThread="true"/>

  <queryResultCache class="solr.FastLRUCache"
                    size="131072"
                    initialSize="4096"
                    autowarmCount="2048"
                    cleanupThread="true"/>

  <documentCache class="solr.FastLRUCache"
                 size="131072"
                 initialSize="4096"
                 autowarmCount="2048"
                 cleanupThread="true"/>





different Solr Logging for CONSOLE and FILE

2013-06-05 Thread Raheel Hasan
Hi,

I have a small question about solr logging.

In resources/log4j.properties, we have

*log4j.rootLogger=INFO, file, CONSOLE*

However, what I want is:
*log4j.rootLogger=INFO, file*
and
*log4j.rootLogger=WARN, CONSOLE*
(both simultaneously).

Is it possible?

-- 
Regards,
Raheel Hasan


Re: Heap space problem with mlt query

2013-06-05 Thread Yago Riveiro
Varsha,  

How big is your JVM heap?

The other question is the documentCache. The documentCache caches document
objects fetched from disk
(http://wiki.apache.org/solr/SolrCaching#documentCache). If each document is
aprx. 500KB and you configure a cache size of 131072, you are caching 131072 *
(document object size), which can be a lot of RAM. Try decreasing the
documentCache size.

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Wednesday, June 5, 2013 at 10:28 AM, Varsha Rani wrote:

 Hi yriveiro,
  
 I am using Solr version3.6.
  
 My cache config is below :
 <filterCache class="solr.FastLRUCache"
              size="131072"
              initialSize="4096"
              autowarmCount="2048"
              cleanupThread="true"/>
 
 <queryResultCache class="solr.FastLRUCache"
                   size="131072"
                   initialSize="4096"
                   autowarmCount="2048"
                   cleanupThread="true"/>
 
 <documentCache class="solr.FastLRUCache"
                size="131072"
                initialSize="4096"
                autowarmCount="2048"
                cleanupThread="true"/>
  
  
  
  
  




Re: different Solr Logging for CONSOLE and FILE

2013-06-05 Thread Bernd Fehling


Am 05.06.2013 11:28, schrieb Raheel Hasan:
 Hi,
 
 I have a small question about solr logging.
 
 In resources/log4j.properties, we have
 
 *log4j.rootLogger=INFO, file, CONSOLE*
 
 However, what I want is:
 *log4j.rootLogger=INFO, file
 *
 and
 *log4j.rootLogger=WARN, CONSOLE*
 (both simultaneously).
 
 Is it possible?
 

You can use:

log4j.rootLogger=INFO, file, CONSOLE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=WARN



Re: different Solr Logging for CONSOLE and FILE

2013-06-05 Thread Raheel Hasan
OK thanks... it works... :D

Also I found that we could put both of them and it will also work:
log4j.rootLogger=INFO, file
log4j.rootLogger=WARN, CONSOLE




On Wed, Jun 5, 2013 at 2:42 PM, Bernd Fehling 
bernd.fehl...@uni-bielefeld.de wrote:



 Am 05.06.2013 11:28, schrieb Raheel Hasan:
  Hi,
 
  I have a small question about solr logging.
 
  In resources/log4j.properties, we have
 
  *log4j.rootLogger=INFO, file, CONSOLE*
 
  However, what I want is:
  *log4j.rootLogger=INFO, file
  *
  and
  *log4j.rootLogger=WARN, CONSOLE*
  (both simultaneously).
 
  Is it possible?
 

 You can use:

 log4j.rootLogger=INFO, file, CONSOLE

 log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
 log4j.appender.CONSOLE.Threshold=WARN




-- 
Regards,
Raheel Hasan


Files included from the default SolrConfig

2013-06-05 Thread Raheel Hasan
Hi,

I am trying to optimize Solr.

The default solrconfig.xml that comes with solr/collection1 has a lot of libs
included that I don't really need. Perhaps someone could help me identify
their purpose. (I only import from DIH.)

Please tell me what's in these:
contrib/extraction/lib
solr-cell-

contrib/clustering/lib
solr-clustering-

contrib/langid/lib/
solr-langid


-- 
Regards,
Raheel Hasan


Sole instance state is down in cloud mode

2013-06-05 Thread sathish_ix
Hi,

When I start a core in SolrCloud, I'm getting the below message in the log.

I have set up ZooKeeper separately and uploaded the config files.
When I start the Solr instance in cloud mode, the state is down.


INFO: Update state numShards=null message={
  "operation":"state",
  "numShards":null,
  "shard":"shard1",
  "roles":null,
  *"state":"down",*
  "core":"core1",
  "collection":"core1",
  "node_name":"x:9980_solr",
  "base_url":"http://x:9980/solr"}
Jun 5, 2013 6:10:48 AM org.apache.solr.common.cloud.ZkStateReader$2 process
INFO: A cluster state change: WatchedEvent state:SyncConnected
type:NodeDataChanged path:/clusterstate.json, has occurred - updating...
(live nodes size: 1)


When I hit the URL, I get the left pane of the Solr admin and the right side
keeps on loading. Any help?

Thanks,
Sathish





data-import problem

2013-06-05 Thread Stavros Delisavas

Hello Solr-Friends,

I have a problem with my current Solr configuration. I want to import
two tables into Solr. I got it to work for the first table, but the
second table doesn't get imported (no error message, 0 rows skipped).
I have two tables called name and title, and I want to load their fields
called id, name and id, title (two id columns that have nothing to do with
each other).


This is in my data-config.xml:

<document>
    <entity name="name" query="SELECT id, name FROM name"></entity>
</document>
<document>
    <entity name="title" query="SELECT id AS titleid, title FROM name"></entity>
</document>

and this is in my schema.xml:

<field name="id" type="string" indexed="true" stored="true" />
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="titleid" type="string" indexed="true" stored="true" />
<field name="title" type="text_general" indexed="true" stored="true" />

<dynamicField name="*" type="ignored" multiValued="true" />

</fields>

<uniqueKey>id</uniqueKey>

</schema>


I chose that unique key only because Solr asked for it.
In my Solr Admin Schema Browser I can see three fields: id, name and
title, but titleid is missing and title itself is empty with no entries.
I don't know how to get it to work to index two separate lists.


I hope someone can help, thank you!


Re: Query Elevation Component

2013-06-05 Thread jmlucjav
davers wrote
 I want to elevate certain documents differently depending on a certain fq
 parameter in the request. I've read of somebody coding solr to do this but
 no code was shared. Where would I start looking to implement this feature
 myself?

Davers, 

I am also looking into this feature. Care to tell where you saw this
discussed? I could not find anything. Also, did you manage to implement this
somehow?

thanks






Re: Heap space problem with mlt query

2013-06-05 Thread Varsha Rani
Hi yriveiro, 


When I was using a documentCache size of 131072, I got the exception after 5000-6000
mlt queries.

But once I set the documentCache size to 16384, I got the same problem after 1500-2000
mlt queries.







Re: Files included from the default SolrConfig

2013-06-05 Thread Jack Krupansky
1. SolrCell (ExtractingRequestHandler) - extract and index content from rich 
documents, such as PDF, Office docs, HTML (uses Tika)

2. Clustering - for result clustering.
3. Language identification (two update processors) - analyzes text of fields 
to determine language code.


None of those is mandatory, which is why they have separate libs.
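
(If they are not needed, the corresponding <lib> directives in the stock
solrconfig.xml can simply be removed or commented out. They look roughly like the
sketch below; the relative paths depend on where the Solr home sits.)

<lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />

<lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-clustering-\d.*\.jar" />

<lib dir="../../../contrib/langid/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-langid-\d.*\.jar" />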

-- Jack Krupansky

-Original Message- 
From: Raheel Hasan

Sent: Wednesday, June 05, 2013 5:57 AM
To: solr-user@lucene.apache.org
Subject: Files included from the default SolrConfig

Hi,

I am trying to optimize solr.

The default solrConfig that comes with solrcollection1 has a lot of libs
included I dont really need. Perhaps if someone could help we identifying
the purpose. (I only import from DIH):

Please tell me whats in these:
contrib/extraction/lib
solr-cell-

contrib/clustering/lib
solr-clustering-

contrib/langid/lib/
solr-langid


--
Regards,
Raheel Hasan 



Receiving unexpected Faceting results.

2013-06-05 Thread Dotan Cohen
Consider the following Solr query:
select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0

The 'tags' field is a multivalue field. I would expect the previous
query to return only tags that begin with the string 'dotan-' such as:
dotan-home
dotan-work
...but not strings which do not begin with (or even contain) the
string in question.

However, I am getting these results:
<lst name="discoapi_tags">
  <int name="dotan-home">14</int>
  <int name="dotan-work">13</int>
  <int name="beer">0</int>
  <int name="beatles">0</int>
</lst>

It _may_ be that the 'beer' and 'beatles' tags were once attached to
the same documents as are attached the 'dotan-home' and/or
'dotan-work'. I've done a bit of experimenting on this Solr install,
so I cannot be sure. However, considering that they are in fact 0
results for those two, I would not expect them to show up at all, even
if they ever were attached to (i.e. once a value in the multiValue
field) any of the results that match the filter query.

So, the questions are:
1) How can I check whether the multiValued field for a particular
document (given its uniqueKey id) ever contained a specific value?
Alternatively, how can I see all the values that the document ever had
for the field? I don't expect this to actually be possible, but I ask
in case it is, i.e. by examining certain aspects of the Solr index with a
text editor.

2) If those spurious results are appearing does that mean necessarily
that those values for the multivalued field were in fact once in the
multivalued field for documents matching the filter query? Thus, the
answer to the previous question would be to simply run a query for the
id of the document in question, and facet on the multivalued field
with a large limit.

3) How to have Solr return only those faceting values for the field
that in fact begin with 'dotan-', even if a document has other tags
such as 'beatles'?

4) How to have Solr return only those faceting values which are larger than 0?

Thank you!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Receiving unexpected Faceting results.

2013-06-05 Thread Raymond Wiker
3) Use the parameter facet.prefix, e.g., facet.prefix=dotan-. Note: this
particular case will not work if the field you're faceting on is tokenised
(with - being used as a token separator).

4) Use the parameter facet.mincount - looks like you want to set it to 1,
instead of the default which is 0.


Re: Receiving unexpected Faceting results.

2013-06-05 Thread Brendan Grainger
Hi Dotan,

I think all you need to do is add:

facet.mincount=1

i.e.

select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&facet.mincount=1

Note that you can do it per field as well:

select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&f.tags.facet.mincount=1

http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount




On Wed, Jun 5, 2013 at 8:27 AM, Dotan Cohen dotanco...@gmail.com wrote:

 Consider the following Solr query:
 select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0

 The 'tags' field is a multivalue field. I would expect the previous
 query to return only tags that begin with the string 'dotan-' such as:
 dotan-home
 dotan-work
 ...but not strings which do not begin with (or even contain) the
 string in question.

 However, I am getting these results:
 <lst name="discoapi_tags">
   <int name="dotan-home">14</int>
   <int name="dotan-work">13</int>
   <int name="beer">0</int>
   <int name="beatles">0</int>
 </lst>

 It _may_ be that the 'beer' and 'beatles' tags were once attached to
 the same documents as are attached the 'dotan-home' and/or
 'dotan-work'. I've done a bit of experimenting on this Solr install,
 so I cannot be sure. However, considering that they are in fact 0
 results for those two, I would not expect them to show up at all, even
 if they ever were attached to (i.e. once a value in the multiValue
 field) any of the results that match the filter query.

 So, the questions are:
 1) How can I check if ever the multiValue fields for a particular
 document (given its uniqueKey id) ever contains a specific value.
 Alternatively, how can I see all the values that the document ever had
 for the field. I don't expect this to actually be possible, but I ask
 if it is, i.e. by examining certain aspects of the Solr index with a
 text editor.

 2) If those spurious results are appearing does that mean necessarily
 that those values for the multivalued field were in fact once in the
 multivalued field for documents matching the filter query? Thus, the
 answer to the previous question would be to simply run a query for the
 id of the document in question, and facet on the multivalued field
 with a large limit.

 3) How to have Solr return only those faceting values for the field
 that in fact begin with 'dotan-', even if a document has other tags
 such as 'beatles'?

 4) How to have Solr return only those faceting values which are larger
 than 0?

 Thank you!

 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com




-- 
Brendan Grainger
www.kuripai.com


Search for misspelled words in corpus

2013-06-05 Thread కామేశ్వర రావు భైరవభట్ల
Hi,

I have a problem where our text corpus, on which we need to search,
contains many misspelled words. The same word could also be misspelled in
several different ways. It could also have documents that have correct
spellings. However, the search term that we give in the query will always be
the correct spelling. Now when we search on a term, we would like to get all
the documents that contain both correct and misspelled forms of the search
term.
We tried fuzzy search, but it doesn't work as per our expectations. It
returns any close match, not specifically misspelled words. For example, if
I'm searching for a word like "fight", I would like to return the documents
that have words like "figth" and "feight", not documents with words like
"sight" and "light".
Is there any suggested approach for doing this?

regards,
Kamesh


Re: Receiving unexpected Faceting results.

2013-06-05 Thread Dotan Cohen
On Wed, Jun 5, 2013 at 3:38 PM, Raymond Wiker rwi...@gmail.com wrote:
 3) Use the parameter facet.prefix, e.g, facet.prefix=dotan-. Note: this
 particular case will not work if the field you're facetting on is tokenised
 (with - being used as a taken separator).

 4) Use the parameter facet.mincount - looks like you want to set it to 1,
 instead of the default which is 0.

Perfect, thank you Raymond!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Heap space problem with mlt query

2013-06-05 Thread adityab
Did you try reducing the filter and query caches? They are fairly large too, unless
you really need them to be cached for your use case.
Do you have that many distinct filter queries hitting Solr for the size you
have defined for filterCache?
Are you doing any sorting? That will chew up a lot of memory because of
Lucene's internal field cache.






Re: Receiving unexpected Faceting results.

2013-06-05 Thread Dotan Cohen
On Wed, Jun 5, 2013 at 3:41 PM, Brendan Grainger
brendan.grain...@gmail.com wrote:
 Hi Dotan,

 I think all you need to do is add:

 facet.mincount=1

 i.e.

  select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&facet.mincount=1

 Note that you can do it per field as well:

  select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&f.tags.facet.mincount=1

 http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount


Thanks, Brendan. I will review the available Facet Parameters, which I
really should have thought to do before posting as it is already
bookmarked!


Re: different Solr Logging for CONSOLE and FILE

2013-06-05 Thread Shawn Heisey
On 6/5/2013 3:46 AM, Raheel Hasan wrote:
 OK thanks... it works... :D
 
 Also I found that we could put both of them and it will also work:
 log4j.rootLogger=INFO, file
 log4j.rootLogger=WARN, CONSOLE

If this completely separates INFO from WARN and ERROR, then you would
want to rethink and probably use what Bernd suggested.  I don't know if
this is what happens.

It's easier to understand a logfile if you can see errors, warnings, and
informational messages together in context.  If the more severe messages
are only logged to CONSOLE, then you lose them.  Even if you then
redirect the console to a file outside of Solr, you would need to try
and piece the full log together based on timestamps from two files, and
sometimes things happen too fast for that, even if you're logging with
millisecond accuracy.

Thanks,
Shawn



Re: Indexing Heavy dataset

2013-06-05 Thread Shawn Heisey
On 6/5/2013 3:08 AM, Raheel Hasan wrote:
 Hi,
 
 I am trying to index a heavy dataset with 1 particular field really too
 heavy...
 
 However, As I start, I get Memory warning and rollback (OutOfMemoryError).
 So, I have learned that we can use -Xmx1024m option with java command to
 start the solr and allocate more memory to the heap.
 
 My question is, that since this could also become insufficient later, so it
 the issue related to cacheing?
 
 here is my cache block in solrconfig:
 
  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>
  
  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="0"/>
  
  <documentCache class="solr.LRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>
 
 I am thinking like maybe I need to turn of the cache for documentClass.
 Anyone got a better idea? Or perhaps there is another issue here?

Exactly how big is this field?  Do you need this giant field returned
with your results, or is it just there for searching?

Caches of size 512, especially with autowarm disabled, are probably not
a major cause for concern, unless the big field is big enough so that
512 of them is really really huge.  If that's the case, I would reduce
the size of your documentCache, not turn it off.

The value of ramBufferSizeMB elsewhere in your config is more likely to
affect how much RAM gets used during indexing.  The default for this
field as of Solr 4.1.0 is 100.  Most people can reduce this value.
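
(As a sketch only - the numbers below are illustrative, not recommendations for
this particular index:)

<ramBufferSizeMB>32</ramBufferSizeMB>

<documentCache class="solr.LRUCache"
               size="128"
               initialSize="128"
               autowarmCount="0"/>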

I'm writing a reply to another thread where you are participating, with
info that will likely be useful for you too.  Look for that.

Thanks,
Shawn



Re: Heap space problem with mlt query

2013-06-05 Thread Shawn Heisey
On 6/5/2013 3:07 AM, Varsha Rani wrote:
 Hi ,
 
 I am having solr index of 80GB  with 1 million documents .Each document of
 aprx. 500KB . I have a machine with 16GB ram.
 
 I am running mlt query on 3-5 fields of theses document .
 
 I am getting solr out of memory problem .

This wiki page has relevant info for your situation.  As you are reading
it, it might not seem relevant, but I'll try to point things out.

http://wiki.apache.org/solr/SolrPerformanceProblems

The memory that is getting exhausted here is heap memory.  You probably
need a larger java heap.  The settings that your other replies have
talked about do affect how much heap gets used, but they do not increase
it.  That is a java commandline option that must be applied to the
command that starts the servlet container which runs Solr.

For 500KB documents, you probably want a ramBufferSizeMB of 64-128.  You
probably want to greatly reduce the size of your documentCache, and
possibly the other caches as well.  Your autowarm counts are very high -
you'll want to reduce those so that your cache warming time is low when
you commit and open a new searcher.
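
(For illustration, a trimmed-down cache block along those lines might look like the
sketch below; the sizes and autowarm counts are example values only.)

<filterCache class="solr.FastLRUCache"
             size="4096"
             initialSize="512"
             autowarmCount="64"
             cleanupThread="true"/>

<queryResultCache class="solr.FastLRUCache"
                  size="1024"
                  initialSize="256"
                  autowarmCount="32"
                  cleanupThread="true"/>

<documentCache class="solr.FastLRUCache"
               size="1024"
               initialSize="256"
               autowarmCount="0"
               cleanupThread="true"/>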

With an index size of 80GB, you'll probably need a heap size of 8GB.
Depending on how you use Solr, you might need more.  If you read the
wiki page carefully, you'll also realize that in addition to this heap
memory, you need additional memory to cache your index - between 40 and
80GB of additional memory.  The absolute minimum server size you want
here is 48GB, and 128GB would be *much* better.  Reducing your index
size might be a critical step.  Do you need to store all fields?  Most
people don't need all the fields in order to display the top N search
results.  When showing a detail page to the user, most people can get
the bulk of their data from another data store by using an ID value
retrieved from Solr.

The performance problems that come from your disk cache being too small
can carry over into OutOfMemory exceptions that you wouldn't otherwise
get, because it makes indexing and queries take too long.  When they
take too long, you can end up doing too many of them at the same time,
chewing up additional memory.

Thanks,
Shawn



Re: Setting up Solr

2013-06-05 Thread Alexandre Rafalovitch
On Wed, Jun 5, 2013 at 1:48 AM, Aaron Greenspan
aar...@thinkcomputer.com wrote:
 I say this not because I enjoy starting flame wars or because I have the time 
 to participate in them--I don't. I realize that there's a long history to 
 Solr and I am the new kid who doesn't get it. Nonetheless, that doesn't 
 change the way it works, and many users will be just like me. So just know 
 that I'd just like to see Solr improve--frankly, I need it to--and if these 
 issues were not already glaringly obvious, they should be now.

This!

Seriously, I think this feedback is valuable and I have recently gone
through a similar experience. This is why I have written a book
specifically targeting people who basically got their first (example)
collection running and are now stuck on how to get the second (first
'real one') do what they want. The book is available for pre-orders
at: http://www.packtpub.com/apache-solr-for-indexing-data/book (out in
a couple more days) and a bunch of sample configurations that go with
it are at: https://github.com/arafalov/solr-indexing-book

On specific points, I do agree that we need to make the Admin WebUI
have the first/only core pre-selected. If nobody has created a JIRA
for this yet, I will. And, I think, perhaps we need absolutely minimal
solr configuration shipping in Solr distribution. With a single '*'
field and so on.
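
(Something like the following schema.xml fragment is roughly what such a minimal
configuration could amount to - a sketch of the idea only, with the field type as an
assumption.)

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <dynamicField name="*" type="text_general" indexed="true" stored="true" multiValued="true"/>
</fields>
<uniqueKey>id</uniqueKey>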

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


Re: Indexing Heavy dataset

2013-06-05 Thread Raheel Hasan
OK, thanks for the reply. The field has values of around 60KB each.

Furthermore, I have realized that the issue is with MySQL, as it's not
processing this table when a WHERE clause is applied.

Secondly, I have turned this field to *stored=false* and now the select
is fast again.



On Wed, Jun 5, 2013 at 6:56 PM, Shawn Heisey s...@elyograg.org wrote:

 On 6/5/2013 3:08 AM, Raheel Hasan wrote:
  Hi,
 
  I am trying to index a heavy dataset with 1 particular field really too
  heavy...
 
  However, As I start, I get Memory warning and rollback
 (OutOfMemoryError).
  So, I have learned that we can use -Xmx1024m option with java command to
  start the solr and allocate more memory to the heap.
 
  My question is, that since this could also become insufficient later, so
 it
  the issue related to cacheing?
 
  here is my cache block in solrconfig:
 
   <filterCache class="solr.FastLRUCache"
                size="512"
                initialSize="512"
                autowarmCount="0"/>
  
   <queryResultCache class="solr.LRUCache"
                     size="512"
                     initialSize="512"
                     autowarmCount="0"/>
  
   <documentCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>
 
  I am thinking like maybe I need to turn of the cache for documentClass.
  Anyone got a better idea? Or perhaps there is another issue here?

 Exactly how big is this field?  Do you need this giant field returned
 with your results, or is it just there for searching?

 Caches of size 512, especially with autowarm disabled, are probably not
 a major cause for concern, unless the big field is big enough so that
 512 of them is really really huge.  If that's the case, I would reduce
 the size of your documentCache, not turn it off.

 The value of ramBufferSizeMB elsewhere in your config is more likely to
 affect how much RAM gets used during indexing.  The default for this
 field as of Solr 4.1.0 is 100.  Most people can reduce this value.

 I'm writing a reply to another thread where you are participating, with
 info that will likely be useful for you too.  Look for that.

 Thanks,
 Shawn




-- 
Regards,
Raheel Hasan


Re: Indexing Heavy dataset

2013-06-05 Thread Raheel Hasan
Some values in the field are up to 1MB as well.


On Wed, Jun 5, 2013 at 7:27 PM, Raheel Hasan raheelhasan@gmail.comwrote:

 ok thanks for the reply The field having values like 60kb each

 Furthermore, I have realized that the issue is with MySQL as its not
 processing this table when a where is applied

 Secondly, I have turned this field to *stored=false* and now the *
 select/* is fast working again



 On Wed, Jun 5, 2013 at 6:56 PM, Shawn Heisey s...@elyograg.org wrote:

 On 6/5/2013 3:08 AM, Raheel Hasan wrote:
  Hi,
 
  I am trying to index a heavy dataset with 1 particular field really too
  heavy...
 
  However, As I start, I get Memory warning and rollback
 (OutOfMemoryError).
  So, I have learned that we can use -Xmx1024m option with java command to
  start the solr and allocate more memory to the heap.
 
  My question is, that since this could also become insufficient later,
 so it
  the issue related to cacheing?
 
  here is my cache block in solrconfig:
 
   <filterCache class="solr.FastLRUCache"
                size="512"
                initialSize="512"
                autowarmCount="0"/>
  
   <queryResultCache class="solr.LRUCache"
                     size="512"
                     initialSize="512"
                     autowarmCount="0"/>
  
   <documentCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>
 
  I am thinking like maybe I need to turn of the cache for
 documentClass.
  Anyone got a better idea? Or perhaps there is another issue here?

 Exactly how big is this field?  Do you need this giant field returned
 with your results, or is it just there for searching?

 Caches of size 512, especially with autowarm disabled, are probably not
 a major cause for concern, unless the big field is big enough so that
 512 of them is really really huge.  If that's the case, I would reduce
 the size of your documentCache, not turn it off.

 The value of ramBufferSizeMB elsewhere in your config is more likely to
 affect how much RAM gets used during indexing.  The default for this
 field as of Solr 4.1.0 is 100.  Most people can reduce this value.

 I'm writing a reply to another thread where you are participating, with
 info that will likely be useful for you too.  Look for that.

 Thanks,
 Shawn




 --
 Regards,
 Raheel Hasan




-- 
Regards,
Raheel Hasan


Re: Setting up Solr

2013-06-05 Thread Yago Riveiro
If we look at the UIs of other cloud-based software like Couchbase or Riak, they are
more intuitive than Solr's UI. Of course the UI is brand new and needs a lot of
improvements. For example, the possibility of selecting an existing config from
ZooKeeper when you are using the wizard to create a collection. Even more, a
section to upload a config from the UI without using the cryptic zkClient script.

Regards,

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Wednesday, June 5, 2013 at 3:21 PM, Alexandre Rafalovitch wrote:

 On Wed, Jun 5, 2013 at 1:48 AM, Aaron Greenspan
 aar...@thinkcomputer.com (mailto:aar...@thinkcomputer.com) wrote:
  I say this not because I enjoy starting flame wars or because I have the 
  time to participate in them--I don't. I realize that there's a long history 
  to Solr and I am the new kid who doesn't get it. Nonetheless, that doesn't 
  change the way it works, and many users will be just like me. So just know 
  that I'd just like to see Solr improve--frankly, I need it to--and if these 
  issues were not already glaringly obvious, they should be now.
 
 
 This!
 
 Seriously, I think this feedback is valuable and I have recently gone
 through a similar experience. This is why I have written a book
 specifically targeting people who basically got their first (example)
 collection running and are now stuck on how to get the second (first
 'real one') do what they want. The book is available for pre-orders
 at: http://www.packtpub.com/apache-solr-for-indexing-data/book (out in
 a couple more days) and a bunch of sample configurations that go with
 it are at: https://github.com/arafalov/solr-indexing-book
 
 On specific points, I do agree that we need to make Admin WebUI to
 have the first/only core pre-selected. If nobody has created a JIRA
 for this yet, I will. And, I think, perhaps we need absolutely minimal
 solr configuration shipping in Solr distribution. With a single '*'
 field and so on.
 
 Regards,
 Alex.
 
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
 book)
 
 




copyField generates multiple values encountered for non multiValued field

2013-06-05 Thread Robert Krüger
I have the exact same problem as the guy here:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3C3A2B3E42FCAA4BF496AE625426C5C6E4@Wurstsemmel%3E

AFAICS he did not get an answer. Is this a known issue? What can I do
other than doing what copyField should do in my application?

I am using solr 4.0.0.

Thanks,

Robert


data-import problem

2013-06-05 Thread Stavros Delisavas

Hello Solr-Friends,

I have a problem with my current Solr configuration. I want to import
two tables into Solr. I got it to work for the first table, but the
second table doesn't get imported (no error message, 0 rows skipped).
I have two tables called name and title, and I want to load their fields
called id, name and id, title (two id columns that have nothing to do with
each other).


This is in my data-config.xml:

<document>
    <entity name="name" query="SELECT id, name FROM name"></entity>
</document>
<document>
    <entity name="title" query="SELECT id AS titleid, title FROM name"></entity>
</document>

and this is in my schema.xml:

<field name="id" type="string" indexed="true" stored="true" />
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="titleid" type="string" indexed="true" stored="true" />
<field name="title" type="text_general" indexed="true" stored="true" />

<dynamicField name="*" type="ignored" multiValued="true" />

</fields>

<uniqueKey>id</uniqueKey>

</schema>


I chose that unique key only because Solr asked for it.
In my Solr Admin Schema Browser I can see three fields: id, name and
title, but titleid is missing and title itself is empty with no entries.
I don't know how to get it to work to index two separate lists.


I hope someone can help, thank you!

PS: I am sorry if this mail reached you twice. I sent it the first time
when I was not registered yet and didn't know whether the mail was received.
I am sending it again now after registering for the mailing list.


Re: copyField generates multiple values encountered for non multiValued field

2013-06-05 Thread Alexandre Rafalovitch
I think the suggestion I have seen is that copyField should be
index-only and - therefore - will not be returned. It is primarily
there to make searching easier by aggregating fields or to provide
alternative analyzer pipeline.

Can you make your copyField destination not stored?
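
(In schema.xml terms the suggestion is roughly the sketch below; the field names
here are placeholders, not taken from the original post.)

<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="body"  type="text_general" indexed="true" stored="true"/>
<!-- catch-all destination: indexed for searching but not stored,
     so it is never returned and duplicate values do not matter -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<copyField source="title" dest="text"/>
<copyField source="body"  dest="text"/>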

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Jun 5, 2013 at 10:37 AM, Robert Krüger krue...@lesspain.de wrote:
 I have the exact same problem as the guy here:

 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3C3A2B3E42FCAA4BF496AE625426C5C6E4@Wurstsemmel%3E

 AFAICS he did not get an answer. Is this a known issue? What can I do
 other than doing what copyField should do in my application?

 I am using solr 4.0.0.

 Thanks,

 Robert


Re: Setting up Solr

2013-06-05 Thread Shawn Heisey
 We have a number of Jira issues that specifically deal with something
 called Developer Curb Appeal.  I think it's pretty clear that we need
 to tackle a bunch of things we could call Newcomer Curb Appeal.  I can
 work on filing some issues, some of which will address code, some of
 which will address the docs included with Solr and the wiki pages
 referenced there.

I have filed the master issue.  I will file some linked issues over the
next few days.  All ideas and patches welcome.

https://issues.apache.org/jira/browse/SOLR-4901

The wiki is our primary documentation.  Updates are appreciated.  In
order to edit the wiki, you must create an account and ask on this
mailing list for it to be added to the contributors group.

Thanks,
Shawn



Re: Solr - ORM like layer

2013-06-05 Thread Tuğcem Oral
Sorry for opening a new thread. As I sent the first message without subscribing to
the mailing list, I couldn't find a way to reply to the original thread.
The message stream is attached below.


Actually the requirement came up from the following scenario: we collect some XML
documents from external resources and need to parse those XML docs and
index some parts of them. But those XML docs have different roots and
attributes. So we generate all possible classes for each root type via
JAXB. As each document has different informative values, each of them
should be indexed into a separate Solr instance. The module we wrote simply
generates a Solr schema template with respect to all aggregated objects in the
root object (recursively), except those annotated with @SolrIndexIgnore. We
are also able to generate a SolrDocument from a given object and index it
into a specified Solr instance. While retrieving results from Solr, we generate
a list of these objects from the SolrDocument instances. Hibernate
configuration for Lucene indexing is a bit different, I thought, as we are
able to generate the Solr schema from a given object.

Best.





-Original Message- From: Tuğcem Oral
Sent: Tuesday, June 04, 2013 8:57 AM
To: solr-user@lucene.apache.org
Subject: Solr - ORM like layer

Hi folks,

I wonder whether there exists an ORM-like layer for Solr, such that it
generates the Solr schema from a given complex object type and indexes a given
list of corresponding objects. I wrote a simple module for that need in one
of my projects and am happily ready to generalize it and contribute it to Solr,
if no such module already exists or is in progress.

Thanks all.

--
TO

Solr doesn't support complex objects directly - you must flatten and
otherwise denormalize them. If you do want to store something like a graph
in Solr, make each node a separate document (and try to avoid the
temptation to play games with dynamic and multivalued fields).

But if you have a tool to automatically flatten and denormalize complex
objects and graphs and database joins, great. Please describe what it
actually does in a little more (but not excessive) detail.

-- Jack Krupansky

-Original Message- From: Tuğcem Oral
Sent: Tuesday, June 04, 2013 8:57 AM
To: solr-user@lucene.apache.org
Subject: Solr - ORM like layer

Hi folks,

I wonder that there exist and ORM like layer for solr such that it
generates the solr schema from given complex object type and index given
list of corresponding objects. I wrote a simple module for that need in one
of my projects and happyly ready to generalize it and contribute to solr,
if there's not such a module exists or in progress.

Thanks all.

-- 
TO


If by ORM you mean Object Relational Mapping, Hibernate has annotations for
Lucene, and if my memory doesn't betray me I think you can configure a Solr
server in the Hibernate config.

I have successfully mapped POJO's to Lucene and done text search, it all
happens like magic once your annotations and configuration is right.

Hope that helps,

Guido.

On 04/06/13 13:57, Tuğcem Oral wrote:

 Hi folks,

 I wonder that there exist and ORM like layer for solr such that it
 generates the solr schema from given complex object type and index given
 list of corresponding objects. I wrote a simple module for that need in one
 of my projects and happyly ready to generalize it and contribute to solr,
 if there's not such a module exists or in progress.

 Thanks all.


-- 
TO


Phrase matching with set union as opposed to set intersection on query terms

2013-06-05 Thread Dotan Cohen
How would one write a query which should perform set union on the
search terms (term1 OR term2 OR term3), and yet also perform phrase
matching if both terms are found? I tried a few variants of the
following, but in every case I am getting set intersection on the
search terms:

select?q={!q.op=OR}text:"term1 term2"~10

Thus, if term1 matches 10 documents and term2 matches 20 documents,
then SET UNION would include all of the documents that have either
term1 and/or term2. That means that between 20-30 results should be
returned. Conversely, SET INTERSECTION would return only results with
_both_ term1 _and_ term2, which could be between 0-10 documents.

Note that in the application, users will be searching for any
arbitrary number of terms, in fact they will be entering phrases. I
can limit these phrases to 140 characters if needed.

Thank you in advance!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Phrase matching with set union as opposed to set intersection on query terms

2013-06-05 Thread Shawn Heisey
On 6/5/2013 9:03 AM, Dotan Cohen wrote:
 How would one write a query which should perform set union on the
 search terms (term1 OR term2 OR term3), and yet also perform phrase
 matching if both terms are found? I tried a few variants of the
 following, but in every case I am getting set intersection on the
 search terms:
 
  select?q={!q.op=OR}text:"term1 term2"~10

A phrase search by definition will require all terms to be present.
Even though it is multiple terms, conceptually it is treated as a single
term.

It sounds like what you are after is what edismax can do.  If you define
the pf field in addition to the qf field, Solr will do something pretty
amazing - it will automatically construct a phrase query from a
non-phrase query and search with it against multiple fields.  Done
correctly, this means that an exact match will be listed first in the
results.

http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29
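
For illustration, a hedged edismax request along those lines (the field name
"text" and the boost are placeholders, not values from this thread):

    select?defType=edismax&q=term1 term2&qf=text&pf=text^10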

Thanks,
Shawn



Re: SpatialRecursivePrefixTreeFieldType Spatial Searching

2013-06-05 Thread Chris Atkinson
Everything is working great now.
Thanks David


On Wed, Jun 5, 2013 at 12:07 AM, David Smiley (@MITRE.org) 
dsmi...@mitre.org wrote:

 maxDistErr should be something like 0.3 based on earlier parts of this discussion,
 since your data is precise to within a couple of hours of the day, not whole days.
 If it were whole days, you would use 1.  Changing this requires a re-index.  So does
 changing worldBounds if you do so.
 distErrPct should be 0.  Changing it does not require a re-index because
 you
 are indexing points, not other shapes.  This only affects other shapes.

 Speaking of that slight buffer to the query shape I mentioned in my last email,
 it should be less than half of maxDistErr, whatever you set that to.  So use
 something like 0.1.

 ~ David


 Chris Atkinson wrote
  Hi David,
  Thanks for your continued help.
 
  I think that you have nailed it on the head for me. I'm 100% sure that I
  had previously tried that query without success. I'm not sure if perhaps
 I
  had wrong  distErrPct or  maxDistErr values...
  It's getting late, so I'm going to call it a night (I'm on GMT), but I'll
  put your example into practice tomorrow and get confirmation that it's
  working as expected.
 
  I'll keep playing around with the distErrPct values as well.
  Do I need to do a reindex if I change these values? (I think yes?)
 
 
  On Tue, Jun 4, 2013 at 10:44 PM, Smiley, David W. lt;

  dsmiley@

  gt; wrote:
 
  So availability is the absence of any other document's indexed time
  duration overlapping with your availability query duration.  So I think
  you should negate an overlaps query.  The overlaps query looks like:
  Intersects(-Inf start end Inf).  And remember the slight buffering
 needed
  as described on the wiki.  You'd add a small fraction to the start time
  and subtract a small fraction from the end time, so that you don't
  accidentally match a document that is adjacent.
 
  -availability_spatial:Intersects( 0 30.5 114.5 3650 )
 
  Does that work against your data?  If it doesn't, can you conjecture why
  it doesn't work based on a sample point in a document that it matched,
 or
  a document that should have matched but didn't?
 
  ~ David
 
  On 6/4/13 3:31 PM, Chris Atkinson lt;

  chrisacky@

  gt; wrote:
 
  Here is an example I have tried.
  
  So let's assume that I want to checkIn on the 30th day, and leave on
 the
  115th day.
  
  My query would be:
  
  -availability_spatial:Intersects(   30 0  3650 115 )
  
  However, that wouldn't match anything. Here is an example document
 below
  so
  you can see. (I've not negated the spatial field in the filter query so
  you
  can see the field coordinates)
  
  In case the formatting is bad: See here
  
  http://pastie.org/pastes/8006249/text
  
  
  
   <?xml version="1.0" encoding="UTF-8"?>
   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">1</int>
       <lst name="params">
         <str name="fl">availability_spatial</str>
         <str name="indent">true</str>
         <str name="q">id:38197</str>
         <str name="_">1370374172298</str>
         <str name="wt">xml</str>
         <str name="fq">availability_spatial:"Intersects( 30 0 3650 115 )"</str>
       </lst>
     </lst>
     <result name="response" numFound="1" start="0">
       <doc>
         <arr name="availability_spatial">
           <str>147.6 163.4</str>
           <str>164.6 178.4</str>
           <str>192.6 220.4</str>
           <str>241.6 264.4</str>
         </arr>
       </doc>
     </result>
   </response>
  
  
  On Tue, Jun 4, 2013 at 8:14 PM, Chris Atkinson lt;

  chrisacky@

  gt;
  wrote:
  
   Thanks David.
   Query times are really quick and my index is only 20Mb now which is
  about
   what I would expect.
   I'm having some problems figuring out what type of query I want to
  find
   *Available* properties with this new points system.
  
  
   I'm storing bookings against each document. So I have X Y
 coordinates,
   where X will be  the check in of a previous booking, and Y will be
 the
   departure.
  
    So, for illustrative purposes, a week's booking from the 10th of January
    to the 17th would be X Y = 10 17
  
  
  <field name="booking">10 17</field>

  <field name="booking">22 27</field>
  
   I might have several bookings.
  
   Now, I want to find available properties with my search, but I'm just
  not
   sure on the ordering of the end/start in the polygon Intersect.
  
   I've looked at this document very carefully and tried to draw it all
  out
   on paper.
  
  
  
 
  https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
  
   Here are the suggestions:
  
    q=fieldX:Intersects(-∞ end start ∞)
    q=fieldX:Intersects(-∞ start end ∞)
    q=fieldX:Intersects(start -∞ ∞ end)
  
    All of these are great for finding the existence of a field coordinate,
    but I need to make sure that the property is available. So I thought I
    could use one of these three queries in the negative by using
    -fieldX:Inter but none of those work.
  
   Can you shine some light on what I might be missing?
   What 

Re: copyField generates multiple values encountered for non multiValued field

2013-06-05 Thread Jack Krupansky
Try describing your own symptom in your own words - because his issue 
related to Solr 1.4. I mean, where exactly are you setting 
allowDuplicates=false?? And why do you think it has anything to do with 
adding documents to Solr? Solr 1.4 did not have atomic update, so sending 
the exact same document twice would not result in a change in the index 
(unless you had a date field with a value of NOW.) Copy field only uses 
values from the current document.


-- Jack Krupansky

-Original Message- 
From: Robert Krüger

Sent: Wednesday, June 05, 2013 10:37 AM
To: solr-user@lucene.apache.org
Subject: copyField generates multiple values encountered for non 
multiValued field


I have the exact same problem as the guy here:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3C3A2B3E42FCAA4BF496AE625426C5C6E4@Wurstsemmel%3E

AFAICS he did not get an answer. Is this a known issue? What can I do
other than doing what copyField should do in my application?

I am using solr 4.0.0.

Thanks,

Robert 



Re: Phrase matching with set union as opposed to set intersection on query terms

2013-06-05 Thread Jack Krupansky

term1 OR term2 OR term1 term2^2

term1 OR term2 OR term1 term2~10^2

The latter would rank documents with the terms nearby higher, and the 
adjacent terms highest.


term1 OR term2 OR term1 term2~10^2 OR term1 term2^20 OR term2 term1^20

To further boost adjacent terms.

But the edismax pf/pf2/pf3 options might be good enough for you.
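
As a hedged illustration of those options (the field name, boosts and slop are
placeholders; pf2 builds bigram phrases, pf3 trigram phrases, ps is phrase slop):

    select?defType=edismax&q=term1 term2 term3&qf=text&pf=text^20&pf2=text^10&pf3=text^15&ps=1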

-- Jack Krupansky

-Original Message- 
From: Shawn Heisey

Sent: Wednesday, June 05, 2013 11:10 AM
To: solr-user@lucene.apache.org
Subject: Re: Phrase matching with set union as opposed to set intersection 
on query terms


On 6/5/2013 9:03 AM, Dotan Cohen wrote:

How would one write a query which should perform set union on the
search terms (term1 OR term2 OR term3), and yet also perform phrase
matching if both terms are found? I tried a few variants of the
following, but in every case I am getting set intersection on the
search terms:

select?q={!q.op=OR}text:"term1 term2"~10


A phrase search by definition will require all terms to be present.
Even though it is multiple terms, conceptually it is treated as a single
term.

It sounds like what you are after is what edismax can do.  If you define
the pf field in addition to the qf field, Solr will do something pretty
amazing - it will automatically construct a phrase query from a
non-phrase query and search with it against multiple fields.  Done
correctly, this means that an exact match will be listed first in the
results.

http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29

Thanks,
Shawn 



Solr 4.3 with Internationalization.

2013-06-05 Thread bsargurunathan
Guys,

I am going to use the Solr4.3 to my Shopping cart project.
So I need to support my website with two languages(English and French).

So I want some guide for implement the internationalization with the
Slor4.3.
Please guide with some sample configuration to support the French language
with Solr4.3.

Thanks in advance.

Guru.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-3-with-Internationalization-tp4068368.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2.1 higher memory footprint vs Solr 3.5

2013-06-05 Thread Erick Erickson
Shawn:

You're right, I thought I'd seen it as a field option but I think I
was confusing really old solr.

Thanks for catching, having gotten it wrong once I'm sure I'll
remember it better for next time!

Erick

On Tue, Jun 4, 2013 at 1:57 PM, SandeepM skmi...@hotmail.com wrote:
 Thanks Erick and Shawn,

 Your explanations help understand where SOLR may be spending its time.
 Sounds like compression can be a CPU and heap hog. (I'll try to confirm this
 with the heapdumps)

 Initially we tried to keep the JVM heap sizes the same on both Solr 3.5 and
 4.2.1, which was around 3GB, which 3.5 handled well even with a 200QPS load.
 Moving to 4.2.1 with the same heap size instantly killed the server.
 Changing the JVM to 6GB (double) did not help either.  We were seeing higher
 CPU and higher heap usage.

 We later changed cache settings so as to reduce their sizes, increased the
 JVM to 8GB and we see an improvement.  But over time, we do see that the
 Heap utilization slowly climbs as the 200QPS test is allowed to run, and
 sometimes leads to max heap being exceeded from the JConsole.  So we see the
 jagged edge waveform which keeps climbing (GC cycles don't completely
 collect memory over time).  Our test has a short capture from real traffic
 and we are replaying that via solrmeter.

 Thanks.
 Regards,
 -- Sandeep



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-4-2-1-higher-memory-footprint-vs-Solr-3-5-tp4067879p4068150.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-05 Thread Prathik Puthran
Hi,

Is it possible to configure solr to suggest the indexed string for all the
searches of the substring of the string?

Thanks,
Prathik


No files added to classloader from lib

2013-06-05 Thread O. Olson
Hi,

I downloaded Solr 4.3 and I am attempting to run and configure a 
separate
Solr instance under Jetty. I copied the Solr dist directory contents to a
directory called solrDist under the single core db that I was running. I
then attempted to get the DataImportHandler using the following in my
solrconfig.xml:

  <lib dir="solrDist/" regex="apache-solr-dataimporthandler-.*\.jar" />

In the log file, I see a lot of messages that the Jar Files in solrDist
were added to the classloader. E.g. 

…….
534  [coreLoadExecutor-3-thread-1] INFO 
org.apache.solr.core.SolrResourceLoader  - Adding
'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-clustering-4.3.0.jar'
to classloader
534  [coreLoadExecutor-3-thread-1] INFO 
org.apache.solr.core.SolrResourceLoader  - Adding
'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-core-4.3.0.jar'
to classloader
535  [coreLoadExecutor-3-thread-1] INFO 
org.apache.solr.core.SolrResourceLoader  - Adding
'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-dataimporthandler-4.3.0.jar'
to classloader
535  [coreLoadExecutor-3-thread-1] INFO 
org.apache.solr.core.SolrResourceLoader  - Adding
'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-dataimporthandler-extras-4.3.0.jar'
to classloader
535  [coreLoadExecutor-3-thread-1] INFO 
org.apache.solr.core.SolrResourceLoader  - Adding
'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-langid-4.3.0.jar'
to classloader
535  [coreLoadExecutor-3-thread-1] INFO 
org.apache.solr.core.SolrResourceLoader  - Adding
'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-solrj-4.3.0.jar'
to classloader

.

However in the end I get the following Warning:

570  [coreLoadExecutor-3-thread-1] WARN 
org.apache.solr.core.SolrResourceLoader  - No files added to classloader
from lib: solrDist/ (resolved as:
C:\Users\MyUsername\Documents\Jetty\Jetty9\solr\db\solrDist).

Why is this? I thought the Jar Files were added to the classloader, but the
last message seems to say that none were added. I know that this is a
warning, but I am just curious. I’d be grateful to anyone who has an idea
regarding this.

Thank you,
O. O.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/No-files-added-to-classloader-from-lib-tp4068374.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem with zkcli.sh linkconfig

2013-06-05 Thread Mark Miller
Sounds like a bug - we probably don't have a test that updates a link - if you 
can make a JIRA issue, I'll be happy to look into it soon.

- Mark

On Jun 4, 2013, at 8:16 AM, Shawn Heisey s...@elyograg.org wrote:

 I've got Solr 4.2.1 running SolrCloud.  I need to change the config set
 associated with a collection.  I'm having a problem with this.  Here's
 the command that I'm running, domain name redacted:
 
 /opt/mbsolr4/cloud-scripts/zkcli.sh -cp
 /opt/mbsolr4/lib/ext/slf4j-api-1.7.2.jar:/opt/mbsolr4/lib/ext/slf4j-log4j12-1.7.2.jar
 -z
 mbzoo1.REDACTED.com:2181,mbzoo2.REDACTED.com:2181,mbzoo3.REDACTED.com:2181/mbsolr1
 -collection twotest -confname mbtestcfg -cmd linkconfig
 
 Here's part of the resulting log:
 
 Jun 04, 2013 9:08:44 AM org.apache.solr.cloud.ZkController linkConfSet
 INFO: Load collection config from:/collections/p
 Jun 04, 2013 9:08:44 AM org.apache.solr.common.cloud.SolrZkClient makePath
 INFO: makePath: /collections/p
 
 It partially creates a new collection named p, which is not referenced
 on my commandline.  This partial collection IS linked to the config set
 that I referenced.  The same thing happens if I use -c and -n instead of
 -collection and -confname.
 
 Am I doing something wrong, or is this a bug?  Will I need to recreate
 the collection as a workaround?
 
 Thanks,
 Shawn
 



Re: Phrase matching with set union as opposed to set intersection on query terms

2013-06-05 Thread Dotan Cohen
On Wed, Jun 5, 2013 at 6:10 PM, Shawn Heisey s...@elyograg.org wrote:
 On 6/5/2013 9:03 AM, Dotan Cohen wrote:
 How would one write a query which should perform set union on the
 search terms (term1 OR term2 OR term3), and yet also perform phrase
 matching if both terms are found? I tried a few variants of the
 following, but in every case I am getting set intersection on the
 search terms:

 select?q={!q.op=OR}text:"term1 term2"~10

 A phrase search by definition will require all terms to be present.
 Even though it is multiple terms, conceptually it is treated as a single
 term.

 It sounds like what you are after is what edismax can do.  If you define
 the pf field in addition to the qf field, Solr will do something pretty
 amazing - it will automatically construct a phrase query from a
 non-phrase query and search with it against multiple fields.  Done
 correctly, this means that an exact match will be listed first in the
 results.

 http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29

 Thanks,
 Shawn


Thank you Shawn, this pretty much does what I need it to do:
select?defType=edismax&q={!q.op=OR}search_field:"term1 term2"&pf=search_field

I'm reviewing the Edismax page now. Is there any other documentation
that I should review? I have found the Edismax page at the wonderful
lucidworks site, but if there are any other documentation that I
should review to squeeze the most out of Edismax thenI would love to
know about it.
http://docs.lucidworks.com/display/solr/The+Extended+DisMax+Query+Parser

Thank you very much!


--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Two instances of solr - the same datadir?

2013-06-05 Thread Roman Chyla
Hi Peter,

Thank you, I am glad to read that this usecase is not alien.

I'd like to make the second instance (searcher) completely read-only, so I
have disabled all the components that can write.

(being lazy ;)) I'll probably use
http://wiki.apache.org/solr/CollectionDistribution to call the curl after
commit, or write some IndexReaderFactory that checks for changes

The problem with calling the 'core reload' is that it seems like a lot of work
for just opening a new searcher, eeekkk... somewhere I read that it is cheap
to reload a core, but re-opening the index searcher must definitely be
cheaper...
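
(For reference, the Lucene-level check is roughly the following -- a hedged
sketch; the path and variable names are made up:

    DirectoryReader current = DirectoryReader.open(FSDirectory.open(new File("/path/to/data/index")));
    // ... later, e.g. from a periodic task:
    // openIfChanged returns null when nothing on disk has changed, so polling is cheap.
    DirectoryReader changed = DirectoryReader.openIfChanged(current);
    if (changed != null) {
        IndexSearcher searcher = new IndexSearcher(changed);
        // swap the new searcher in, then release the old reader
        current.close();
        current = changed;
    }
)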

roman


On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com wrote:

 Hi,
 We use this very same scenario to great effect - 2 instances using the same
 dataDir with many cores - 1 is a writer (no caching), the other is a
 searcher (lots of caching).
 To get the searcher to see the index changes from the writer, you need the
 searcher to do an empty commit - i.e. you invoke a commit with 0 documents.
 This will refresh the caches (including autowarming), [re]build the
 relevant searchers etc. and make any index changes visible to the RO
 instance.
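
 (As a concrete form of that empty commit -- a hedged example; host, port and
 core name are placeholders:

     curl "http://localhost:8983/solr/collection1/update?commit=true"
 )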
  Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
  ensure the two instances don't try to commit at the same time.
 There are several ways to trigger a commit:
 Call commit() periodically within your own code.
 Use autoCommit in solrconfig.xml.
 Use an RPC/IPC mechanism between the 2 instance processes to tell the
 searcher the index has changed, then call commit when called (more complex
 coding, but good if the index changes on an ad-hoc basis).
 Note, doing things this way isn't really suitable for an NRT environment.

 HTH,
 Peter



 On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com
 wrote:

  Replication is fine, I am going to use it, but I wanted it for instances
  *distributed* across several (physical) machines - but here I have one
  physical machine, it has many cores. I want to run 2 instances of solr
  because I think it has these benefits:
 
  1) I can give less RAM to the writer (4GB), and use more RAM for the
  searcher (28GB)
  2) I can deactivate warming for the writer and keep it for the searcher
  (this considerably speeds up indexing - each time we commit, the server
 is
  rebuilding a citation network of 80M edges)
  3) saving disk space and better OS caching (OS should be able to use more
  RAM for the caching, which should result in faster operations - the two
  processes are accessing the same index)
 
  Maybe I should just forget it and go with the replication, but it doesn't
  'feel right' IFF it is on the same physical machine. And Lucene
  specifically has a method for discovering changes and re-opening the
 index
  (DirectoryReader.openIfChanged)
 
  Am I not seeing something?
 
  roman
 
 
 
  On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman 
  jhell...@innoventsolutions.com wrote:
 
   Roman,
  
   Could you be more specific as to why replication doesn't meet your
   requirements?  It was geared explicitly for this purpose, including the
   automatic discovery of changes to the data on the index master.
  
   Jason
  
   On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote:
  
OK, so I have verified the two instances can run alongside, sharing
 the
same datadir
   
All update handlers are unaccessible in the read-only master
   
 <updateHandler class="solr.DirectUpdateHandler2"
 enable="${solr.can.write:true}">
   
java -Dsolr.can.write=false .
   
And I can reload the index manually:
   
curl 
   
  
 
  http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1

   
But this is not an ideal solution; I'd like for the read-only server
 to
discover index changes on its own. Any pointers?
   
Thanks,
   
 roman
   
   
On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla roman.ch...@gmail.com
   wrote:
   
Hello,
   
I need your expert advice. I am thinking about running two instances
  of
solr that share the same datadirectory. The *reason* being: indexing
instance is constantly building cache after every commit (we have a
  big
cache) and this slows it down. But indexing doesn't need much RAM,
  only
   the
search does (and server has lots of CPUs)
   
So, it is like having two solr instances
   
1. solr-indexing-master
2. solr-read-only-master
   
In the solrconfig.xml I can disable update components, It should be
   fine.
However, I don't know how to 'trigger' index re-opening on (2) after
  the
commit happens on (1).
   
Ideally, the second instance could monitor the disk and re-open disk
   after
new files appear there. Do I have to implement custom
   IndexReaderFactory?
Or something else?
   
Please note: I know about the replication, this usecase is IMHO
  slightly
different - in fact, write-only-master (1) is also a replication

Re: No files added to classloader from lib

2013-06-05 Thread Jack Krupansky
apache-solr-dataimporthandler-.*\.jar - note that the apache- prefix has 
been removed from Solr jar files.
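
So a directive like the following should match the 4.3 jars (a hedged example,
reusing the directory name from the post below):

  <lib dir="solrDist/" regex="solr-dataimporthandler-.*\.jar" />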


-- Jack Krupansky

-Original Message- 
From: O. Olson

Sent: Wednesday, June 05, 2013 12:01 PM
To: solr-user@lucene.apache.org
Subject: No files added to classloader from lib

Hi,

I downloaded Solr 4.3 and I am attempting to run and configure a separate
Solr instance under Jetty. I copied the Solr dist directory contents to a
directory called solrDist under the single core db that I was running. I
then attempted to get the DataImportHandler using the following in my
solrconfig.xml:

 <lib dir="solrDist/" regex="apache-solr-dataimporthandler-.*\.jar" />

In the log file, I see a lot of messages that the Jar Files in solrDist
were added to the classloader. E.g.

…….
534  [coreLoadExecutor-3-thread-1] INFO
org.apache.solr.core.SolrResourceLoader  - Adding
'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-clustering-4.3.0.jar'
to classloader
534  [coreLoadExecutor-3-thread-1] INFO
org.apache.solr.core.SolrResourceLoader  - Adding
'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-core-4.3.0.jar'
to classloader
535  [coreLoadExecutor-3-thread-1] INFO
org.apache.solr.core.SolrResourceLoader  - Adding
'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-dataimporthandler-4.3.0.jar'
to classloader
535  [coreLoadExecutor-3-thread-1] INFO
org.apache.solr.core.SolrResourceLoader  - Adding
'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-dataimporthandler-extras-4.3.0.jar'
to classloader
535  [coreLoadExecutor-3-thread-1] INFO
org.apache.solr.core.SolrResourceLoader  - Adding
'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-langid-4.3.0.jar'
to classloader
535  [coreLoadExecutor-3-thread-1] INFO
org.apache.solr.core.SolrResourceLoader  - Adding
'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-solrj-4.3.0.jar'
to classloader

.

However in the end I get the following Warning:

570  [coreLoadExecutor-3-thread-1] WARN
org.apache.solr.core.SolrResourceLoader  - No files added to classloader
from lib: solrDist/ (resolved as:
C:\Users\MyUsername\Documents\Jetty\Jetty9\solr\db\solrDist).

Why is this? I thought the Jar Files were added to the classloader, but the
last messages seems to say that none were added. I know that this is a
warning, but I am just curious. I’d be grateful to anyone who has an idea
regarding this.

Thank you,
O. O.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/No-files-added-to-classloader-from-lib-tp4068374.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Phrase matching with set union as opposed to set intersection on query terms

2013-06-05 Thread Dotan Cohen
On Wed, Jun 5, 2013 at 6:23 PM, Jack Krupansky j...@basetechnology.com wrote:
 term1 OR term2 OR term1 term2^2

 term1 OR term2 OR term1 term2~10^2

 The latter would rank documents with the terms nearby higher, and the
 adjacent terms highest.

 term1 OR term2 OR term1 term2~10^2 OR term1 term2^20 OR term2 term1^20

 To further boost adjacent terms.

 But the edismax pf/pf2/pf3 options might be good enough for you.


Thank you Jack. I suppose that I could write a script in PHP to create
such a query string from an arbitrary-length phrase, but it wouldn't
be pretty! Edismax does in fact meet my need, though.

Thanks!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-05 Thread Jack Krupansky

ngrams?

See:
http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html

-- Jack Krupansky

-Original Message- 
From: Prathik Puthran

Sent: Wednesday, June 05, 2013 11:59 AM
To: solr-user@lucene.apache.org
Subject: Configuring lucene to suggest the indexed string for all the 
searches of the substring of the indexed string


Hi,

Is it possible to configure solr to suggest the indexed string for all the
searches of the substring of the string?

Thanks,
Prathik 



Re: Solr 4.2.1 higher memory footprint vs Solr 3.5

2013-06-05 Thread SandeepM
/So we see the jagged edge waveform which keeps climbing (GC cycles don't
completely collect memory over time).  Our test has a short capture from
real traffic and we are replaying that via solrmeter./

Any idea why the memory climbs over time?  The GC should clean up after data
is shipped back.  Could there be a memory leak in Solr?

Appreciate any help.
Thanks.
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-2-1-higher-memory-footprint-vs-Solr-3-5-tp4067879p4068378.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Phrase matching with set union as opposed to set intersection on query terms

2013-06-05 Thread Jack Krupansky

Is there any other documentation that I should review?

It's in the works! Within a week or two.

-- Jack Krupansky

-Original Message- 
From: Dotan Cohen

Sent: Wednesday, June 05, 2013 12:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Phrase matching with set union as opposed to set intersection 
on query terms


On Wed, Jun 5, 2013 at 6:10 PM, Shawn Heisey s...@elyograg.org wrote:

On 6/5/2013 9:03 AM, Dotan Cohen wrote:

How would one write a query which should perform set union on the
search terms (term1 OR term2 OR term3), and yet also perform phrase
matching if both terms are found? I tried a few variants of the
following, but in every case I am getting set intersection on the
search terms:

select?q={!q.op=OR}text:"term1 term2"~10


A phrase search by definition will require all terms to be present.
Even though it is multiple terms, conceptually it is treated as a single
term.

It sounds like what you are after is what edismax can do.  If you define
the pf field in addition to the qf field, Solr will do something pretty
amazing - it will automatically construct a phrase query from a
non-phrase query and search with it against multiple fields.  Done
correctly, this means that an exact match will be listed first in the
results.

http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29

Thanks,
Shawn



Thank you Shawn, this pretty much does what I need it to do:
select?defType=edismax&q={!q.op=OR}search_field:"term1 term2"&pf=search_field

I'm reviewing the Edismax page now. Is there any other documentation
that I should review? I have found the Edismax page at the wonderful
lucidworks site, but if there are any other documentation that I
should review to squeeze the most out of Edismax thenI would love to
know about it.
http://docs.lucidworks.com/display/solr/The+Extended+DisMax+Query+Parser

Thank you very much!


--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com 



Re: Create index on few unrelated table in Solr

2013-06-05 Thread sodoo
Yes, my ID field is the uniqueKey. How can I keep the entities from overriding each other?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Create-index-on-few-unrelated-table-in-Solr-tp4068054p4068371.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: data-import problem

2013-06-05 Thread sodoo
Maybe problem is two document declare in data-config.xml.

You will try change this one.

<document>
  <entity name="name" query="SELECT id, name FROM name"></entity>
  <entity name="title" query="SELECT id AS titleid, title FROM name"></entity>
</document>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/data-import-problem-tp4068306p4068373.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multitable import - uniqueKey

2013-06-05 Thread sodoo
Hehe. 

Yes my all tables ID field names are different. 
For example:

I have 5 table. These names are 'admin, account, group, checklist'

admin=id -uniquekey
account=account_id -uniquekey
group=group_id -uniquekey
checklist=id-uniquekey

Also I thought the last entity overwrites the other entities.

I'm sorry, I don't understand this example of yours.

Now I try to use below config. 
### data-config.xml

<entity name="admin" query="select * from admin" dataSource="ds-1">
  <field column="id" name="id" />

<entity name="checklist" query="select * from checklist" dataSource="ds-1">
  <field column="id" name="id" />

<entity name="groups" query="select * from groups" dataSource="ds-1">
  <field column="group_id" name="id" />

<entity name="account" query="select * from accounts" dataSource="ds-1">
  <field column="account_id" name="id" />


Then my schema.xml

<field name="id" stored="true" type="string" multiValued="false"
indexed="true"/>

<uniqueKey>id</uniqueKey>


How can I don't overwrite other entities?
Please assist me on this example.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multitable-import-uniqueKey-tp4067796p4068384.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-05 Thread Prathik Puthran
ngrams won't work here. If I index all the ngrams of the string, then when I
try to search for some string it would suggest all the ngrams as well.
E.g.:
The dictionary contains the word "Jason Bourne" and you index all the ngrams of
that phrase.
When I try to search for "Jason", Solr suggests all the ngrams containing the
word "Jason". Instead of just suggesting "Jason Bourne", Lucene suggests
"Jason B", "Jason Bo", "Jason Bou", "Jason Bour", "Jason Bourn", "Jason
Bourne".

What should I do so that I only get "Jason Bourne" as the suggestion when
the user searches for any substring of it ("Bour", "Bourne", etc.)?
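
(One hedged way around that -- a sketch, not a definitive recipe: apply the
ngrams only at index time, keep the query side as a single lowercased token,
and return the stored value of the field, which remains "Jason Bourne", rather
than the analyzed tokens. The type name and gram sizes below are illustrative:

    <fieldType name="text_substring" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="20"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
)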


On Wed, Jun 5, 2013 at 9:39 PM, Jack Krupansky j...@basetechnology.comwrote:

 ngrams?

 See:
 http://lucene.apache.org/core/**4_3_0/analyzers-common/org/**
 apache/lucene/analysis/ngram/**NGramFilterFactory.htmlhttp://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html

 -- Jack Krupansky

 -Original Message- From: Prathik Puthran
 Sent: Wednesday, June 05, 2013 11:59 AM
 To: solr-user@lucene.apache.org
 Subject: Configuring lucene to suggest the indexed string for all the
 searches of the substring of the indexed string


 Hi,

 Is it possible to configure solr to suggest the indexed string for all the
 searches of the substring of the string?

 Thanks,
 Prathik



Re: Multitable import - uniqueKey

2013-06-05 Thread Chris Hostetter

: How can I don't overwrite other entities?
: Please assist me on this example.

I'm confused, you sent this in direct reply to my last message, which 
contained the following...

1) a paragraph describing the general approach to solving this type of 
problem...

 You can use TemplateTransformer to create a synthetic ID for each 
 entity using some constant value combined with the auto-increment 
 value from your DB, for example...
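
 (A minimal sketch of that approach, reusing the table names from the earlier
 config; the literal prefixes are illustrative:

     <entity name="admin" query="select * from admin" dataSource="ds-1"
             transformer="TemplateTransformer">
       <field column="id" template="admin-${admin.id}" name="id"/>
     </entity>
     <entity name="account" query="select * from accounts" dataSource="ds-1"
             transformer="TemplateTransformer">
       <field column="id" template="account-${account.account_id}" name="id"/>
     </entity>
 )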

2) a link to an article i wrote a while back dicussing how to solve the 
exact problem you are having...

http://searchhub.org/2011/02/12/solr-powered-isfdb-part-4/

3) links to specific commits in a github repo where there is a working 
example of using DIH to index multiple types of documents from differnet 
tables in a single Solr index.  The commits i linked to show *exactly* 
which changes are needed to go from indexing a single entity to indexing 
two entities w/o conflicting ids...

https://github.com/lucidimagination/isfdb-solr/commit/85d7caf19746399755f6f1c39f48a654da3c5b11
https://github.com/lucidimagination/isfdb-solr/commit/26e945747404125ce5b835e2157c6e2612ff2387


...did you look at any of this? Did you try it? Do you have any specific
questions about this approach?



-Hoss


Configuring seperate db-data-config.xml per shard

2013-06-05 Thread RadhaJayalakshmi
Hi,
We have a setup where we have 3 shards in a collection, and each shard in
the collection need to load different sets of data
That is
Shard1- will contain data only for Entity1
Shard2 - will contain data for entity2
shard3- will contain data for entity3
So in this case,. the db-data-config.xml can't be same for three shards so
it can;'t be uploaded in zookeeper.
Is there any way, where we can mantain db-data-config.xml inside each
shard's folder and make our shards to refer to this
db-data-config.xml(during data import), rather than looking for this file in
zookeepers repository

Thanks in Advance
Radha



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Configuring-seperate-db-data-config-xml-per-shard-tp4068383.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Create index on few unrelated table in Solr

2013-06-05 Thread Chris Hostetter

Please don't create new threads re-asking the same questions -- especailly 
when the existing thread is only a day old, and still actively getting 
responses.

it just increases the overall noise of of the list, and results in 
multiple people wasting their time providing you with the same answers...

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3ccaldws-wknmwuralhhmmmtth+7noy1ewu0z-shtmwcoaxzes...@mail.gmail.com%3E

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3Calpine.DEB.2.02.1306041534070.2959@frisbee%3E




: Date: Tue, 4 Jun 2013 02:10:52 -0700 (PDT)
: From: sodoo first...@yahoo.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Create index on few unrelated table in Solr
: 
: I want to create index few tables. All tables not related.
: 
: In data-config.xml, that I created to create index
: 
: dataConfig
:  dataSource type=JdbcDataSource 
:   name=ds-1
:   driver=com.mysql.jdbc.Driver
:   url=jdbc:mysql://localhost/testdb 
:   user=root 
:   password=***/
:   document
: entity name=admin query=select *from admin dataSource=ds-1
: field column=id name=id /
: field column=name name=name /
: field column=mail name=mail /
: /entity
: 
: entity name=checklist query=select *from checklist
: dataSource=ds-1
: field column=id name=id /
: field column=title name=title /
: field column=connect name=connect /
: /entity
: 
: entity name=account query=select *from account dataSource=ds-1
: field column=id name=id /
: field column=name name=name /
: field column=code name=code /
: /entity
: /document
:  
: And I have register schema.xml these fields. 
: I tried to make full import but unfortunately only the last entity is
: indexed. Other entities are not index.
: 
: What should I do to import all the entities?
: 
: 
: 
: --
: View this message in context: 
http://lucene.472066.n3.nabble.com/Create-index-on-few-unrelated-table-in-Solr-tp4068054.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 

-Hoss


Re: Phrase matching with set union as opposed to set intersection on query terms

2013-06-05 Thread Dotan Cohen
 select?defType=edismax&q={!q.op=OR}search_field:"term1 term2"&pf=search_field


Is there any way to perform a fuzzy search with this method? I have
tried appending ~1 to every term in the search like so:
select?defType=edismax&q={!q.op=OR}search_field:term1~1%20term2~1&pf=search_field

However, two issues:
1) It doesn't work! The results are identical to the results given
when not appending ~1 to every term (or ~3).

2) If at all possible, I would rather define the 'fuzziness'
elsewhere. Right now I would have to mangle the user-input in order to
add the ~1 to the end of each term.

Note that the ExtendedDisMax page does in fact mention that fuzziness
is supported:
http://wiki.apache.org/solr/ExtendedDisMax#Query_Syntax

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Query Elevation Component

2013-06-05 Thread davers
I have not implemented it yet, and I forget the exact webpage I found, but
there was a person on that page discussing the same problem who said it was
easy to implement a solution for it, though he did not share that solution. If
you figure it out, let me know.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-Elevation-Component-tp4056856p4068394.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: facet.missing=true returns null records with zero count also

2013-06-05 Thread Rahul R
Hoss,
We rely heavily on facet.mincount because once a user has selected a facet,
it doesn't make sense for us to show that facet field to him and let him
filter again with the same facet. Also, when a facet has only one value, it
doesn't make sense to show it to the user, since searching with that facet
is just going to give the same result set again. So when facet.missing does
not work with facet.mincount, it is a bit of a hassle for us. We will work
on handling it in our program. Thank you for the clarification.

- Rahul


On Wed, Jun 5, 2013 at 12:32 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : that facet value and see all documents. I thought facet.missing=true was
 : the answer.
 ...
 : facquery.setFacetMinCount(1);

 Hmm, yeah -- it looks like facet.missing doesn't take facet.mincount into
 consideration.

 I don't remember if that was intentional or not, but as a special case
  one-off count it seems like a toss-up as to whether it would be more or
 less surprising to hide it if it's below the mincount. (it's very similar
 to doing one off facet.query for example, and those are always included in
 the response and don't consider the facet.mincount either)

 In general, this seems like a low impact thing though, correct?  i mean:
 the main advantage of facet.mincount is to reduce what could be a very
  large amount of useless data from being streamed from the server to the client,
  particularly in the case of using facet.sort where you really need the
  constraints eliminated server side in order to get the sort+limit applied
 correctly.

 but with the facet.missing value, it's just a single value per field that
 can easily be ignored by the client if it's not desired because of the
  mincount.  Or to put it another way: the amount of work needed to ignore
  this on the client is less than the amount of work to make it
 configurable to ignore it on the server.


 -Hoss



Re: copyField generates multiple values encountered for non multiValued field

2013-06-05 Thread Robert Krüger
OK, I have two fields defined as follows:

  <field name="name"  type="string"    indexed="true" stored="true" multiValued="false" />
  <field name="name2" type="string_ci" indexed="true" stored="true" multiValued="false" />

and this copyField directive

  <copyField source="name" dest="name2"/>

I updated the Index using SolrJ and got the exact same error message
that is in the subject. However, while waiting for feedback I built a
workaround at the application level and now reconstructing the
original state, to be able to answer you, I have different behaviour.
What happens now is that the field name2 is populated with multiple
values although it is not defined as multiValued (see above).

Although this is strange, it is consistent with the earlier problem in
that copyField does not seem to overwrite the existing field values. I
may be using it incorrectly (it's the first time I am using copyField)
but the docs in the wiki did not say anything about an overwrite
option.

Cheers,

Robert


On Wed, Jun 5, 2013 at 5:16 PM, Jack Krupansky j...@basetechnology.com wrote:
 Try describing your own symptom in your own words - because his issue
 related to Solr 1.4. I mean, where exactly are you setting
 allowDuplicates=false?? And why do you think it has anything to do with
 adding documents to Solr? Solr 1.4 did not have atomic update, so sending
 the exact same document twice would not result in a change in the index
 (unless you had a date field with a value of NOW.) Copy field only uses
 values from the current document.

 -- Jack Krupansky

 -Original Message- From: Robert Krüger
 Sent: Wednesday, June 05, 2013 10:37 AM
 To: solr-user@lucene.apache.org
 Subject: copyField generates multiple values encountered for non
 multiValued field


 I have the exact same problem as the guy here:

 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3C3A2B3E42FCAA4BF496AE625426C5C6E4@Wurstsemmel%3E

 AFAICS he did not get an answer. Is this a known issue? What can I do
 other than doing what copyField should do in my application?

 I am using solr 4.0.0.

 Thanks,

 Robert


Re: java.lang.NumberFormatException when adding latitude,longitude using DIH

2013-06-05 Thread bbarani
Thanks a lot for your response Hoss.. I thought about using scriptTransformer
too but just thought of checking if there is any other way to do that..

Btw, for some reason the values are getting overridden even though it's a
multivalued field.. Not sure where I am going wrong!!!

For the latlong values - 33.7209548950195,34.474838
-117.176193237305,-117.573463

the below value is getting indexed..

<arr name="geo">
  <str>34.474838,-117.573463</str>
</arr>

*Script transformer:*




--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-lang-NumberFormatException-when-adding-latitude-longitude-using-DIH-tp4068223p4068401.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Two instances of solr - the same datadir?

2013-06-05 Thread Roman Chyla
So here it is for a record how I am solving it right now:

Write-master is started with: -Dmontysolr.warming.enabled=false
-Dmontysolr.write.master=true -Dmontysolr.read.master=http://localhost:5005
Read-master is started with: -Dmontysolr.warming.enabled=true
-Dmontysolr.write.master=false


solrconfig.xml changes:

1. all index-changing components have this bit,
enable="${montysolr.master:true}" - i.e.

<updateHandler class="solr.DirectUpdateHandler2"
  enable="${montysolr.master:true}">

2. for cache warming de/activation

<listener event="newSearcher"
  class="solr.QuerySenderListener"
  enable="${montysolr.enable.warming:true}">...

3. to trigger a refresh of the read-only master (from the write master):

<listener event="postCommit"
  class="solr.RunExecutableListener"
  enable="${montysolr.master:true}">
  <str name="exe">curl</str>
  <str name="dir">.</str>
  <bool name="wait">false</bool>
  <arr name="args"><str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str></arr>
</listener>

This works, I still don't like the reload of the whole core, but it seems
like the easiest thing to do now.

-- roman


On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla roman.ch...@gmail.com wrote:

 Hi Peter,

 Thank you, I am glad to read that this usecase is not alien.

 I'd like to make the second instance (searcher) completely read-only, so I
 have disabled all the components that can write.

 (being lazy ;)) I'll probably use
 http://wiki.apache.org/solr/CollectionDistribution to call the curl after
 commit, or write some IndexReaderFactory that checks for changes

 The problem with calling the 'core reload' - is that it seems lots of work
 for just opening a new searcher, eeekkk...somewhere I read that it is cheap
 to reload a core, but re-opening the index searches must be definitely
 cheaper...

 roman


 On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.comwrote:

 Hi,
 We use this very same scenario to great effect - 2 instances using the
 same
 dataDir with many cores - 1 is a writer (no caching), the other is a
 searcher (lots of caching).
 To get the searcher to see the index changes from the writer, you need the
 searcher to do an empty commit - i.e. you invoke a commit with 0
 documents.
 This will refresh the caches (including autowarming), [re]build the
 relevant searchers etc. and make any index changes visible to the RO
 instance.
 Also, make sure to use lockTypenative/lockType in solrconfig.xml to
 ensure the two instances don't try to commit at the same time.
 There are several ways to trigger a commit:
 Call commit() periodically within your own code.
 Use autoCommit in solrconfig.xml.
 Use an RPC/IPC mechanism between the 2 instance processes to tell the
 searcher the index has changed, then call commit when called (more complex
 coding, but good if the index changes on an ad-hoc basis).
 Note, doing things this way isn't really suitable for an NRT environment.

 HTH,
 Peter



 On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com
 wrote:

  Replication is fine, I am going to use it, but I wanted it for instances
  *distributed* across several (physical) machines - but here I have one
  physical machine, it has many cores. I want to run 2 instances of solr
  because I think it has these benefits:
 
  1) I can give less RAM to the writer (4GB), and use more RAM for the
  searcher (28GB)
  2) I can deactivate warming for the writer and keep it for the searcher
  (this considerably speeds up indexing - each time we commit, the server
 is
  rebuilding a citation network of 80M edges)
  3) saving disk space and better OS caching (OS should be able to use
 more
  RAM for the caching, which should result in faster operations - the two
  processes are accessing the same index)
 
  Maybe I should just forget it and go with the replication, but it
 doesn't
  'feel right' IFF it is on the same physical machine. And Lucene
  specifically has a method for discovering changes and re-opening the
 index
  (DirectoryReader.openIfChanged)
 
  Am I not seeing something?
 
  roman
 
 
 
  On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman 
  jhell...@innoventsolutions.com wrote:
 
   Roman,
  
   Could you be more specific as to why replication doesn't meet your
   requirements?  It was geared explicitly for this purpose, including
 the
   automatic discovery of changes to the data on the index master.
  
   Jason
  
   On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com
 wrote:
  
OK, so I have verified the two instances can run alongside, sharing
 the
same datadir
   
All update handlers are unaccessible in the read-only master
   
updateHandler class=solr.DirectUpdateHandler2
enable=${solr.can.write:true}
   
java -Dsolr.can.write=false .
   
And I can reload the index manually:
   
curl 
   
  
 
 http://localhost:5005/solr/admin/cores?wt=jsonaction=RELOADcore=collection1

  

Re: Create index on few unrelated table in Solr

2013-06-05 Thread sodoo
Okay, I'm sorry. I will not create the same question in a separate topic next time.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Create-index-on-few-unrelated-table-in-Solr-tp4068054p4068405.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Sole instance state is down in cloud mode

2013-06-05 Thread James Thomas
Are you using IE?  If so, you might want to try using Firefox.

-Original Message-
From: sathish_ix [mailto:skandhasw...@inautix.co.in] 
Sent: Wednesday, June 05, 2013 6:16 AM
To: solr-user@lucene.apache.org
Subject: Sole instance state is down in cloud mode

Hi,

When I start a core in SolrCloud I'm getting the below message in the log.

I have set up ZooKeeper separately and uploaded the config files.
When I start the Solr instance in cloud mode, the state is down.


INFO: Update state numShards=null message={
  operation:state,
  numShards:null,
  shard:shard1,
  roles:null,
  *state:down,*
  core:core1,
  collection:core1,
  node_name:x:9980_solr,
  base_url:http://x:9980/solr}
Jun 5, 2013 6:10:48 AM org.apache.solr.common.cloud.ZkStateReader$2 process
INFO: A cluster state change: WatchedEvent state:SyncConnected 
type:NodeDataChanged path:/clusterstate.json, has occurred - updating...
(live nodes size: 1)


When I hit the URL, I am getting the left pane of the Solr admin and the right
side keeps on loading. Any help?

Thanks,
Sathish



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sole-instance-state-is-down-in-cloud-mode-tp4068298.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Phrase matching with set union as opposed to set intersection on query terms

2013-06-05 Thread Eustache Felenc
There is also http://wiki.apache.org/solr/SolrRelevancyCookbook with 
nice examples.


On 06/05/2013 12:13 PM, Jack Krupansky wrote:

Is there any other documentation that I should review?

It's in the works! Within a week or two.

-- Jack Krupansky

-Original Message- From: Dotan Cohen
Sent: Wednesday, June 05, 2013 12:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Phrase matching with set union as opposed to set 
intersection on query terms


On Wed, Jun 5, 2013 at 6:10 PM, Shawn Heisey s...@elyograg.org wrote:

On 6/5/2013 9:03 AM, Dotan Cohen wrote:

How would one write a query which should perform set union on the
search terms (term1 OR term2 OR term3), and yet also perform phrase
matching if both terms are found? I tried a few variants of the
following, but in every case I am getting set intersection on the
search terms:

select?q={!q.op=OR}text:"term1 term2"~10


A phrase search by definition will require all terms to be present.
Even though it is multiple terms, conceptually it is treated as a single
term.

It sounds like what you are after is what edismax can do.  If you define
the pf field in addition to the qf field, Solr will do something pretty
amazing - it will automatically construct a phrase query from a
non-phrase query and search with it against multiple fields.  Done
correctly, this means that an exact match will be listed first in the
results.

http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29

Thanks,
Shawn



Thank you Shawn, this pretty much does what I need it to do:
select?defType=edismax&q={!q.op=OR}search_field:"term1 term2"&pf=search_field


I'm reviewing the Edismax page now. Is there any other documentation
that I should review? I have found the Edismax page at the wonderful
lucidworks site, but if there are any other documentation that I
should review to squeeze the most out of Edismax thenI would love to
know about it.
http://docs.lucidworks.com/display/solr/The+Extended+DisMax+Query+Parser

Thank you very much!


--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com





Re: java.lang.NumberFormatException when adding latitude,longitude using DIH

2013-06-05 Thread bbarani
That was a very silly mistake. I forgot to add the values to array before
putting it inside row..the below code works.. Thanks a lot...





--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-lang-NumberFormatException-when-adding-latitude-longitude-using-DIH-tp4068223p4068410.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Phrase matching with set union as opposed to set intersection on query terms

2013-06-05 Thread Dotan Cohen
On Wed, Jun 5, 2013 at 9:04 PM, Eustache Felenc
eustache.fel...@idilia.com wrote:
 There is also http://wiki.apache.org/solr/SolrRelevancyCookbook with nice
 examples.


Thank you.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Pivot Facets refining datetime, bleh

2013-06-05 Thread Stein Gran
This may be more suitable on the dev-list, but distributed pivot facets is
a very powerful feature. The Jira issue for this is SOLR-2894 (
https://issues.apache.org/jira/browse/SOLR-2894). I have done some testing
of the last patch for this issue, and it is as Andrew says: Everything but
datetime fields works just fine. There are no error messages for datetime
fields when used in a SolrCloud setup, the expected values are just not
there.

Best,
Stein J. Gran


On Thu, May 30, 2013 at 5:49 PM, Andrew Muldowney 
andrew.muldowne...@gmail.com wrote:

 I've been trying to get into how distributed field facets do their work but
 I haven't been able to uncover how they deal with this issue.

 Currently distrib pivot facets does a getTermCounts(first_field) to
 populate a list at the level its working on.

 When putting together the data structure we set up a BytesRef, fill it in
 with the value using the FieldType.ReadableToIndexed call and then add the
 FieldType.ToObject of that bytesRef and associated field.
 --From getTermCounts comes fieldValue--
   termval = new BytesRef();
  ftype.readableToIndexed(fieldValue, termval);
 pivot.add( value, ftype.toObject(sfield, termval) );


 This works great for everything but datetime, as datetime's .ToObject turns
 it into a human readable string that is unconvertable -at least in my
 investigation.

 I've tried to use the FieldType.ToInternal but that also fails on the human
 readable datetime format.

 My original idea was to skip the aformentioned block of code and just
 straight add the fieldValue to the data structure. This caused some pivot
 facet tests to return wonky results, I'm not sure if I should go down the
 path of trying to figure out those problems or if there is a different
 approach I should be taking.

 Any general guidance on how distributed field facets deals with this would
 be much appreciated.



Re: data-import problem

2013-06-05 Thread Stavros Delisavas

Thanks so far.

This change makes Solr work over the title entries too, yay!
Unfortunately they don't get processed (skipped rows). In my log it says

"missing required field id" for every entry.

I checked my schema.xml. In there, id is not set as a required field.
Removing the uniqueKey property also leads to no improvement.


Any further ideas?





Am 05.06.2013 18:01, schrieb sodoo:

Maybe problem is two document declare in data-config.xml.

You will try change this one.

<document>
  <entity name="name" query="SELECT id, name FROM name"></entity>
  <entity name="title" query="SELECT id AS titleid, title FROM name"></entity>
</document>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/data-import-problem-tp4068306p4068373.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: data-import problem

2013-06-05 Thread Raymond Wiker
On Jun 5, 2013, at 20:39 , Stavros Delisavas stav...@delisavas.de wrote:
 Thanks so far.
 
 This change makes Solr work over the title-entries too, yay! Unfortunatly 
 they don't get processed(skipped rows). In my log it says
 missing required field id for every entry.
 
 I checked my schema.xml. In there id is not set as a required field. 
 removing the uniquekey-property also leads to no improvement.
 
 Any further ideas?


You need a field to hold a unique identifier for the document, and your
data-import setup must ensure that that specific field gets a unique
identifier. Unique here means unique across all documents, no matter where
they come from.




Re: data-import problem

2013-06-05 Thread Gora Mohanty
On 6 June 2013 00:09, Stavros Delisavas stav...@delisavas.de wrote:

 Thanks so far.

 This change makes Solr work over the title-entries too, yay! Unfortunatly
 they don't get processed(skipped rows). In my log it says
 missing required field id for every entry.

 I checked my schema.xml. In there id is not set as a required field.
 removing the uniquekey-property also leads to no improvement.
[...]

There are several things wrong with your problem statement.
You say that you have two tables, but both SELECTs seem
to use the same table. I am going to assume that you really
have two different tables.

Unless you have changed the default schema.xml, id should
be defined as the uniqueKey for the document. You probably
do not want to remove that, and even if you just remove the
uniqueKey property, the field id remains defined as a required
field.

The issue is with your SELECT for the second entity:
<entity name="title" query="SELECT id AS titleid, title FROM name"></entity>
This renames id to titleid, and hence the required field
id in schema.xml is missing.

While you do need something like:
<document>
  <entity name="name" query="SELECT id, name FROM name1"></entity>
  <entity name="title" query="SELECT id, title FROM name2"></entity>
</document>

However, you will need to ensure that the ids are unique
in the two tables, else entries from the second entity will
overwrite matching ids from the first.
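
For example, one way to keep the ids distinct (just a sketch - this assumes
MySQL's CONCAT, and that the id field in schema.xml is a string type) is to
namespace the id in each SELECT:

<document>
  <entity name="name" query="SELECT CONCAT('name-', id) AS id, name FROM name1"></entity>
  <entity name="title" query="SELECT CONCAT('title-', id) AS id, title FROM name2"></entity>
</document>

That keeps the uniqueKey unique across both tables; if you still need the raw
table id, also select it under a separate alias mapped to its own field.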

Also, do you have field definitions within the entities? Please
share the complete schema.xml and the DIH configuration
file with us, rather than snippets: Use pastebin.com if they
are large.

Regards,
Gora


Re: copyField generates multiple values encountered for non multiValued field

2013-06-05 Thread Jack Krupansky
Look in the Solr log - the error message should tell you what the multiple 
values are. For example,


95484 [qtp2998209-11] ERROR org.apache.solr.core.SolrCore  – 
org.apache.solr.common.SolrException: ERROR: [doc=doc-1] multiple values 
encountered for non multiValued field content_s: [def, abc]


One of the values should be the value of the field that is the source of the 
copyField. Maybe the other value will give you a clue as to where it came 
from.


Check your SolrJ code - maybe you actually do try to initialize a value in 
the field that is the copyField target.


-- Jack Krupansky

-Original Message- 
From: Robert Krüger

Sent: Wednesday, June 05, 2013 1:17 PM
To: solr-user@lucene.apache.org
Subject: Re: copyField generates multiple values encountered for non 
multiValued field


OK, I have two fields defined as follows:

 <field name="name" type="string" indexed="true" stored="true"
        multiValued="false"/>
 <field name="name2" type="string_ci" indexed="true"
        stored="true" multiValued="false"/>

and this copyField directive

<copyField source="name" dest="name2"/>

I updated the index using SolrJ and got the exact same error message
that is in the subject. However, while waiting for feedback I built a
workaround at the application level, and now, reconstructing the
original state to be able to answer you, I see different behaviour.
What happens now is that the field name2 is populated with multiple
values although it is not defined as multiValued (see above).

Although this is strange, it is consistent with the earlier problem in
that copyField does not seem to overwrite the existing field values. I
may be using it incorrectly (it's the first time I am using copyField)
but the docs in the wiki did not say anything about an overwrite
option.

Cheers,

Robert


On Wed, Jun 5, 2013 at 5:16 PM, Jack Krupansky j...@basetechnology.com 
wrote:

Try describing your own symptom in your own words - because his issue
related to Solr 1.4. I mean, where exactly are you setting
allowDuplicates=false?? And why do you think it has anything to do with
adding documents to Solr? Solr 1.4 did not have atomic update, so sending
the exact same document twice would not result in a change in the index
(unless you had a date field with a value of NOW.) Copy field only uses
values from the current document.

-- Jack Krupansky

-Original Message- From: Robert Krüger
Sent: Wednesday, June 05, 2013 10:37 AM
To: solr-user@lucene.apache.org
Subject: copyField generates multiple values encountered for non
multiValued field


I have the exact same problem as the guy here:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3C3A2B3E42FCAA4BF496AE625426C5C6E4@Wurstsemmel%3E

AFAICS he did not get an answer. Is this a known issue? What can I do
other than doing what copyField should do in my application?

I am using solr 4.0.0.

Thanks,

Robert 




Re: No files added to classloader from lib

2013-06-05 Thread O. Olson
Good call Jack. I totally missed that. I am curious how dataimport handler
worked before – if I made a mistake in the specification and it did not get
the jar. Anyway, it works now. Thanks again.
O.O.


apache-solr-dataimporthandler-.*\.jar - note that the apache- prefix has 
been removed from Solr jar files.

-- Jack Krupansky







Re: data-import problem

2013-06-05 Thread Stavros Delisavas

Thanks for the hints.
I am not sure how to solve this issue. I previously made a typo; there 
are definitely two different tables.

Here is my real configuration:

http://pastebin.com/JUDzaMk0

For testing purposes I added LIMIT 10 to the SQL statements because my 
tables are very large and tests would take too long (about 5 GB, 
6.5 million rows). I included my whole data-config and what I have 
changed from the default schema.xml. I don't know how to solve the "all 
ids have to be unique" problem. I cannot believe that Solr does not 
offer any solution at all to handle multiple data sources with their own 
individual ids. Maybe it's possible to have Solr create its own ids while 
importing the data?


Actually there is no direct relation between my name table and my 
title table. All I want is to be able to do fast text search in those 
two tables in order to find the corresponding ids of these entries.


Let me know if you need more information.

Thank you!





Am 05.06.2013 20:54, schrieb Gora Mohanty:

On 6 June 2013 00:09, Stavros Delisavas stav...@delisavas.de wrote:

Thanks so far.

This change makes Solr work over the title-entries too, yay! Unfortunatly
they don't get processed(skipped rows). In my log it says
missing required field id for every entry.

I checked my schema.xml. In there id is not set as a required field.
removing the uniquekey-property also leads to no improvement.

[...]

There are several things wrong with your problem statement.
You say that you have two tables, but both SELECTs seem
to use the same table. I am going to assume that you really
have two different tables.

Unless you have changed the default schema.xml, id should
be defined as the uniqueKey for the document. You probably
do not want to remove that, and even if you just remove the
uniqueKey property, the field id remains defined as a required
field.

The issue is with your SELECT for the second entity:
<entity name="title" query="SELECT id AS titleid, title FROM name"></entity>
This renames id to titleid, and hence the required field
id in schema.xml is missing.

Instead, you need something like:
<document>
   <entity name="name" query="SELECT id, name FROM name1"></entity>
   <entity name="title" query="SELECT id, title FROM name2"></entity>
</document>

However, you will need to ensure that the ids are unique
in the two tables, else entries from the second entity will
overwrite matching ids from the first.

Also, do you have field definitions within the entities? Please
share the complete schema.xml and the DIH configuration
file with us, rather than snippets: Use pastebin.com if they
are large.

Regards,
Gora





Entire query is stopwords

2013-06-05 Thread Vardhan Dharnidharka



Hi, 

I am using the standard edismax parser and my example query is as follows:

{!edismax qf='object_description ' rows=10 start=0 mm=-40% v='object'}

In this case, 'object' happens to be a stopword in the StopWordsFilter in my 
datatype 'object_description'. Now, since 'object' is not indexed at all, the 
query does not return any results. In an ideal case, I would want documents 
containing the term 'object' to be returned. 

What is the best practice to achieve this? Index stop-words and re-query with 
'stopwords=false'. Or can this be done without re-querying?

Thanks, 
Vardhan 
  

Re: Solr 4.3 with Internationalization.

2013-06-05 Thread bbarani
Check out this
http://stackoverflow.com/questions/5549880/using-solr-for-indexing-multiple-languages

http://wiki.apache.org/solr/LanguageAnalysis#French

French stop words file (sample):
http://trac.foswiki.org/browser/trunk/SolrPlugin/solr/multicore/conf/stopwords-fr.txt

Solr includes three stemmers for French: one via
solr.SnowballPorterFilterFactory, an alternative stemmer (Solr 3.1+) via
solr.FrenchLightStemFilterFactory, and an even less aggressive approach
(also Solr 3.1+) via solr.FrenchMinimalStemFilterFactory. Solr can also remove
elisions via solr.ElisionFilterFactory, and Lucene includes an example
stopword list.


...
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ElisionFilterFactory"/>

  <filter class="solr.SnowballPorterFilterFactory" language="French"/>
...






Re: problem with zkcli.sh linkconfig

2013-06-05 Thread Shawn Heisey

On 6/5/2013 10:05 AM, Mark Miller wrote:

Sounds like a bug - we probably don't have a test that updates a link - if you 
can make a JIRA issue, I'll be happy to look into it soon.


I will go ahead and create an issue so that a test can be built, but I 
have some more info: It works perfectly when running the script from the 
4.3.1 example, and from the 4.2.1 example.


I am using slf4j 1.7.2 and log4j 1.4.17 in my production 4.2.1 lib/ext. 
 That is the only difference I can think of at the moment.


Thanks,
Shawn



Solrj Stats encoding problem

2013-06-05 Thread ethereal
Hi,

I've tested a query using the Solr admin web interface and it works fine.
But when I try to execute the same search using SolrJ, it doesn't
include Stats information.
I've figured out that it's because my query is encoded.
The original query is like q=eventTimestamp:[2013-06-01T12:00:00.000Z TO
2013-06-30T11:59:59.999Z]&stats=true&stats.field=numberOfBytes&stats.facet=eventType
The query in Java is like
q=eventTimestamp%3A%5B2013-06-01T12%3A00%3A00.000Z+TO+2013-06-30T11%3A59%3A59.999Z%5D%26stats%3Dtrue%26stats.field%3DnumberOfBytes%26stats.facet%3DeventType
If I copy this query to the browser address bar, it doesn't work, but it does
if I replace the encoded characters (:, =, &) with the original values. What
should I do to make it work through Java?
The code is like the following:

SolrQuery solrQuery = new SolrQuery();
solrQuery.setQuery(queryBuilder.toString());
QueryResponse query = getSolrServer().query(solrQuery);





Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-05 Thread Mikhail Khludnev
Please excuse my misunderstanding, but I always wonder why this index-time
processing is usually suggested. From my POV this is a case for query-time
processing, i.e. PrefixQuery, aka the wildcard query Jason*.
Ultra-fast term retrieval is also provided by TermsComponent.
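
As a rough sketch (the handler name and field are placeholders; the stock
example solrconfig.xml ships something very similar):

<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>

A prefix lookup is then a request like
/terms?terms.fl=name&terms.prefix=Jason&terms.limit=10.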


On Wed, Jun 5, 2013 at 8:09 PM, Jack Krupansky j...@basetechnology.comwrote:

 ngrams?

 See:
 http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html

 -- Jack Krupansky

 -Original Message- From: Prathik Puthran
 Sent: Wednesday, June 05, 2013 11:59 AM
 To: solr-user@lucene.apache.org
 Subject: Configuring lucene to suggest the indexed string for all the
 searches of the substring of the indexed string


 Hi,

 Is it possible to configure solr to suggest the indexed string for all the
 searches of the substring of the string?

 Thanks,
 Prathik




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Solrj Stats encoding problem

2013-06-05 Thread Jack Krupansky
Sounds like the Solr Admin UI is too-aggressively encoding the query part of 
the URL for display. Each query parameter value needs to be encoded, not the 
entire URL query string as a whole.


-- Jack Krupansky

-Original Message- 
From: ethereal

Sent: Wednesday, June 05, 2013 4:11 PM
To: solr-user@lucene.apache.org
Subject: Solrj Stats encoding problem

Hi,

I've tested a query using solr admin web interface and it works fine.
But when I'm trying to execute the same search using solrj, it doesn't
include Stats information.
I've figured out that it's because my query is encoded.
Original query is like q=eventTimestamp:[2013-06-01T12:00:00.000Z TO
2013-06-30T11:59:59.999Z]stats=truestats.field=numberOfBytesstats.facet=eventType
The query in java is like
q=eventTimestamp%3A%5B2013-06-01T12%3A00%3A00.000Z+TO+2013-06-30T11%3A59%3A59.999Z%5D%26stats%3Dtrue%26stats.field%3DnumberOfBytes%26stats.facet%3DeventType
If I copy this query to browser address bar, it doesn't work, but it does if
I replace encoded := with original values. What should I do do make it work
through java?
The code is like the following:

SolrQuery solrQuery = new SolrQuery();
solrQuery.setQuery(queryBuilder.toString());
QueryResponse query = getSolrServer().query(solrQuery);






Re: Heap space problem with mlt query

2013-06-05 Thread Erick Erickson
To add some numbers to adityab's comment.

Each entry in your filter cache will probably consist
of maxDocs/8 bytes plus some overhead. Or about 16G.
This will only grow as you fire queries at Solr, so
it's no surprise you're running out of memory as you
process queries.

Your documentCache is probably also a problem, although
I'm extrapolating based on an 80G index with only 1M docs.

The result cache is also very big, but it's usually much smaller.
Still, I'd set it back to the defaults.

Why did you change these from the defaults? The very
first thing I'd do is change them back.

Your autowarm counts are also a problem at 2,048.
Again, take the filterCache. It's essentially a map
where each entry's key is the fq clause and the
value is the set of documents that match the query,
often stored as a bit set (thus the maxDocs/8 above).
Whenever a new searcher is opened in your setup, the
most recent 2,048 fq clauses will be re-executed. Which
should really kill your searcher open times. Try something
reasonable like 16-32.

These are caches that are intended to age out the oldest
entries, not hold all the entries you ever send at Solr.
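
Something close to the stock settings (a sketch only - the exact numbers
depend on your query mix) would be:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

(documentCache cannot be autowarmed, so its autowarmCount stays at 0.)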

Best
Erick

On Wed, Jun 5, 2013 at 9:35 AM, adityab aditya_ba...@yahoo.com wrote:
 Did you try reducing the filter and query caches? They are fairly large, unless
 you really need them to be that big for your use case.
 Do you have that many distinct filter queries hitting Solr for the size you
 have defined for filterCache?
 Are you doing any sorting? That will chew up a lot of memory because of
 Lucene's internal field cache.






Re: Indexing Heavy dataset

2013-06-05 Thread Erick Erickson
Note that stored=true/false is irrelevant to the raw search time.

What it _is_ relevant to is the time it takes to assemble the doc
for return, if (and only if) you return that field. I claim your search
time would be fast if you went ahead and stored the field,
and specified an fl clause that did NOT contain the big field. Oh,
and you'd have to have lazy field loading enabled too.
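
As a sketch (the field names are made up), that means enabling this in
solrconfig.xml:

<query>
  ...
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
</query>

and then querying with an fl that leaves the big field out, e.g.
/select?q=foo&fl=id,title,score.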

FWIW,
Erick

On Wed, Jun 5, 2013 at 10:29 AM, Raheel Hasan raheelhasan@gmail.com wrote:
 some values in the field are up to 1 MB as well


 On Wed, Jun 5, 2013 at 7:27 PM, Raheel Hasan raheelhasan@gmail.comwrote:

 ok, thanks for the reply. The field has values of around 60 KB each.

 Furthermore, I have realized that the issue is with MySQL, as it's not
 processing this table when a WHERE is applied.

 Secondly, I have turned this field to stored=false and now the
 select is working fast again.



 On Wed, Jun 5, 2013 at 6:56 PM, Shawn Heisey s...@elyograg.org wrote:

 On 6/5/2013 3:08 AM, Raheel Hasan wrote:
  Hi,
 
  I am trying to index a heavy dataset with 1 particular field really too
  heavy...
 
  However, As I start, I get Memory warning and rollback
 (OutOfMemoryError).
  So, I have learned that we can use -Xmx1024m option with java command to
  start the solr and allocate more memory to the heap.
 
  My question is, that since this could also become insufficient later,
 so it
  the issue related to cacheing?
 
  here is my cache block in solrconfig:
 
  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>

  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="0"/>

  <documentCache class="solr.LRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>
 
  I am thinking like maybe I need to turn off the
 documentCache.
  Anyone got a better idea? Or perhaps there is another issue here?

 Exactly how big is this field?  Do you need this giant field returned
 with your results, or is it just there for searching?

 Caches of size 512, especially with autowarm disabled, are probably not
 a major cause for concern, unless the big field is big enough so that
 512 of them is really really huge.  If that's the case, I would reduce
 the size of your documentCache, not turn it off.

 The value of ramBufferSizeMB elsewhere in your config is more likely to
 affect how much RAM gets used during indexing.  The default for this
 field as of Solr 4.1.0 is 100.  Most people can reduce this value.

 I'm writing a reply to another thread where you are participating, with
 info that will likely be useful for you too.  Look for that.

 Thanks,
 Shawn




 --
 Regards,
 Raheel Hasan




 --
 Regards,
 Raheel Hasan


Re: Solrj Stats encoding problem

2013-06-05 Thread Chris Hostetter

: I've tested a query using solr admin web interface and it works fine.
: But when I'm trying to execute the same search using solrj, it doesn't
: include Stats information.
: I've figured out that it's because my query is encoded.

I don't think you are understanding how to use SolrJ and the SolrQuery 
object

: Original query is like q=eventTimestamp:[2013-06-01T12:00:00.000Z TO
: 
2013-06-30T11:59:59.999Z]stats=truestats.field=numberOfBytesstats.facet=eventType
: The query in java is like
: 
q=eventTimestamp%3A%5B2013-06-01T12%3A00%3A00.000Z+TO+2013-06-30T11%3A59%3A59.999Z%5D%26stats%3Dtrue%26stats.field%3DnumberOfBytes%26stats.facet%3DeventType

...

: SolrQuery solrQuery = new SolrQuery();
: solrQuery.setQuery(queryBuilder.toString());
: QueryResponse query = getSolrServer().query(solrQuery);

it looks like you are passing the setQuery method an entire URL-encoded 
set of params from a request you made in your browser.  The setQuery 
method is syntactic sugar for specifying just the q param containing 
the query string, and it should not already 
be escaped (ie: eventTimestamp:[2013-06-01T12:00:00.000Z TO 
2013-06-30T11:59:59.999Z]).  Other methods exist on the SolrQuery 
object to provide syntactic sugar for other things (ie: specifying facet 
fields, enabling highlighting, etc...)

If you want to provide a list of params using explicit names (q, stats, 
stats.field, etc...) you can ignore the helper methods on SolrQuery and 
just use the low-level methods it inherits from 
ModifiableSolrParams, like setParam ...


SolrQuery query = new SolrQuery();
query.setParam("q", "eventTimestamp:[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]");
query.setParam("stats", true);
query.setParam("stats.field", "numberOfBytes", "eventType");
QueryResponse response = getSolrServer().query(query);


-Hoss


Re: data-import problem

2013-06-05 Thread Erick Erickson
My usual admonishment is that Solr isn't a database, and when
you try to use it like one you're just _asking_ for problems. That
said

Consider two options:
1) use a different core for each table.
2) in schema.xml, remove the id field (required="true" _might_ be specified)
   and remove the uniqueKey definition.
You'll have to re-index, of course.

But do note that while Solr does not _require_ a uniqueKey definition,
almost all Solr installations have one.
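
For option 1), a sketch of the (pre-SolrCloud) solr.xml with one core per
table - the names here are made up:

<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="names">
    <core name="names"  instanceDir="names"/>
    <core name="titles" instanceDir="titles"/>
  </cores>
</solr>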

Best
Erick

On Wed, Jun 5, 2013 at 3:19 PM, Stavros Delisavas stav...@delisavas.de wrote:
 Thanks for the hints.
 I am not sure how to solve this issue. I previously made a typo, there are
 definetly two different tables.
 Here is my real configuration:

 http://pastebin.com/JUDzaMk0

 For testing purposes I added LIMIT 10 to the SQL-statements because my
 tables are very huge and tests would take too long (about 5gb, 6.5million
 rows). I included my whole data-config and what I have changed from the
 default schema.xml. I don't know how to solve the all ids have to be
 unique-problem. I can not believe that Solr does not offer any solution at
 all to handle multiple data sources with their own individual ids. Maybe its
 possible to have solr create its own ids while importing the data?

 Actually there is no direct relation between my name-table and my
 title-table. All I want is to be able to do fast text-search in those two
 tables in order to find the belonging ids of these entries.

 Let me know if you need more information.

 Thank you!





 Am 05.06.2013 20:54, schrieb Gora Mohanty:

 On 6 June 2013 00:09, Stavros Delisavas stav...@delisavas.de wrote:

 Thanks so far.

 This change makes Solr work over the title-entries too, yay! Unfortunatly
 they don't get processed(skipped rows). In my log it says
 missing required field id for every entry.

 I checked my schema.xml. In there id is not set as a required field.
 removing the uniquekey-property also leads to no improvement.

 [...]

 There are several things wrong with your problem statement.
 You say that you have two tables, but both SELECTs seem
 to use the same table. I am going to assume that you really
 have two different tables.

 Unless you have changed the default schema.xml, id should
 be defined as the uniqueKey for the document. You probably
 do not want to remove that, and even if you just remove the
 uniqueKey property, the field id remains defined as a required
 field.

 The issue is with your SELECT for the second entity:
 <entity name="title" query="SELECT id AS titleid, title FROM name"></entity>
 This renames id to titleid, and hence the required field
 id in schema.xml is missing.

 Instead, you need something like:
 <document>
   <entity name="name" query="SELECT id, name FROM name1"></entity>
   <entity name="title" query="SELECT id, title FROM name2"></entity>
 </document>

 However, you will need to ensure that the ids are unique
 in the two tables, else entries from the second entity will
 overwrite matching ids from the first.

 Also, do you have field definitions within the entities? Please
 share the complete schema.xml and the DIH configuration
 file with us, rather than snippets: Use pastebin.com if they
 are large.

 Regards,
 Gora




Re: Entire query is stopwords

2013-06-05 Thread Erick Erickson
Your problem statement is fairly odd. You say
you've defined object as a stopword, but then
you want your query to return documents that
contain object. By definition, stopwords are
terms considered irrelevant for searching,
so they are ignored.

So why not just take object out of your stopwords
file? Perhaps a separate stopwords file for that
particular field? Or just not use stopwords at all
for that field?
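
For example (a sketch only - the type name and file name are made up), the
field's type could point at its own stopword file with "object" removed:

<fieldType name="text_object_desc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_object_desc.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>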

Best
Erick

On Wed, Jun 5, 2013 at 3:36 PM, Vardhan Dharnidharka
vardhan1...@hotmail.com wrote:



 Hi,

 I am using the standard edismax parser and my example query is as follows:

 {!edismax qf='object_description ' rows=10 start=0 mm=-40% v='object'}

 In this case, 'object' happens to be a stopword in the StopWordsFilter in my 
 datatype 'object_description'. Now, since 'object' is not indexed at all, the 
 query does not return any results. In an ideal case, I would want documents 
 containing the term 'object' to be returned.

 What is the best practice to achieve this? Index stop-words and re-query with 
 'stopwords=false'. Or can this be done without re-querying?

 Thanks,
 Vardhan



search for docs where location not present

2013-06-05 Thread kevinlieb
I have a location-type field in my schema where I store lat / lon of a
document when this data is available.  In around half of my documents this
info is not available and I just don't store anything.

I am trying to find the documents where the location is not set but nothing
is working.  
I tried q=location_field:* and get back no results
I tried q=-location_field:[* TO *] but got back an error
I even tried something like:
q=*:*&fq={!geofilt sfield=location_field}&pt=34.02093,-118.210755&d=25000
(distance set to a very large number)
but it returned fields even if they had no location_field set.

Can anyone think of a way to do this?

Thanks in advance!






Re: data-import problem

2013-06-05 Thread bbarani
A Solr index does not need a unique key, but almost all indexes use one.

http://wiki.apache.org/solr/UniqueKey

Try the below query, passing id as id instead of titleid:

<document>
  <entity name="title" query="SELECT id, title FROM name"></entity>
</document>

A proper dataimport config will look like:

<entity name="relationship_entity" query="select id, name, value from table">
  <field column="id" name="idSchemaFieldName"/>
  <field column="name" name="nameSchemaFieldName"/>
  <field column="value" name="valueSchemaFieldName"/>
</entity>





Re: Solrj Stats encoding problem

2013-06-05 Thread Shawn Heisey

On 6/5/2013 2:11 PM, ethereal wrote:

Hi,

I've tested a query using solr admin web interface and it works fine.
But when I'm trying to execute the same search using solrj, it doesn't
include Stats information.
I've figured out that it's because my query is encoded.
Original query is like q=eventTimestamp:[2013-06-01T12:00:00.000Z TO
2013-06-30T11:59:59.999Z]stats=truestats.field=numberOfBytesstats.facet=eventType
The query in java is like
q=eventTimestamp%3A%5B2013-06-01T12%3A00%3A00.000Z+TO+2013-06-30T11%3A59%3A59.999Z%5D%26stats%3Dtrue%26stats.field%3DnumberOfBytes%26stats.facet%3DeventType
If I copy this query to browser address bar, it doesn't work, but it does if
I replace encoded := with original values. What should I do do make it work
through java?
The code is like the following:

SolrQuery solrQuery = new SolrQuery();
solrQuery.setQuery(queryBuilder.toString());
QueryResponse query = getSolrServer().query(solrQuery);


The only QueryBuilder objects I can find are in the Lucene API, so I 
have no idea what that part of your code is doing.  Here's how I would 
duplicate the query you reference in SolrJ.  The query string is broken 
apart so that the lines won't wrap awkwardly:


String url = "http://localhost:8983/solr/collection1";
SolrServer server = new HttpSolrServer(url);

String qs = "eventTimestamp:"
  + "[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]";
SolrQuery query = new SolrQuery();
query.setQuery(qs);
query.set("stats", true);
query.set("stats.field", "numberOfBytes");
query.set("stats.facet", "eventType");

QueryResponse rsp = server.query(query);


Thanks,
Shawn



Re: facet.missing=true returns null records with zero count also

2013-06-05 Thread Chris Hostetter

: filter again with the same facet. Also, when a facet has only one value, it
: doesn't make sense to show it to the user, since searching with that facet
: is just going to give the same result set again. So when facet.missing does
: not work with facet.mincount, it is a bit of a hassle for us. Will work
: on handling it in our program. Thank you for the clarification

yeah .. i totally understand where you are coming from, i'm just not 
certain that it's clear cut that we should change the current behavior 
since: 1) it's trivial to work around client side; 2) some other users 
might be depending on the current behavior and think that conceptually it 
doesn't make sense for facet.missing to consider facet.mincount.

i should have said before but: feel free to open an issue about this and 
propose a patch, i'm just not sure it's a slam dunk unless we make an easy 
way to configure it to continue working the current way as well.


-Hoss


Re: search for docs where location not present

2013-06-05 Thread bbarani
select?q=-location_field:* worked for me





Re: search for docs where location not present

2013-06-05 Thread Jack Krupansky
Either have your update client explicitly set a boolean field that indicates 
whether location is present, or use an update processor to set an explicit 
boolean field that means no location present:


<updateRequestProcessorChain name="location-present">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">location_field</str>
    <str name="dest">has_location_b</str>
  </processor>
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">has_location_b</str>
    <str name="pattern">[^\s]+</str>
    <str name="replacement">true</str>
  </processor>
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">has_location_b</str>
    <bool name="value">false</bool>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

-- Jack Krupansky

-Original Message- 
From: kevinlieb

Sent: Wednesday, June 05, 2013 5:43 PM
To: solr-user@lucene.apache.org
Subject: search for docs where location not present

I have a location-type field in my schema where I store lat / lon of a
document when this data is available.  In around half of my documents this
info is not available and I just don't store anything.

I am trying to find the documents where the location is not set but nothing
is working.
I tried q=location_field:* and get back no results
I tried q=-location_field:[* TO *] but got back an error
I even tried something like:
q=*:*fq={!geofilt sfield=location_field}pt=34.02093,-118.210755d=25000
(distance set to a very large number)
but it returned fields even if they had no location_field set.

Can anyone think of a way to do this?

Thanks in advance!







Re: copyField generates multiple values encountered for non multiValued field

2013-06-05 Thread Chris Hostetter

: I updated the Index using SolrJ and got the exact same error message

there aren't a lot of specifics provided in this thread, so this may not 
be applicable, but if you mean you are actually using the atomic updates 
feature to update an existing document, then the problem is that you still 
have the existing value in your name2 field, as well as another copy of 
the name field evaluated by copyField after the updates are applied...

http://wiki.apache.org/solr/Atomic_Updates#Stored_Values
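
In that case the usual fix (a sketch, following the requirement described on
that wiki page) is to leave the copyField target unstored, so it is rebuilt
from the incoming name value on every update:

<field name="name2" type="string_ci" indexed="true" stored="false" multiValued="false"/>
<copyField source="name" dest="name2"/>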


-Hoss


Re: Indexing Heavy dataset

2013-06-05 Thread Chris Hostetter

: Furthermore, I have realized that the issue is with MySQL as its not
: processing this table when a where is applied

http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F
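
In short, the workaround described there (a sketch - the driver class, URL and
credentials are placeholders) is to set batchSize="-1" on the JDBC data source
so the MySQL driver streams rows instead of buffering the whole table:

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/dbname"
            user="db_user" password="db_pass"
            batchSize="-1"/>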


-Hoss


Re: search for docs where location not present

2013-06-05 Thread kevinlieb
Thanks for the replies.

I found that -location_field:* returns documents that both have and don't
have the field set.
I should clarify that I am using Solr 3.4, and
the location type is set to solr.LatLonType.

Although I could add a boolean field that is true if location is set, I'd
rather not have redundant data in the db (harkens back to my normalized-SQL
days)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-for-docs-where-location-not-present-tp4068444p4068459.html
Sent from the Solr - User mailing list archive at Nabble.com.

