Re: Custom Solr indexer/searcher

2012-11-15 Thread Mikhail Khludnev
Scott,
It sounds like you need to look at a few examples of similar things in
Lucene. Off the top of my head: FuzzyQuery from 4.0, which finds terms similar
to the given one in the FST, for query expansion. Generic query expansion is
done via MultiTermQuery. Index-time term expansion is shown in TrieField and,
by the way, NumericRangeQuery (it should match your goal closely). All of
these are single-dimension examples, but AFAIK a KD-tree is multidimensional,
so look into GeoHashField, which puts two-dimensional points into single terms
with the ability to build ranges on them; see GeoHashField.createSpatialQuery().
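
For illustration, here is a rough sketch of the single-dimension case (the
field name and precision step are made up and must match what was used at
index time):

import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

// Matches docs whose numeric "price" field (indexed as a trie field with
// precisionStep=4) falls in [10, 100], both ends inclusive. The query is
// rewritten against the pre-expanded trie terms - the index-time term
// expansion idea mentioned above.
Query q = NumericRangeQuery.newIntRange("price", 4, 10, 100, true, true);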

Happy hacking!


On Fri, Nov 16, 2012 at 10:34 AM, John Whelan  wrote:

> Scott,
>
> I probably have no idea as to what I'm saying, but if you're looking to
> find results in an N-dimensional space, you might look at creating a
> field of type 'point'. Point-type fields have a dimension attribute; I
> believe that it can be set to a large integer value.
>
> Barring that, there is also a 'dist()' function that can be used to work
> with multiple numeric fields in order to sort results based on closeness to a
> desired coordinate. The 'dist()' function takes a parameter to specify the
> means of calculating the distance. (For example, 2 -> 'Euclidean distance'.
> I don't know the other options.)
>
> In the worst case, my response is worthless, but pops your question back up
> in the e-mails...
>
> Regards,
> John
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Is there a way to limit returned rows directly in a query string?

2012-11-15 Thread Mikhail Khludnev
Yun,

Literally, you can call another QParser from the middle of a query and apply
local params to it via the nested queries feature
http://searchhub.org/2009/03/31/nested-queries-in-solr/ ; the syntax is a
little bit tricky, though.
But calling another QParser and attempting to specify a number of rows for it
makes absolutely no sense in the Lucene universe. I guess you are trying to
get something like an RDBMS or implement a kind of imperative algorithm on top
of Solr. It doesn't seem possible to me. Everything here is built around
Boolean Retrieval and the Vector Space Model.
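
To make the syntax concrete, here is a small sketch of the nested-query form
via SolrJ (the field names are made up); note that rows stays a single
top-level parameter for the whole request, it cannot be set per clause:

import org.apache.solr.client.solrj.SolrQuery;

// Each _query_ clause invokes its own QParser with its own local params.
SolrQuery q = new SolrQuery(
    "_query_:\"{!lucene df=ext_name}pdf\" OR _query_:\"{!lucene df=ext_name}ppt\"");
q.setRows(10);  // applies to the whole result list, not to each clause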

Regards


On Fri, Nov 16, 2012 at 10:16 AM, Dominique Debailleux <
dominique.debaill...@woana.net> wrote:

> First query is OK; it just doesn't fit your need, if I understand correctly.
> Could you confirm that the expected result is 6 rows (3 rows w/ ppt plus 3
> rows w/ pdf)?
>
>
>
>
> 2012/11/15 jefferyyuan 
>
> > Thanks :)
> > local params are very useful, but it seems they don't work here:
> > I tried:
> > q={!rows=3}ext_name:pdf OR ext_name:ppt  ==> this only returns 3 ppt docs.
> >
> > q={!rows=3}ext_name:pdf OR q={!rows=3}ext_name:ppt
> > This causes a syntax error, as Solr doesn't support multiple query strings.
> > q={!rows=3}ext_name:pdf OR {!rows=3}ext_name:ppt
> > This doesn't work either.
> >
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Is-there-a-way-to-limit-returned-rows-directly-in-a-query-string-tp4020550p4020602.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Dominique Debailleux
> WoAnA - small.but.robust
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Custom Solr indexer/searcher

2012-11-15 Thread John Whelan
Scott,

I probably have no idea as to what I'm saying, but if you're looking to
find results in an N-dimensional space, you might look at creating a
field of type 'point'. Point-type fields have a dimension attribute; I
believe that it can be set to a large integer value.

Barring that, there is also a 'dist()' function that can be used to work
with multiple numeric fields in order to sort results based on closeness to a
desired coordinate. The 'dist()' function takes a parameter to specify the
means of calculating the distance. (For example, 2 -> 'Euclidean distance'.
I don't know the other options.)
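
For example, a rough sketch of sorting by dist() from SolrJ (the field names
and target point are made up):

import org.apache.solr.client.solrj.SolrQuery;

// power=2 gives Euclidean distance between the vector of field values
// (dim1_d, dim2_d) and the constant point (3.0, 4.0).
SolrQuery q = new SolrQuery("*:*");
q.set("sort", "dist(2,dim1_d,dim2_d,3.0,4.0) asc");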

In the worst case, my response is worthless, but pops your question back up
in the e-mails...

Regards,
John


Re: Is there a way to limit returned rows directly in a query string?

2012-11-15 Thread Dominique Debailleux
First query is OK; it just doesn't fit your need, if I understand correctly.
Could you confirm that the expected result is 6 rows (3 rows w/ ppt plus 3
rows w/ pdf)?




2012/11/15 jefferyyuan 

> Thanks :)
> local params are very useful, but it seems they don't work here:
> I tried:
> q={!rows=3}ext_name:pdf OR ext_name:ppt  ==> this only returns 3 ppt docs.
>
> q={!rows=3}ext_name:pdf OR q={!rows=3}ext_name:ppt
> This causes a syntax error, as Solr doesn't support multiple query strings.
> q={!rows=3}ext_name:pdf OR {!rows=3}ext_name:ppt
> This doesn't work either.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Is-there-a-way-to-limit-returned-rows-directly-in-a-query-string-tp4020550p4020602.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Dominique Debailleux
WoAnA - small.but.robust


Re: BM25 model for solr 4?

2012-11-15 Thread Floyd Wu
Thanks everyone, especially Tom; you gave me a detailed explanation
of this topic.
Of course in academic work we do need to interpret results carefully. What I
care about is, from the end-user's point of view, will using BM25 result in
better ranking than Lucene's original VSM+Boolean model? How significant will
the difference be?
I'd like to see some sharing from the community.

Floyd


2012/11/16 Tom Burton-West 

> Hello Floyd,
>
> There is a ton of research literature out there comparing BM25 to vector
> space.  But you have to be careful interpreting it.
>
> BM25 originally beat the SMART vector space model in the early  TRECs
>  because it did better tf and length normalization.  Pivoted Document
> Length normalization was invented to get the vector space model to catch up
> to BM25.   (Just Google for Singhal length normalization.  Amit Singhal,
> now chief of Google Search, did his doctoral thesis on this and it is
> available.  Similarly, Stephen Robertson, now at Microsoft Research,
> published a ton of studies of BM25.)
>
> The default Solr/Lucene similarity class doesn't provide the length or tf
> normalization tuning params that BM25 does.  There is the
> SweetSpotSimilarity, but that doesn't quite work the same way that the BM25
> normalizations do.
>
> Document length normalization needs and parameter tuning all depends on
> your data.  So if you are reading a comparison, you need to determine:
> 1) When comparing recall/precision etc. between vector space and Bm25, did
> the experimenter tune both the vector space and the BM25 parameters
> 2) Are the documents (and queries) they are using in the test, similar in
>  length characteristics to your documents and
> queries.
>
> We are planning to do some testing in the next few months for our use case,
> which is 10 million books where we index the entire book.  These are
> extremely long documents compared to most IR research.
> I'd love to hear about actual (non-research) production implementations
> that have tested the new ranking models available in Solr.
>
> Tom
>
>
>
> On Wed, Nov 14, 2012 at 9:16 PM, Floyd Wu  wrote:
>
> > Hi there,
> > Can anybody kindly tell me how to set up Solr to use BM25?
> > By the way, are there any experiments or research showing a comparison of
> > BM25 and the classical VSM model in recall/precision rate?
> >
> > Thanks in advance.
> >
>


Re: how make a suggester?

2012-11-15 Thread Otis Gospodnetic
Hi Iwo,

This is kind of a common question.  Have a look at
http://search-lucene.com/?q=autocomplete+OR+suggester&fc_project=Solr&fc_type=mail+_hash_+user
for lots of discussions on this topic.

In short, you could use the Suggester that comes with Solr, or you could do
http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
or you could get
http://sematext.com/products/autocomplete/index.html or you could look at
http://solr.pl/en/2010/10/18/solr-and-autocomplete-part-1/ and so on.

HTH,

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Nov 15, 2012 at 11:27 AM, iwo  wrote:

> Hello,
> I would like to implement a suggester with Solr; which is the best way now
> in your opinion?
>
> thanks in advance
> I.
>
>
>
> -
> Complicare è facile, semplificare é difficile.
> Complicated is easy, simple is hard.
> quote: http://it.wikipedia.org/wiki/Bruno_Munari
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-make-a-suggester-tp4020540.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Patch Needed for Issue Solr-3790

2012-11-15 Thread mechravi25
Hi Koji,

Thank you for your reply..will test for the same.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Patch-Needed-for-Issue-Solr-3790-tp4019256p4020651.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: cores shards and disks in SolrCloud

2012-11-15 Thread Otis Gospodnetic
Hi,

I think here you want to use a single JVM per server - no need for multiple
JVMs, JVM per Collection and such.
If you can spread data over more than 1 disk on each of your servers,
great, that will help.
Re data loss - yes, you really should just be using replication.  Sharding
a ton will minimize your data loss if you have no replication, but could
actually decrease performance.  Also, if you have lots of small shards, say
250, and only 5 servers, if 1 server dies doesn't that mean you will lose
50 shards - 20% of your data?

Otis
--
Solr Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html

On Thu, Nov 15, 2012 at 8:18 PM, Buttler, David  wrote:

> The main reason to split a collection into 25 shards is to reduce the
> impact of the loss of a disk.  I was running an older version of solr, a
> disk went down, and my entire collection was offline.  Solr 4 offers
> shards.tolerant to reduce the impact of the loss of a disk: fewer documents
> will be returned.  Obviously, I could replicate the data so that I wouldn't
> lose any documents while I replace my disk, but since I am already storing
> the original data in HDFS, (with a 3x replication), adding additional
> replication for solr eats into my disk budget a bit too much.
>
> Also, my other collections have larger amounts of data / number of
> documents. For every TB of raw data, how much disk space do I want to be
> using? As little as possible.  Drives are cheap, but not free.  And, nodes
> only hold so many drives.
>
> Dave
>
> -Original Message-
> From: Upayavira [mailto:u...@odoko.co.uk]
> Sent: Thursday, November 15, 2012 4:37 PM
> To: solr-user@lucene.apache.org
> Subject: Re: cores shards and disks in SolrCloud
>
> Personally I see no benefit to having more than one JVM per node; cores
> can handle it. I would say that splitting a 20m index into 25 shards
> strikes me as serious overkill, unless you expect to expand
> significantly. 20m would likely be okay with two or three shards. You
> can store the indexes for each core on different disks, which can give
> some performance benefit.
>
> Just some thoughts.
>
> Upayavira
>
>
>
> On Thu, Nov 15, 2012, at 11:04 PM, Buttler, David wrote:
> > Hi,
> > I have a question about the optimal way to distribute solr indexes across
> > a cloud.  I have a small number of collections (less than 10).  And a
> > small cluster (6 nodes), but each node has several disks - 5 of which I
> > am using for my solr indexes.  The cluster is also a hadoop cluster, so
> > the disks are not RAIDed, they are JBOD.  So, on my 5 slave nodes, each
> > with 5 disks, I was thinking of putting one shard per collection.  This
> > means I end up with 25 shards per collection.  If I had 10 collections,
> > that would make it 250 shards total.  Given that Solr 4 supports
> > multi-core, my first thought was to try one JVM for each node: for 10
> > collections per node, that means that each JVM would contain 50 shards.
> >
> > So, I set up my first collection, with a modest 20M documents, and
> > everything seems to work fine.  But, now my subsequent collections that I
> > have added are having issues.  The first one is that every time I query
> > for the document count (*:* with rows=0), a different number of documents
> > is returned. The number can differ by as much as 10%.  Now if I query
> > each shard individually (setting distrib=false), the number returned is
> > always consistent.
> >
> > I am not entirely sure this is related as I may have missed a step in my
> > setup of subsequent collections (bootstrapping the config)
> >
> > But, more related to the architecture question: is it better to have one
> > JVM per disk, one JVM per shard, or one JVM per node.  Given the MMap of
> > the indexes, how does memory play into the question?   There is a blog
> > post
> > (http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> )
> > that recommends minimizing the amount of JVM memory and maximizing the
> > amount of OS-level file cache, but how does that impact sorting /
> > boosting?
> >
> > Sorry if I have missed some documentation: I have been through the cloud
> > tutorial a couple of times, and I didn't see any discussion of these
> > issues
> >
> > Thanks,
> > Dave
>


Re: Solr 4.0 indexing performance

2012-11-15 Thread Jack Krupansky
Did you start from scratch, or did you bulk index into an existing index? 
There is some "backcompat" logic in there, which is convenient, but not 
necessarily the best performance.


-- Jack Krupansky

-Original Message- 
From: Nils Weinander

Sent: Thursday, November 15, 2012 1:29 AM
To: solr-user@lucene.apache.org
Subject: Solr 4.0 indexing performance

I have just updated from Solr 3.6 to 4.0, using defaults in solrconfig.xml
for both versions. With 4.0, bulk indexing takes about twice the time it
did in 3.6. Is this to be expected, or the result of my lack of optimization
in the configuration?

--

Nils Weinander 



Re: Solr 4.0 indexing performance

2012-11-15 Thread Otis Gospodnetic
But slower indexing with solr 4.0 sounds suspicious to me... you compared
your configs? JVM parameters?  GC? IO? CPU?

Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 15, 2012 5:26 AM, "Nils Weinander"  wrote:

> Ah, thanks Markus!
>
> That's a good thing. I tried disabling the transaction log, and the difference
> in performance is marginal. So, I'll stick with the transaction logging.
>
>
> On Thu, Nov 15, 2012 at 11:02 AM, Markus Jelsma
> wrote:
>
> > Hi - you're likely seeing a drop in performance because of durability
> > which is enabled by default via a transaction log. When disabled 4.0 is
> > iirc slightly faster than 3.x.
> >
> >
> > -Original message-
> > > From:Nils Weinander 
> > > Sent: Thu 15-Nov-2012 10:35
> > > To: solr-user@lucene.apache.org
> > > Subject: Solr 4.0 indexing performance
> > >
> > > I have just updated from Solr 3.6 to 4.0, using defaults in
> > solrconfig.xml
> > > for both versions. With 4.0, bulk indexing takes about twice the time
> it
> > > did in 3.6. Is this to be expected, or the result of my lack of
> > optimization
> > > in the configuration?
> > >
> > > --
> > > 
> > > Nils Weinander
> > >
> >
>
>
>
> --
> 
> Nils Weinander
>


Re: consistency in SolrCloud replication

2012-11-15 Thread Otis Gospodnetic
I think Bill was asking about search.
I think the Q is whether a query hitting the shard where a doc was sent
for indexing would see that doc even before that doc has been copied to
the replicas.

I didn't test it, but I'd think the answer would be positive because of the
transaction log.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 15, 2012 11:30 AM, "Mark Miller"  wrote:

> It depends - no commit necessary for realtime get. Otherwise, yes, you
> would need to do at least a soft commit. That works the same way though -
> so if you make your update, then do a soft commit, you can be sure your
> next search will see the update on all the replicas. And with realtime get,
> of course no commit is necessary to see it.
>
> - Mark
>
> On Nov 15, 2012, at 10:40 AM, David Smiley (@MITRE.org) 
> wrote:
>
> > Mark Miller-3 wrote
> >> I'm talking about an update request. So if you make an update, when it
> >> returns, your next search will see the update, because it will be on
> >> all replicas.
> >
> > I presume this is only the case if (of course) the client also sent a
> > commit.  So you're saying the commit call will not return unless all
> > replicas have completed their commits.  Right?
> >
> > ~ David
> >
> >
> >
> > -
> > Author:
> http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/consistency-in-SolrCloud-replication-tp4020379p4020518.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: zkcli issues

2012-11-15 Thread Nick Chase
Unfortunately, this doesn't seem to solve the issue; now I'm beginning 
to wonder if maybe it's because I'm on Windows.  Has anyone successfully 
run ZkCLI on Windows?


  Nick

On 11/12/2012 2:27 AM, Jeevanandam Madanagopal wrote:

Nick - Sorry, the embedded links are not shown in the previous email. I'm
mentioning them below.


Handy SolrCloud ZkCLI Commands 
(http://www.myjeeva.com/2012/10/solrcloud-cluster-single-collection-deployment/#handy-solrcloud-cli-commands)



Uploading Solr Configuration into ZooKeeper ensemble 
(http://www.myjeeva.com/2012/10/solrcloud-cluster-single-collection-deployment/#uploading-solrconfig-to-zookeeper)



Cheers,
Jeeva


On Nov 12, 2012, at 12:48 PM, Jeevanandam Madanagopal  wrote:


Nick -

I believe you're experiencing difficulties with the SolrCloud CLI commands for
interacting with ZooKeeper.
Please have a look at the links below; they will give you direction.
Handy SolrCloud ZkCLI Commands
Uploading Solr Configuration into ZooKeeper ensemble

Cheers,
Jeeva

On Nov 12, 2012, at 4:45 AM, Mark Miller  wrote:


On 11/11/2012 04:47 PM, Yonik Seeley wrote:

On Sun, Nov 11, 2012 at 10:39 PM, Nick Chase  wrote:

So I'm trying to use ZkCLI without success.  I DID start and stop Solr in
non-cloud mode, so everything is extracted and it IS finding zookeeper*.jar.
However, now it's NOT finding SolrJ.


RE: cores shards and disks in SolrCloud

2012-11-15 Thread Buttler, David
The main reason to split a collection into 25 shards is to reduce the impact of 
the loss of a disk.  I was running an older version of solr, a disk went down, 
and my entire collection was offline.  Solr 4 offers shards.tolerant to reduce 
the impact of the loss of a disk: fewer documents will be returned.  Obviously, 
I could replicate the data so that I wouldn't lose any documents while I 
replace my disk, but since I am already storing the original data in HDFS, 
(with a 3x replication), adding additional replication for solr eats into my 
disk budget a bit too much.

Also, my other collections have larger amounts of data / number of documents. 
For every TB of raw data, how much disk space do I want to be using? As little 
as possible.  Drives are cheap, but not free.  And, nodes only hold so many 
drives.  

Dave

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Thursday, November 15, 2012 4:37 PM
To: solr-user@lucene.apache.org
Subject: Re: cores shards and disks in SolrCloud

Personally I see no benefit to having more than one JVM per node; cores
can handle it. I would say that splitting a 20m index into 25 shards
strikes me as serious overkill, unless you expect to expand
significantly. 20m would likely be okay with two or three shards. You
can store the indexes for each core on different disks, which can give
some performance benefit.

Just some thoughts.

Upayavira



On Thu, Nov 15, 2012, at 11:04 PM, Buttler, David wrote:
> Hi,
> I have a question about the optimal way to distribute solr indexes across
> a cloud.  I have a small number of collections (less than 10).  And a
> small cluster (6 nodes), but each node has several disks - 5 of which I
> am using for my solr indexes.  The cluster is also a hadoop cluster, so
> the disks are not RAIDed, they are JBOD.  So, on my 5 slave nodes, each
> with 5 disks, I was thinking of putting one shard per collection.  This
> means I end up with 25 shards per collection.  If I had 10 collections,
> that would make it 250 shards total.  Given that Solr 4 supports
> multi-core, my first thought was to try one JVM for each node: for 10
> collections per node, that means that each JVM would contain 50 shards.
> 
> So, I set up my first collection, with a modest 20M documents, and
> everything seems to work fine.  But, now my subsequent collections that I
> have added are having issues.  The first one is that every time I query
> for the document count (*:* with rows=0), a different number of documents
> is returned. The number can differ by as much as 10%.  Now if I query
> each shard individually (setting distrib=false), the number returned is
> always consistent.
> 
> I am not entirely sure this is related as I may have missed a step in my
> setup of subsequent collections (bootstrapping the config)
> 
> But, more related to the architecture question: is it better to have one
> JVM per disk, one JVM per shard, or one JVM per node.  Given the MMap of
> the indexes, how does memory play into the question?   There is a blog
> post
> (http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html)
> that recommends minimizing the amount of JVM memory and maximizing the
> amount of OS-level file cache, but how does that impact sorting /
> boosting?
> 
> Sorry if I have missed some documentation: I have been through the cloud
> tutorial a couple of times, and I didn't see any discussion of these
> issues
> 
> Thanks,
> Dave


Re: High Slave CPU Intermittently After Replication

2012-11-15 Thread Upayavira
One question is, why optimise? The newer TieredMergePolicy, as I
understand it, takes away much of the need for optimising an index.

As to maxing, after a replication, your caches need warming. Watch how
often you replicate, and check on the admin UI how long it takes to warm
caches. You may be maxing out memory by having multiple warming
searchers. 

Upayavira

On Thu, Nov 15, 2012, at 03:43 PM, richardg wrote:
> Here is our setup:
> 
> Solr 4.0
> Master replicates to three slaves after optimize
> 
> We have a problem where every so often after replication the CPU load on
> the
> slave servers maxes out and requests come to a crawl.
> 
> We do a dataimport every 10 minutes and depending on the number of
> updates
> since the last optimize we run an update command with either
> optimize=true&maxsegements=4 or just optimize=true (more than 1500
> updates
> since last full optimize).   We had this issue more often until we put
> the
> optimize updates statements into our process.
> 
> Everything had been running great for a week or so until today after
> replication everything maxed out on all three slaves, it isn't that
> things
> get progressively worse, right after the replication the issue occurs. 
> The
> only way to recover from it is to do an optimize=true update and once it
> replicates out things return to normal where there isn't much load on the
> slaves at all.
> 
> There isn't any way to predict this issue and so far I haven't seen
> anything
> in the logs that would offer any clues.  Any ideas?
> 
> 
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/High-Slave-CPU-Intermittently-After-Replication-tp4020520.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: cores shards and disks in SolrCloud

2012-11-15 Thread Upayavira
Personally I see no benefit to having more than one JVM per node; cores
can handle it. I would say that splitting a 20m index into 25 shards
strikes me as serious overkill, unless you expect to expand
significantly. 20m would likely be okay with two or three shards. You
can store the indexes for each core on different disks, which can give
some performance benefit.

Just some thoughts.

Upayavira



On Thu, Nov 15, 2012, at 11:04 PM, Buttler, David wrote:
> Hi,
> I have a question about the optimal way to distribute solr indexes across
> a cloud.  I have a small number of collections (less than 10).  And a
> small cluster (6 nodes), but each node has several disks - 5 of which I
> am using for my solr indexes.  The cluster is also a hadoop cluster, so
> the disks are not RAIDed, they are JBOD.  So, on my 5 slave nodes, each
> with 5 disks, I was thinking of putting one shard per collection.  This
> means I end up with 25 shards per collection.  If I had 10 collections,
> that would make it 250 shards total.  Given that Solr 4 supports
> multi-core, my first thought was to try one JVM for each node: for 10
> collections per node, that means that each JVM would contain 50 shards.
> 
> So, I set up my first collection, with a modest 20M documents, and
> everything seems to work fine.  But, now my subsequent collections that I
> have added are having issues.  The first one is that every time I query
> for the document count (*:* with rows=0), a different number of documents
> is returned. The number can differ by as much as 10%.  Now if I query
> each shard individually (setting distrib=false), the number returned is
> always consistent.
> 
> I am not entirely sure this is related as I may have missed a step in my
> setup of subsequent collections (bootstrapping the config)
> 
> But, more related to the architecture question: is it better to have one
> JVM per disk, one JVM per shard, or one JVM per node.  Given the MMap of
> the indexes, how does memory play into the question?   There is a blog
> post
> (http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html)
> that recommends minimizing the amount of JVM memory and maximizing the
> amount of OS-level file cache, but how does that impact sorting /
> boosting?
> 
> Sorry if I have missed some documentation: I have been through the cloud
> tutorial a couple of times, and I didn't see any discussion of these
> issues
> 
> Thanks,
> Dave


Re: PointType multivalued query

2012-11-15 Thread David Smiley (@MITRE.org)
Sorry, you're out of luck.  SRPT could be generalized but that's a bit of
work.  The trickiest part I think would be writing a multi-dimensional
SpatialPrefixTree impl.

If the # of discrete values at each dimension is pretty small (<100? ish?),
then there is a way using term positions and span queries (I think).  In
your example you're using 1, 2, ... up to 6.  But I figure that's to keep
your example simple and isn't reflective of your actual data.

~ David



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445p4020625.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: PointType multivalued query

2012-11-15 Thread blopez
Sorry I tried to explain it too fast.

Imagine the use case that I wrote about in the first post.

A document can have more than one 6-dimension point. So my first approach
was:

   1
   2,2,2,2,2,2


   2
   3,3,3,3,3,3


   3
   4,4,4,4,4,4


It works fine and I don't think it gives us bad performance, but there is a
lot of redundant data (high disk space cost). That's why I thought about
multivalued fields:


   10
   2,2,2,2,2,2
   3,3,3,3,3,3
   4,4,4,4,4,4

   
The first approach to implement this was PointType. But I have the problem
that I mentioned in my first message: the search queries will be a 6-dimension
point that I have to full-match against the indexed points, and as far as I
know I cannot do that with PointType.

SpatialRecursivePrefixTreeFieldType would be perfect if I could use
more than two dimensions.

Regards,
Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445p4020616.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: PointType multivalued query

2012-11-15 Thread blopez
Hi,

I think it's not a good idea to do Join operations between Solr cores
because of the performance (we manage a lot of data).

The point is that we want to store documents, each one with several
information sets (let's name them Points), each one identified by 6 values
(that's why I was trying to use 6-Dimensions PointType).

I'm doing this to try to improve the indexing space and time (and, if
possible, the retrieval time), because nowadays we have it implemented in
another index structure with these point values represented in an individual
Solr attribute. This way (shown below) is, I think, less efficient than what
I was trying to do with PointType:





...



So for "docToReference"=1 we may have thousands of "point sets", which
implies having a lot of noise in the Solr index.

What do you think about that?

Thank you very much,
Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445p4020606.html
Sent from the Solr - User mailing list archive at Nabble.com.


cores shards and disks in SolrCloud

2012-11-15 Thread Buttler, David
Hi,
I have a question about the optimal way to distribute solr indexes across a 
cloud.  I have a small number of collections (less than 10).  And a small 
cluster (6 nodes), but each node has several disks - 5 of which I am using for 
my solr indexes.  The cluster is also a hadoop cluster, so the disks are not 
RAIDed, they are JBOD.  So, on my 5 slave nodes, each with 5 disks, I was 
thinking of putting one shard per collection.  This means I end up with 25 
shards per collection.  If I had 10 collections, that would make it 250 shards 
total.  Given that Solr 4 supports multi-core, my first thought was to try one 
JVM for each node: for 10 collections per node, that means that each JVM would 
contain 50 shards.

So, I set up my first collection, with a modest 20M documents, and everything 
seems to work fine.  But, now my subsequent collections that I have added are 
having issues.  The first one is that every time I query for the document count 
(*:* with rows=0), a different number of documents is returned. The number can 
differ by as much as 10%.  Now if I query each shard individually (setting 
distrib=false), the number returned is always consistent.
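
(For reference, a minimal sketch of the two counts I am comparing, via SolrJ;
the URL is made up:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class CountCheck {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://node1:8983/solr/collection1");

    // Distributed count across all shards of the collection.
    SolrQuery all = new SolrQuery("*:*");
    all.setRows(0);
    System.out.println("distributed: " + solr.query(all).getResults().getNumFound());

    // Same count against this core only, bypassing distributed search.
    SolrQuery local = new SolrQuery("*:*");
    local.setRows(0);
    local.set("distrib", "false");
    System.out.println("local: " + solr.query(local).getResults().getNumFound());
  }
}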

I am not entirely sure this is related as I may have missed a step in my setup 
of subsequent collections (bootstrapping the config)

But, more related to the architecture question: is it better to have one JVM 
per disk, one JVM per shard, or one JVM per node.  Given the MMap of the 
indexes, how does memory play into the question?   There is a blog post 
(http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html) that 
recommends minimizing the amount of JVM memory and maximizing the amount of 
OS-level file cache, but how does that impact sorting / boosting?

Sorry if I have missed some documentation: I have been through the cloud 
tutorial a couple of times, and I didn't see any discussion of these issues

Thanks,
Dave


Re: PointType multivalued query

2012-11-15 Thread David Smiley (@MITRE.org)
Borja,
Umm, I'm quite confused with the use-case you present.
~ David



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445p4020609.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Admin Permissions

2012-11-15 Thread Michael Long
I figured out you can disable the core admin in solr.xml, but then it 
breaks the admin as apparently it relies on that.


I tried tomcat security but haven't been able to make it work

I think at this point I may just write a query/debugging app that the
developers could use.


On 11/13/2012 07:12 AM, Erick Erickson wrote:

Slap them firmly on the wrist if they do?

The Solr admin is really designed with trusted users in mind. There are no
provisions that I know of for securing some of the functions.

Your developers have access to the Solr server through the browser, right?
They can do all of that via URL, see: http://wiki.apache.org/solr/CoreAdmin,
they don't need to use the admin server at all.

So unless you're willing to put a lot of effort into it, I don't think you
really can lock it down. If you really don't trust them to not do bad
things, set up a dev environment and lock them out of your production
servers totally?

Best
Erick


On Mon, Nov 12, 2012 at 12:41 PM, Michael Long wrote:


I really like the new admin in solr 4.0, but specifically I don't want
developers to be able to unload, rename, swap, reload, optimize, or add
core.

Any ideas on how I could still give access to the rest of the admin
without giving access to these? It is very helpful for them to have access
to the Query, Analysis, etc.





Re: PointType multivalued query

2012-11-15 Thread blopez
Hi David,

thanks for your reply.

I've tested this datatype and the values are indexed fine (I'm using
6-dimensions points).

I'm trying to retrieve results and it works only with the first 2 dimensions
(X and Y), but it's not taking into account the other 4 dimensions.

I've been reading the documentation you sent me but I cannot see an
attribute to define the number of dimensions I should use.

Do you know what's happening?

Regards,
Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445p4020551.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: DIH nested entities don't work

2012-11-15 Thread Dyer, James
Depending on how much data you're pulling back, 2 hours might be a reasonable 
amount of time.  Of course if you had it a lot faster with Endeca & Forge, I 
can understand your questioning this. Keep in mind that the way you're setting it
up, it will build each cache, one at a time.  I'm pretty sure Forge does them
serially like this also unless you use complicated tricks around it.  Likewise 
for DIH, there is a way to build your caches in parallel by setting up multiple 
DIH handlers to first build your caches, then a final handler to index the 
pre-cached data.  You need DIHCacheWriter and DIHCacheProcessor from SOLR-2943.

The default for berkleyInternalCacheSize is 2% of your JVM heap.  You might get 
better performance increasing this, but then again you might find that 2% of 
heap is way plenty big enough and you should just make it smaller to conserve 
memory.  This parameter takes bytes, so use 10 for 100k, etc.

I think the file size is hardcoded to 1gb, so if you're getting 9 files, it 
means your query is pulling back more than 8gb of data. Sound right?

To get the "defaultRowPrefetch" honored, try putting it in the <defaults> section 
under the DIH request handler in solrconfig.xml.  Based on a 
quick review of the code, it seems that it will only honor jdbc parameters if 
they are in "defaults".

Also keep in mind that Lucene/Solr handle updates really well and with the size 
of your data, you likely will want to use delta updates rather than re-index 
all the time.  If so, then perhaps the total time to pull back everything isn't 
going to matter quite as much?  To implement delta updates with DIH in your 
case, I'd recommend the approach outlined here: 
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport ... (you 
can still use bdb-je for caches if it still makes sense depending on how big 
the deltas are)

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: mroosendaal [mailto:mroosend...@yahoo.com] 
Sent: Thursday, November 15, 2012 8:52 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH nested entities don't work

Hi James,

Just gave it a go and it worked! That's the good news. The problem now is
getting it to work faster. It took over 2 hours just to index 4 views and I
need to get information from 26.

I tried adding the defaultRowPrefetch="2" as a jdbc parameter but it
does not seem to honour that. It should work because it is part of the
oracle jdbc driver but there's no mention of it in the Solr documentation.

Would it also help to increase the berkleyInternalCacheSize? For
'CATEGORIES' it creates 9 'files'.

Thanks,
Maarten



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tp4015514p4020503.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: BM25 model for solr 4?

2012-11-15 Thread Tom Burton-West
Hello Floyd,

There is a ton of research literature out there comparing BM25 to vector
space.  But you have to be careful interpreting it.

BM25 originally beat the SMART vector space model in the early  TRECs
 because it did better tf and length normalization.  Pivoted Document
Length normalization was invented to get the vector space model to catch up
to BM25.   (Just Google for Singhal length normalization.  Amit Singhal,
now chief of Google Search, did his doctoral thesis on this and it is
available.  Similarly, Stephen Robertson, now at Microsoft Research,
published a ton of studies of BM25.)

The default Solr/Lucene similarity class doesn't provide the length or tf
normalization tuning params that BM25 does.  There is the
SweetSpotSimilarity, but that doesn't quite work the same way that the BM25
normalizations do.
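
In case it helps, here is the Lucene-level equivalent with the two knobs
mentioned above (in Solr 4 this is normally wired up through a BM25 similarity
factory in schema.xml rather than in code):

import org.apache.lucene.search.similarities.BM25Similarity;
import org.apache.lucene.search.similarities.Similarity;

// k1 controls term-frequency saturation, b controls document-length
// normalization. 1.2f / 0.75f are the conventional defaults; both usually
// need per-corpus tuning.
Similarity bm25 = new BM25Similarity(1.2f, 0.75f);
// At the Lucene level this is set via indexSearcher.setSimilarity(bm25);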

Document length normalization needs and parameter tuning all depends on
your data.  So if you are reading a comparison, you need to determine:
1) When comparing recall/precision etc. between vector space and Bm25, did
the experimenter tune both the vector space and the BM25 parameters
2) Are the documents (and queries) they are using in the test, similar in
 length characteristics to your documents and
queries.

We are planning to do some testing in the next few months for our use case,
which is 10 million books where we index the entire book.  These are
extremely long documents compared to most IR research.
I'd love to hear about actual (non-research) production implementations
that have tested the new ranking models available in Solr.

Tom



On Wed, Nov 14, 2012 at 9:16 PM, Floyd Wu  wrote:

> Hi there,
> Can anybody kindly tell me how to set up Solr to use BM25?
> By the way, are there any experiments or research showing a comparison of
> BM25 and the classical VSM model in recall/precision rate?
>
> Thanks in advance.
>


Re: Is there a way to limit returned rows directly in a query string?

2012-11-15 Thread Dominique Debailleux
Wasn't obvious ;).

Maybe you could try local params...something like
q={!q.op=OR%20rows=3}yourQueryHere

Hope this helps

Dom


2012/11/15 jefferyyuan 

> Thanks for the reply.
>
> I am using SolrEntityProcessor to import data from another remote solr
> server - not database, so the query here is a solr query.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Is-there-a-way-to-limit-returned-rows-directly-in-a-query-string-tp4020550p4020565.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Dominique Debailleux
WoAnA - small.but.robust


Re: Is there a way to limit returned rows directly in a query string?

2012-11-15 Thread Dominique Debailleux
Hi yun

Not sure I understand your need...
There is no relationship between a query string and DIH.
What you want to achieve (if "fetch 1 rows" means "select 1 rows
from a table") can be done by limiting the number of rows your SQL select
will return (the syntax differs from DBMS to DBMS).

Dom


2012/11/15 jefferyyuan 

> Hi, all:
>
> Is there a way to limit returned rows directly in a query string?
> I know we can limit returned size by parameter: rows, and start, but I am
> wondering whether we can do this directly in the query string, such as:
> q=(file_type:pdf, rows:10) OR (file_type:ppt, rows:10)
> We may even add start:
> q=(file_type:pdf,start:10, rows:10) OR (file_type:ppt, start:10, rows:10)
>
> The reason I want this function is that, when I try to use
> DataImportHandler,
> I want to test its performance, such as fetching 1000, 1 rows.
>
> Is it possible to do this in some ways using current solr?
> Do you think this function is useful?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Is-there-a-way-to-limit-returned-rows-directly-in-a-query-string-tp4020550.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Dominique Debailleux
WoAnA - small.but.robust


Re: PointType multivalued query

2012-11-15 Thread David Smiley (@MITRE.org)
Oh I'm sorry, I should have read your question more clearly.  I totally
forgot that solr.PointType supports a configurable number of dimensions.  If
you need more than 2 dimensions as your example shows you do, then you'll
have to resort to indexing your spatial data in another Solr core as
non-multiValued and then use Solr 4's new join query to relate the data back
to your main query.  You probably won't be pleased with the performance if
you have a lot of data.  Eventually, I'm hopeful for Solr supporting index
level "block join" (I think it's called) which will speed this up.
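
As a rough sketch of what that join would look like (the core and field names
are made up: a "points" core holding one document per point, with a "docId"
field pointing back at the main document's "id"):

import org.apache.solr.client.solrj.SolrQuery;

// Match 6-D points in the "points" core, then join back to the main core.
SolrQuery q = new SolrQuery(
    "{!join fromIndex=points from=docId to=id}"
    + "d1_i:2 AND d2_i:2 AND d3_i:2 AND d4_i:2 AND d5_i:2 AND d6_i:2");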

What are you using this for, any way?



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445p4020554.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CloudSolrServer and LBHttpSolrServer: setting BinaryResponseParser and BinaryRequestWriter.

2012-11-15 Thread Luis Cappa Banda
Yes, my first attempt was with a List, but it didn't work. Then I
started to try other ways, such as a String[] array, with no success.

Regards,

- Luis Cappa.

2012/11/15 Sami Siren 

> hi,
>
> did you try setting your values in a List, for example an ArrayList? It should
> work when you use that, even without specifying the request-/response writer.
>
> --
>  Sami Siren
>
>
> On Thu, Nov 15, 2012 at 4:56 PM, Luis Cappa Banda  >wrote:
>
> > Hello,
> >
> > I've found what seems to be a bug
> > JIRA-SOLR4080<
> >
> https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13498055
> > >
> > with
> > CloudSolrServer during atomic updates via SolrJ. Thanks to Sami I
> detected
> > that the problem could be solved setting BinaryResponseParser as Parser
> and
> > BinaryRequestWriter as Request writer. That works with HttpSolrServer,
> but
> > CloudSolrServer or LBHttpSolrServer hasn't got any method to set them.
> >
> > Is there a way to set both parsers to a CloudSolrServer instance? Do you
> > know if maybe it could be configured inside solrconfig.xml?
> >
> > Thanks a lot.
> >
> > --
> >
> > - Luis Cappa
> >
>



-- 

- Luis Cappa


Re: CloudSolrServer and LBHttpSolrServer: setting BinaryResponseParser and BinaryRequestWriter.

2012-11-15 Thread Sami Siren
hi,

did you try setting your values in a List, for example an ArrayList? It should
work when you use that, even without specifying the request-/response writer.
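
For example, something along these lines should work (field names are made
up); the multi-valued payload goes in a List, wrapped in a Map whose key is
the atomic-update operation:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdate {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");

    // "set" replaces the field's current values with the given list;
    // "add" would append to them instead.
    Map<String, Object> op = new HashMap<String, Object>();
    op.put("set", new ArrayList<String>(Arrays.asList("first", "second")));
    doc.addField("tags", op);

    solr.add(doc);
    solr.commit();
  }
}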

--
 Sami Siren


On Thu, Nov 15, 2012 at 4:56 PM, Luis Cappa Banda wrote:

> Hello,
>
> I've found what seems to be a bug
> JIRA-SOLR4080<
> https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13498055
> >
> with
> CloudSolrServer during atomic updates via SolrJ. Thanks to Sami I detected
> that the problem could be solved setting BinaryResponseParser as Parser and
> BinaryRequestWriter as Request writer. That works with HttpSolrServer, but
> CloudSolrServer or LBHttpSolrServer hasn't got any method to set them.
>
> Is there a way to set both parsers to a CloudSolrServer instance? Do you
> know if maybe it could be configured inside solrconfig.xml?
>
> Thanks a lot.
>
> --
>
> - Luis Cappa
>


Re: consistency in SolrCloud replication

2012-11-15 Thread Mark Miller
It depends - no commit necessary for realtime get. Otherwise, yes, you would 
need to do at least a soft commit. That works the same way though - so if you 
make your update, then do a soft commit, you can be sure your next search will 
see the update on all the replicas. And with realtime get, of course no commit 
is necessary to see it.
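
To spell that out in SolrJ (the addresses and collection name are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class VisibilityExample {
  public static void main(String[] args) throws Exception {
    CloudSolrServer solr = new CloudSolrServer("zk1:2181");
    solr.setDefaultCollection("collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    solr.add(doc);

    // Soft commit: waitFlush=true, waitSearcher=true, softCommit=true.
    // Once this returns, a normal search on any replica sees the doc.
    solr.commit(true, true, true);

    // Realtime get: visible even without any commit at all.
    SolrQuery rtg = new SolrQuery();
    rtg.setRequestHandler("/get");
    rtg.set("id", "doc-1");
    System.out.println(solr.query(rtg));
  }
}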

- Mark

On Nov 15, 2012, at 10:40 AM, David Smiley (@MITRE.org)  
wrote:

> Mark Miller-3 wrote
>> I'm talking about an update request. So if you make an update, when it
>> returns, your next search will see the update, because it will be on
>> all replicas.
> 
> I presume this is only the case if (of course) the client also sent a
> commit.  So you're saying the commit call will not return unless all
> replicas have completed their commits.  Right?
> 
> ~ David
> 
> 
> 
> -
> Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/consistency-in-SolrCloud-replication-tp4020379p4020518.html
> Sent from the Solr - User mailing list archive at Nabble.com.



how make a suggester?

2012-11-15 Thread iwo
Hello,
   I would like to implement a suggester with Solr; which is the best way now
in your opinion?

thanks in advance
I.



-
Complicare è facile, semplificare é difficile. 
Complicated is easy, simple is hard.
quote: http://it.wikipedia.org/wiki/Bruno_Munari
--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-make-a-suggester-tp4020540.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud: Shard resize

2012-11-15 Thread Erick Erickson
Currently you have to re-index all of your data. If you don't you'll have a
situation in which the same document (by uniqueKey) exists in two shards
and that document may show up twice in your results list.

NOTE: by "reindex all your data", you need to _delete_ all your data first.
If you just add a shard and index more data, SolrCloud will simply try to
re-index each doc in the (new) "proper" shard. The fact that it already
exists on another shard won't be automatically handled.
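
A minimal sketch of the delete step via SolrJ (the ZooKeeper address and
collection name are made up):

import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class WipeBeforeReindex {
  public static void main(String[] args) throws Exception {
    CloudSolrServer solr = new CloudSolrServer("zk1:2181");
    solr.setDefaultCollection("collection1");

    // Remove every document before re-indexing into the new shard layout,
    // so no uniqueKey ends up living in two shards.
    solr.deleteByQuery("*:*");
    solr.commit();
  }
}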

There is currently work under consideration to allow shards to be split,
which would solve the reindex everything problem, but it's not in the code
yet. And it's also not an easy problem.

Best
Erick


On Thu, Nov 15, 2012 at 5:14 AM, ku3ia  wrote:

> Any ideas?
> Thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-Shard-resize-tp4020282p4020449.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Nested Join Queries

2012-11-15 Thread Erick Erickson
Gerald:
Here's the place to start: http://wiki.apache.org/solr/HowToContribute

But the basic setup is
1> create a JIRA login (anyone can)
2> create a JIRA if one doesn't exist
3> generate the patch. From your root level (the one that contains the "solr"
and "lucene" dirs), run "svn diff > SOLR-###.patch" where ### is the Solr
JIRA number from <2>
4> upload the patch to the Jira
5> prompt the JIRA occasionally if nobody picks it up ...

GIT patches work too, but I'm Git-naive

Happy Hacking!
Erick


On Wed, Nov 14, 2012 at 5:43 PM, Gerald Blanck <
gerald.bla...@barometerit.com> wrote:

> Mikhail-
>
> Let me know how to contribute a test case and I will put it on my to do
> list.
>
> When your many-to-many BlockJoin solution matures I would love to see it.
>
> Thanks.
> -Gerald
>
>
> On Tue, Nov 13, 2012 at 11:52 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
> > Gerald,
> > Nice to hear the the your problem is solved. Can you contribute a test
> > case to reproduce this issue?
> >
> > FWIW, my team successfully deals with Many-to-Many in BlockJoin. It
> works,
> > but solution is a little bit immature yet.
> >
> >
> >
> > On Wed, Nov 14, 2012 at 5:59 AM, Gerald Blanck <
> > gerald.bla...@barometerit.com> wrote:
> >
> >> Thank you Mikhail.  Unfortunately BlockJoinQuery is not an option we can
> >> leverage.
> >>
> >> - We have modeled our document types as different indexes/cores.
> >> - Our relationships which we are attempting to join across are not
> >> single-parent to many-children relationships.  They are in fact many to
> >> many.
> >> - Additionally, memory usage is a concern.
> >>
> >> FYI.  After making the code change I mentioned in my original post, we
> >> have completed a full test cycle and did not experience any adverse
> impacts
> >> to the change.  And our join query functionality returns the results we
> >> wanted.  I would still be interested in hearing an explanation as to why
> >> the code is written as it is in v4.0.0.
> >>
> >> Thanks.
> >>
> >>
> >>
> >>
> >> On Tue, Nov 13, 2012 at 8:31 AM, Mikhail Khludnev <
> >> mkhlud...@griddynamics.com> wrote:
> >>
> >>> Please find reference materials
> >>>
> >>>
> >>>
> http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
> >>> http://blog.griddynamics.com/2012/08/block-join-query-performs.html
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Nov 13, 2012 at 3:25 PM, Gerald Blanck <
> >>> gerald.bla...@barometerit.com> wrote:
> >>>
>  Thank you.  I've not heard of BlockJoin.  I will look into it today.
>   Thanks.
> 
> 
>  On Tue, Nov 13, 2012 at 5:05 AM, Mikhail Khludnev <
>  mkhlud...@griddynamics.com> wrote:
> 
> > Replied. pls check maillist.
> >
> >
> >
> > On Tue, Nov 13, 2012 at 11:44 AM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> >> Gerald,
> >>
> >> I wonder if you tried to approach BlockJoin for your problem? Can
> you
> >> afford less frequent updates?
> >>
> >>
> >> On Wed, Nov 7, 2012 at 5:40 PM, Gerald Blanck <
> >> gerald.bla...@barometerit.com> wrote:
> >>
> >>> Thank you Erick for your reply.  I understand that search is not an
> >>> RDBMS.
> >>>  Yes, we do have a huge combinatorial explosion if we de-normalize
> >>> and
> >>> duplicate data.  In fact, I believe our use case is exactly what
> the
> >>> Solr
> >>> developers were trying to solve with the addition of the Join
> query.
> >>>  And
> >>> while the example I gave illustrates the problem we are solving
> with
> >>> the
> >>> Join functionality, it is simplistic in nature compared to what we
> >>> have in
> >>> actuality.
> >>>
> >>> Am still looking for an answer here if someone can shed some light.
> >>>  Thanks.
> >>>
> >>>
> >>> On Sat, Nov 3, 2012 at 9:38 PM, Erick Erickson <
> >>> erickerick...@gmail.com>wrote:
> >>>
> >>> > I'm going to go a bit sideways on you, partly because I can't
> >>> answer the
> >>> > question ...
> >>> >
> >>> > But, every time I see someone doing what looks like substituting
> >>> "core" for
> >>> > "table" and
> >>> > then trying to use Solr like a DB, I get on my soap-box and
> >>> preach..
> >>> >
> >>> > In this case, consider de-normalizing your DB so you can ask the
> >>> query in
> >>> > terms
> >>> > of search rather than joins. e.g.
> >>> >
> >>> > Make each document a combination of the author and the book, with
> >>> an
> >>> > additional
> >>> > field "author_has_written_a_bestseller". Now your query becomes a
> >>> really
> >>> > simple
> >>> > search, "author:name AND author_has_written_a_bestseller:true".
> >>> True, this
> >>> > kind
> >>> > of approach isn't as flexible as an RDBMS, but it's a _search_
> >>> rather than
> >>> > a query.
> >>> > Yes, it replicates data, but unless you have a 

Re: Run multiple instances of solr using single data directory

2012-11-15 Thread Erick Erickson
I think this is rather dangerous. How would these multiple slaves
coordinate replication? Would they all replicate at once? If only one was
configured to replicate, how would the others know to reopen serchers?

Furthermore, simply opening up more Solr instances on the same machine
isn't expanding the resources that need to be expanded, e.g. physical
memory to increase throughput.

Separate machines with their own disks for your slaves are _much_
safer and actually expand capacity. After all, if the CPU is pegged on
your slave machine under high load, starting up additional JVMs on
that machine won't get you any more CPU cycles.

Best
Erick


On Wed, Nov 14, 2012 at 12:23 PM, Rohit Harchandani wrote:

> OK, but what are the problems when bringing up multiple instances reading
> from the same data directory?
> Also, how do I re-open the searchers without restarting Solr?
> Thanks,
> Rohit
>
>
> On Tue, Nov 13, 2012 at 11:20 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
> > Hi,
> >
> > If you have high query rate, running multiple instances of Solr on the
> same
> > server doesn't typically make sense.  I'd stop and rethink :)
> >
> > Otis
> > --
> > Solr Performance Monitoring - http://sematext.com/spm/index.html
> >
> >
> > On Tue, Nov 13, 2012 at 5:46 PM, Rohit Harchandani  > >wrote:
> >
> > > Hi All,
> > > I am currently using solr 4.0. The application I am working on
> requires a
> > > high rate of queries per second.
> > > Currently, we have setup a single master and a single slave on a
> > production
> > > machine. We want to bring up multiple instances of solr (slaves). Are
> > there
> > > any problems, when bringing them up on different ports but using the
> same
> > > data directory? These will be only serving up queries and all the
> > indexing
> > > will take place on the master machine.
> > >
> > > Also, if i have multiple instances from the same data directory and i
> > > perform replication. Would that re-open searchers on all the instances?
> > > Thanks,
> > > Rohit
> > >
> >
>


Re: best practicies dealing with solr collections and instances

2012-11-15 Thread Erick Erickson
Well, what does "maintenance" entail? Changing schema? Rebuilding the index?

Many operations under the "maintenance" rubric can be done with core admin
handler requests, see:
http://wiki.apache.org/solr/CoreAdmin
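
For example, a reload can be driven from SolrJ like this (the URL and core
name are made up):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class ReloadCore {
  public static void main(String[] args) throws Exception {
    // Point at the container-level URL (no core name) where /admin/cores lives.
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
    // Reloads the core in place, picking up config changes without a restart.
    CoreAdminRequest.reloadCore("collection1", solr);
  }
}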

But if that doesn't solve your problem, then probably running in two
separate JVMs is where you'll have to go. But check out the core admin
stuff first.

Best
Erick


On Wed, Nov 14, 2012 at 5:19 AM, nutch.bu...@gmail.com <
nutch.bu...@gmail.com> wrote:

> I have various search applications that use different solr indexes.
> Solr cloud has a collections feature, which allows me to use 2 different
> indexes in one solr instance.
> The problem is that if I use this, any maintenance on each one of the
> collections that requires a restart of solr, would also impact the other
> collection.
>
> what is the best practice to deal with this issue?
> Should I just run 2 solr instances on different ports?
> Should I use the collections feature?
> Or any 3rd option?
>
> Thanks.
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/best-practicies-dealing-with-solr-collections-and-instances-tp4020249.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr defining Schema structure trouble.

2012-11-15 Thread Jack Krupansky
Ah... sure, you can create a schema that has several different document 
types in it, with extra fields that are used in some but not all documents - 
books have the metadata fields but no page bodies while pages have page 
bodies but no metadata. And maybe even do a Solr join for the "block" of 
pages that are for the same book. Or, just two queries - the first to get 
the pages, grouped, and then take their book names/IDs and query the 
book-level metadata. You can also store the book-level metadata in a 
separate Solr collection.
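
For the "pages grouped by book" query, something like this sketch is what I
have in mind (the field names are made up):

import org.apache.solr.client.solrj.SolrQuery;

// Group the page hits so each book shows up once; a second query on the
// collected book_id values then fetches the book-level metadata docs.
SolrQuery pages = new SolrQuery("body:whale");
pages.set("group", "true");
pages.set("group.field", "book_id");
pages.set("group.limit", "3");   // up to three matching pages per book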


But, having said that, you have to decide whether your content search is a 
pure content search or whether you also want to search by metadata as well. 
The searchable metadata should be present on each of the pages in addition 
to the book level. That may seem like repetition, but that's okay. The bulk 
of the storage will be the page bodies themselves.


-- Jack Krupansky

-Original Message- 
From: denl0

Sent: Thursday, November 15, 2012 5:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr defining Schema structure trouble.

Yes, this is what I'm trying to do. But stuff related to the document, like
language/title/... (I have way more fields), is stored many times. Each page
has a part of data that's the same; is it possible to separate that data?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-defining-Schema-structure-trouble-tp4020305p4020471.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Unable to run two multicore Solr instances under Tomcat

2012-11-15 Thread Erick Erickson
Thanks for wrapping this up, it's always nice to get closure, especially
when it comes to googling ..


On Wed, Nov 14, 2012 at 5:34 AM, Adam Neal  wrote:

> Just to wrap up this one. Previously all the lib jars were located in the
> war file on our setup, this was mainly to ease deployment as it's just a
> single file. Moving the lib directory external to the war seems to have
> fixed the issue.
>
> Thanks for the pointer Erick.
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tue 13/11/2012 12:05
> To: solr-user@lucene.apache.org
> Subject: Re: Unable to run two multicore Solr instances under Tomcat
>
>
> At a guess you have leftover jars from your earlier installation in your
> classpath that are being picked up. I've always found that figuring out how
> _that_ happened is...er... "interesting"...
>
> Best
> Erick
>
>
> On Mon, Nov 12, 2012 at 7:44 AM, Adam Neal  wrote:
>
> > Hi,
> >
> > I have been running two multicore Solr instances under Tomcat using a
> > nightly build of 4.0 from September 2011. This has been running fine but
> > when I try to update these instances to the release version of 4.0 I'm
> > hitting problems when the second instance starts up. If I have one
> instance
> > on the release version and one on the nightly build it also works fine.
> >
> > It's running on a Solaris 10 box using Tomcat 6.0.26 and Java 1.6.0_20
> >
> > I can run up either instance on it's own and it works fine, it's just
> when
> > starting both together so I'm pretty sure my configs aren't the issue.
> >
> > Snippet from the log is below, please note that I have had to type this
> > out so there may be some typos, hopefully not!
> >
> > Any ideas?
> >
> > Adam
> >
> >
> > 12-Nov-2012 09:58:50 org.apache.solr.core.SolrResourceLoader
> locateSolrHome
> > INFO: Using JNDI solr.home: /conf_solr/instance2
> > 12-Nov-2012 09:58:50 org.apache.solr.core.SolrResourceLoader 
> > INFO: new SolrResourceLoader for deduced Solr Home:
> '/conf_solr/instance2/'
> > 12-Nov-2012 09:58:52 org.apache.solr.servlet.SolrDispatchFilter init
> > INFO: SolrDispatchFilter.init()
> > 12-Nov-2012 09:58:52 org.apache.solr.core.SolrResourceLoader
> locateSolrHome
> > INFO: Using JNDI solr.home /conf_solr/instance2
> > 12-Nov-2012 09:58:52 org.apache.solr.core.CoreContainer$Initializer
> > initialize
> > INFO: looking for solr.xml: /conf_solr/instance2/solr.xml
> > 12-Nov-2012 09:58:52 org.apache.solr.core.CoreContainer 
> > INFO: New CoreContainer 15471347
> > 12-Nov-2012 09:58:52 org.apache.solr.core.CoreContainer load
> > INFO: Loading CoreContainer using Solr Home: '/conf_solr/instance2/'
> > 12-Nov-2012 09:58:52 org.apache.solr.core.SolrResourceLoader 
> > INFO: new SOlrResourceLoader for directory: '/conf_solr/instance2/'
> > 12-Nov-2012 09:58:52 org.apache.solr.servlet.SolrDispatchFilter init
> > SEVERE: Could not start Solr. Check solr/home property and the logs
> > 12-Nov-2012 09:58:52 org.apache.solr.common.SolrException log
> > SEVERE: null:java.lang.ClassCastException:
> > org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast
> > to org.apache.xerces.xni.parser.XMLParserConfiguration
> > at org.apache.xerces.parsers.DOMParser.(Unknown Source)
> > at org.apache.xerces.parsers.DOMParser.(Unknown Source)
> > at org.apache.xerces.jaxp.DocumentBuilderImpl.(Unknown
> > Source)
> > at
> >
> org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.newDocumentBuilder(Unknown
> > Source)
> > at
> >
> com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.createDocument(SAX2DOM.java:324)
> > at
> >
> com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.(SAX2DOM.java:84)
> > at
> >
> com.sun.org.apache.xalan.internal.xsltc.runtime.output.TranslateOutputHandlerFactory.getSerializationHanlder(TransletOutputHandlerFactory.java:187)
> > at
> >
> com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getOutputHandler(TransformerImpl.java:392)
> > at
> >
> com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:298)
> > at
> > org.apache.solr.core.CoreContainer.copyDoc(CoreContainer.java:551)
> > at
> org.apache.solr.core.CoreContainer.load(CoreContainer.java:381)
> > at
> org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
> > at
> >
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
> > at
> >
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
> > at
> >
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
> > at
> >
> org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:115)
> > at
> >
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.ja

Re: Solr 4.0 Dismax woes (2 specifically)

2012-11-15 Thread Erick Erickson
OK, I'm going to reach a bit here.

First, you're right, (e)dismax distributes the terms across all the fields,
there's no good way around that. But for your specific example,  why don't
fielded queries work with the default query parser? e.g. q=tag:clothe
cid:95? You can use the fuzzy syntax here wherever you choose.
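
Something along these lines (a hedged sketch; the host and handler are made up, tag
and cid are the fields from your message) keeps each term on its own field and still
leaves room for fuzziness or prefixes where you want them:

# fuzzy on the tag field only, exact on cid
curl 'http://localhost:8983/solr/select?q=tag:clothe~1+AND+cid:95'

# or a prefix query, if "clot" should match "clothes"
curl 'http://localhost:8983/solr/select?q=tag:clot*+AND+cid:95'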

Second about idf. This is where I'm reaching, meaning I haven't tried it
personally. But could you do something with FunctionQueries here? You have
access to all the term info (new to 4.0), so it seems like there's some
possibility here...

At worst you could write a custom similarity class, which should be
significantly simpler than a custom query parser...

FWIW,
Erick


On Tue, Nov 13, 2012 at 7:45 PM, dm_tim  wrote:

> Heck,
>
> I originally started using the default query parser but gave up on it
> because all of my search results are equally important and idf was messing
> up my results pretty badly. So I discovered the DisMax query parser which
> doesn't use idf. I was elated until I started testing. My initial results
> looked good but when I cut down the query string from "clothes" to "clot" I
> got zero results.
>
> I've been reading about how disMax is supposed to do fuzzy searches but I
> can't make it work at all.
>
> To complicate matters I discovered that my all of my search words are being
> used against all of the query fields. I had previously assumed that each
> search word would only be applied to individual query fields.
>
> So for example my q is:
> clothe 95
>
> And my qf:
> tag cid
>
> So I believe that the words "clothe" and "95" are being searched on both
> fields ("tag" and "cid") which is not what I wanted to do. I was hoping to
> have "cloth" applied only to the "tag" field and "95" applied only to the
> "cid" field.
>
> I really don't have it in me to write my own query parser so I'm hoping to
> find a way to do a fuzzy search without scores being screwed by idf. Is
> there a way to achieve my desired results with existing code?
>
> Regards,
>
> (A tired) Tim
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-4-0-Dismax-woes-2-specifically-tp4020197.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


High Slave CPU Intermittently After Replication

2012-11-15 Thread richardg
Here is our setup:

Solr 4.0
Master replicates to three slaves after optimize

We have a problem where every so often after replication the CPU load on the
slave servers maxes out and requests come to a crawl.

We do a dataimport every 10 minutes and, depending on the number of updates
since the last optimize, we run an update command with either
optimize=true&maxSegments=4 or just optimize=true (the latter when there have
been more than 1500 updates since the last full optimize). We had this issue
more often until we put the optimize update statements into our process.
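
Concretely, the two update calls are roughly of this shape (host, port and core
path are made up):

# partial optimize, merging down to at most 4 segments
curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=4'

# full optimize
curl 'http://localhost:8983/solr/update?optimize=true'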

Everything had been running great for a week or so until today, when everything
maxed out on all three slaves right after replication. It isn't that things get
progressively worse; the issue occurs immediately after the replication. The
only way to recover from it is to do an optimize=true update, and once it
replicates out things return to normal, where there isn't much load on the
slaves at all.

There isn't any way to predict this issue, and so far I haven't seen anything
in the logs that would offer any clues.  Any ideas?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/High-Slave-CPU-Intermittently-After-Replication-tp4020520.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: consistency in SolrCloud replication

2012-11-15 Thread David Smiley (@MITRE.org)
Mark Miller-3 wrote
> I'm talking about an update request. So if you make an update, when it
> returns, your next search will see the update, because it will be on
> all replicas.

I presume this is only the case if (of course) the client also sent a
commit.  So you're saying the commit call will not return unless all
replicas have completed their commits.  Right?

~ David



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/consistency-in-SolrCloud-replication-tp4020379p4020518.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImportHandler in Solr 1.4 bug?

2012-11-15 Thread Andy Lester

On Nov 15, 2012, at 8:02 AM, Sébastien Lorber  
wrote:

>  
>
>  


I don't know where you're getting the ${JOB_EXEC.JOB_INSTANCE_ID}.  I believe 
that if you want to get parameters passed in, it looks like this:

   WHERE batchid = ${dataimporter.request.batchid}

when I kick off the DIH like this:

   $url/dih?command=full-import&entity=titles&commit=true&batchid=47

At least that's how it works for me in 3.6 and 4.0.

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



DataImportHandler in Solr 1.4 bug?

2012-11-15 Thread Sébastien Lorber
Hello,

I don't know if this is a bug or a missing feature, nor if it was corrected
in new versions of Solr (can't find any JIRA about it), so I just want to
show you the problem...

I can't test with Solr 4.0, I have a legacy system, not a lot of time, not
a Solr expert at all and it seems just updating the maven dependencies and
fixing some deprecated conf is not enough...


I have a Parameters table which contains params as KEY / VALUE
(typically it is a Spring Batch job parameters table)


My schema has a dynamic field:



---


In the DIH, when I use:

  

  

It seems the placeholder ${PARAM.KEY} is not replaced.
This leads me to a Solr document where I have a field
- JOB_PARAM_=xxx (as if ${PARAM.KEY} were empty, while it is never empty in the
database)
Instead of having multiple fields (one per parameter):
- JOB_PARAM_p1=xxx
- JOB_PARAM_p2=yyy
- JOB_PARAM_p3=yyy


---


Curiously I wanted to test with a nested entity and by chance the
workaround works!


  


  

  

This time I get in my document what I wanted in the first place:
- JOB_PARAM_p1=xxx
- JOB_PARAM_p2=yyy
- JOB_PARAM_p3=yyy


I guess this is a bug, but perhaps it is already fixed in new versions?


Re: Solr Indexing MAX FILE LIMIT

2012-11-15 Thread Alexandre Rafalovitch
Maybe you can start by testing this with split -l and xargs :-) These are
standard Unix toolkit approaches and since you use one of them (curl) you
may be happy to use others too.
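
A rough sketch of that idea, assuming the big file is CSV and is being posted
with curl (file name, field names and chunk size are all made up):

# strip the header line and split into 10,000-line chunks
tail -n +2 big.csv | split -l 10000 - chunk_

# post each chunk; header=false plus fieldnames stands in for the stripped header
ls chunk_* | xargs -I{} curl \
  'http://localhost:8983/solr/update/csv?header=false&fieldnames=id,title,body&commit=false' \
  --data-binary @{} -H 'Content-type: text/csv;charset=utf-8'

# one commit at the end
curl 'http://localhost:8983/solr/update?commit=true'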

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Nov 14, 2012 at 11:33 PM, mitra  wrote:

> Thank you eric
>
> I didnt know that we could write a Java class for it , can you provide me
> with some info on how to
>
> Thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Indexing-MAX-FILE-LIMIT-tp4019952p4020407.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


CloudSolrServer and LBHttpSolrServer: setting BinaryResponseParser and BinaryRequestWriter.

2012-11-15 Thread Luis Cappa Banda
Hello,

I've found what seems to be a bug (SOLR-4080) with CloudSolrServer during
atomic updates via SolrJ. Thanks to Sami I detected that the problem could be
solved by setting BinaryResponseParser as the parser and BinaryRequestWriter
as the request writer. That works with HttpSolrServer, but neither
CloudSolrServer nor LBHttpSolrServer has any method to set them.

Is there a way to set both on a CloudSolrServer instance? Do you
know if maybe it could be configured inside solrconfig.xml?

Thanks a lot.

-- 

- Luis Cappa


RE: DIH nested entities don't work

2012-11-15 Thread mroosendaal
Hi James,

Just gave it a go and it worked! That's the good news. The problem now is
getting it to work faster. It took over 2 hours just to index 4 views and i
need to get information from 26.

I tried adding the defaultRowPrefetch="2" as a JDBC parameter, but it
does not seem to honour that. It should work because it is part of the
Oracle JDBC driver, but there's no mention of it in the Solr documentation.

Would it also help to increase the berkleyInternalCacheSize? For
'CATEGORIES' it creates 9 'files'.

Thanks,
Maarten



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tp4015514p4020503.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: consistency in SolrCloud replication

2012-11-15 Thread Mark Miller
I'm talking about an update request. So if you make an update, when it
returns, your next search will see the update, because it will be on
all replicas. Another process that is searching rapidly may see an
"eventually" consistent view though (very briefly). We have some ideas
to make that view "more consistent", for example by allowing for
searcher leases.

- Mark

On Thu, Nov 15, 2012 at 8:53 AM, Bill Au  wrote:
> Thanks for the info, Mark.  By "a request won't return until it's affected
> all replicas", are you referring to the update request or the query?
>
> Bill
>
>
> On Wed, Nov 14, 2012 at 7:57 PM, Mark Miller  wrote:
>
>> It's included as soon as it has been indexed - though a request won't
>> return until it's affected all replicas. Low latency eventual consistency.
>>
>> - Mark
>>
>> On Nov 14, 2012, at 5:47 PM, Bill Au  wrote:
>>
>> > Will a newly indexed document included in search result in the shard
>> leader
>> > as soon as it has been indexed locally or is it included in search result
>> > only after it has been forwarded to and indexed in all the replicas?
>> >
>> > Bill
>>
>>



-- 
- Mark


Re: Solr 4.0 Spatial Search schema.xml and data-config.xml

2012-11-15 Thread David Smiley (@MITRE.org)
The particular JavaScript I referred to is this:

function processAdd(cmd) {

  doc = cmd.solrDoc;  // org.apache.solr.common.SolrInputDocument
  
  lat = doc.getFieldValue("LATITUDE");
  lon = doc.getFieldValue("LONGITUDE");
  
  if (lat != null && lon != null)
doc.setField("latLon", lat+","+lon);

}
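
A hedged sketch of how it gets invoked at index time, assuming the script update
processor chain is registered in solrconfig.xml under the name "script" (the chain
name and the sample document are made up, not from David's config):

curl 'http://localhost:8983/solr/update?update.chain=script&commit=true' \
  -H 'Content-type: application/json' \
  --data-binary '[{"id":"1","LATITUDE":40.7,"LONGITUDE":-74.0}]'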



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-schema-xml-and-data-config-xml-tp4020376p4020492.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Error loading class solr.CJKBigramFilterFactory

2012-11-15 Thread Frederico Azeiteiro
:)
Just installed 3.6.1 and it's working just fine.

Something must be wrong with my Tomcat/Solr install.

Thank you Robert.

//Frederico
 


-Mensagem original-
De: Robert Muir [mailto:rcm...@gmail.com] 
Enviada: quarta-feira, 14 de Novembro de 2012 19:18
Para: solr-user@lucene.apache.org
Assunto: Re: Error loading class solr.CJKBigramFilterFactory

I'm sure. I added it to 3.6 ;)

You must have something funky with your tomcat configuration, like an exploded 
war with different versions of jars or some other form of jar hell.

On Wed, Nov 14, 2012 at 9:32 AM, Frederico Azeiteiro 
 wrote:
> Are you sure about that?
>
> We have it working on:
>
> Solr Specification Version: 3.5.0.2011.11.22.14.54.38 Solr 
> Implementation Version: 3.5.0 1204988 - simon - 2011-11-22 14:54:38 
> Lucene Specification Version: 3.5.0 Lucene Implementation Version: 
> 3.5.0 1204988 - simon - 2011-11-22 14:46:51 Current Time: Wed Nov 14 
> 17:30:07 WET 2012 Server Start Time:Wed Nov 14 11:40:36 WET 2012
>
> ??
>
> Thanks,
> Frederico
>
>
> -Mensagem original-
> De: Robert Muir [mailto:rcm...@gmail.com]
> Enviada: quarta-feira, 14 de Novembro de 2012 16:28
> Para: solr-user@lucene.apache.org
> Assunto: Re: Error loading class solr.CJKBigramFilterFactory
>
> On Wed, Nov 14, 2012 at 8:12 AM, Frederico Azeiteiro 
>  wrote:
>> Fo make some further testing I installed SOLR 3.5.0 using default 
>> Jetty server.
>>
>> When tried to start SOLR using the same schema I get:
>>
>>
>>
>> SEVERE: org.apache.solr.common.SolrException: Error loading class 
>> 'solr.CJKBigramFilterFactory'
>
> This filter was added in 3.6, so its expected that it wouldnt be found.


Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Uhm, after setting both Response and Request Writers it worked OK with *
HttpSolrServer*. I´ve tried to find a way to set BinaryResponseParser and
BinaryRequestWriter with *CloudServer *(or even via *LbHttpSolrServer*) but
I found nothing.

Suggestions? :-/

- Luis Cappa.


2012/11/15 Sami Siren 

> Try setting Request writer to binary like this:
>
>   server.setParser(new BinaryResponseParser());
>   server.setRequestWriter(new BinaryRequestWriter());
>
> Or then instead of string array use ArrayList() that contains your
> strings as the value for the map
>
>
> On Thu, Nov 15, 2012 at 3:58 PM, Luis Cappa Banda  >wrote:
>
> > Hi, Sami.
> >
> > Doing some tests I´ve used the same code as you and did a quick
> execution:
> >
> >
> > *HttpSolrServer server = new HttpSolrServer("
> > http://localhost:8080/solrserver/core1<
> > http://localhost:10080/newscover_es/items_es>
> > ");*
> >
> > * *
> > * try {*
> > * *
> > * HashMap editTags = new HashMap();*
> > * editTags.put("set", new String[]{"tag1","tag2","tag3"});*
> > * doc.addField("myField", editTags);*
> > * server.add(doc);*
> > * server.commit(true, true);*
> > * *
> > * } catch (SolrServerException e) {*
> > * e.printStackTrace();*
> > * } catch (IOException e) {*
> > * e.printStackTrace();*
> > * }*
> > *
> > *
> > *
> > *
> > And the resultant doc is wrong:
> >
> > 
> >
> > 
> >
> > 0
> >
> > 1
> >
> > 
> >
> > *:*
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> > 50a0f1f90cf226fb5677fe13
> >
> > 
> >
> > *[Ljava.lang.String;@7d5e90cb   ---> toString() from String[]
> > array.*
> >
> > 
> >
> > 1418710023780958208
> >
> > 
> >
> > 
> >
> > 
> >
> >
> >
> > So is definetely executing toString() method in both *HttpSolrServer
> *and *
> > CloudServer*. Tips:
> >
> >
> >- I´m using* Solrj 4.0.0 *artifact version in a *Maven *project.
> >- The fields to be updated are *dynamicFields*.
> >- I´m using Java jdk6.
> >
> > Alternatives:
> >
> >- I´m doing something wrong and I´m so stupid that I can´t see it, :-(
> >- The way I update fields is not the correct one.
> >- There is a general bug with atomic updates via SolrJ.
> >
> >
> > Regards,
> >
> >
> > - Luis Cappa.
> >
> >
> > 2012/11/15 Luis Cappa Banda 
> >
> > > I´ll have a look to Solr source code and try to fix the bug. If I
> succeed
> > > I´ll update JIRA issue with it, :-)
> > >
> > >
> > > 2012/11/15 Sami Siren 
> > >
> > >> Actually it seems that xml/binary request writers only behave
> > differently
> > >> when using array[] as the value. if I use ArrayList it also works with
> > the
> > >> xml format (4.1 branch). Still it's annoying that the two request
> > writers
> > >> behave differently so I guess it's worth adding the jira anyway.
> > >>
> > >> The Affects version should be 4.0.
> > >>
> > >>
> > >> On Thu, Nov 15, 2012 at 1:42 PM, Luis Cappa Banda <
> luisca...@gmail.com
> > >> >wrote:
> > >>
> > >> > Hello, Sami.
> > >> >
> > >> > It will be the first issue that I open so, should I create it under
> > Solr
> > >> > 4.0 version or in Solr 4.1.0 one?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > - Luis Cappa.
> > >> >
> > >> > 2012/11/15 Sami Siren 
> > >> >
> > >> > > On Thu, Nov 15, 2012 at 11:51 AM, Luis Cappa Banda <
> > >> luisca...@gmail.com
> > >> > > >wrote:
> > >> > >
> > >> > > > Thread update:
> > >> > > >
> > >> > > > When I use a simple:
> > >> > > >
> > >> > > > *Map operation = new HashMap();*
> > >> > > >
> > >> > > >
> > >> > > > Instead of:
> > >> > > >
> > >> > > > *Map> operation = new HashMap > >> > > > List>();*
> > >> > > >
> > >> > > >
> > >> > > > The result looks better, but it´s still wrong:
> > >> > > >
> > >> > > > fieldName: [
> > >> > > > "[Value1, Value2]"
> > >> > > > ],
> > >> > > >
> > >> > > >
> > >> > > > However,  List value is received as a simple String
> > >> "[Value1,
> > >> > > > Value2]". In other words, SolrJ is internally executing a
> > toString()
> > >> > > > operation to the List. Is impossible to update
> atomically a
> > >> > > > multivalued field with a List of values in just one atomic
> update
> > >> > > > operation?
> > >> > > >
> > >> > >
> > >> > > Seems to be working fine here with HttpSolrServer /
> > >>  BinaryRequestWriter;
> > >> > >
> > >> > > HashMap editTags = new HashMap();
> > >> > > editTags.put("set", new String[]{"tag1","tag2","tag3"});
> > >> > > doc = new SolrInputDocument();
> > >> > > doc.addField("id", "unique");
> > >> > > doc.addField("tags_ss", editTags);
> > >> > > server.add(doc);
> > >> > > server.commit(true, true);
> > >> > > resp = server.query(q);
> > >> > >
> > >> >
> System.out.println(resp.getResults().get(0).getFirstValue("tags_ss"));
> > >> > >
> > >> > > prints "tag1"
> > >> > >
> > >> > > ArrayList as a value works the same way as String[].
> > >> > >
> > >> > > When using xml (RequestWriter) I can see the problem that you are
> > >> > > describing, can you add a jira for that?
> > >> > >
> > >> > > --
> > >> > >  Sami SIren
>

Re: SolrJ: atomic updates.

2012-11-15 Thread Sami Siren
Try setting Request writer to binary like this:

  server.setParser(new BinaryResponseParser());
  server.setRequestWriter(new BinaryRequestWriter());

Or then instead of string array use ArrayList() that contains your
strings as the value for the map


On Thu, Nov 15, 2012 at 3:58 PM, Luis Cappa Banda wrote:

> Hi, Sami.
>
> Doing some tests I´ve used the same code as you and did a quick execution:
>
>
> *HttpSolrServer server = new HttpSolrServer("
> http://localhost:8080/solrserver/core1<
> http://localhost:10080/newscover_es/items_es>
> ");*
>
> * *
> * try {*
> * *
> * HashMap editTags = new HashMap();*
> * editTags.put("set", new String[]{"tag1","tag2","tag3"});*
> * doc.addField("myField", editTags);*
> * server.add(doc);*
> * server.commit(true, true);*
> * *
> * } catch (SolrServerException e) {*
> * e.printStackTrace();*
> * } catch (IOException e) {*
> * e.printStackTrace();*
> * }*
> *
> *
> *
> *
> And the resultant doc is wrong:
>
> 
>
> 
>
> 0
>
> 1
>
> 
>
> *:*
>
> 
>
> 
>
> 
>
> 
>
> 50a0f1f90cf226fb5677fe13
>
> 
>
> *[Ljava.lang.String;@7d5e90cb   ---> toString() from String[]
> array.*
>
> 
>
> 1418710023780958208
>
> 
>
> 
>
> 
>
>
>
> So is definetely executing toString() method in both *HttpSolrServer *and *
> CloudServer*. Tips:
>
>
>- I´m using* Solrj 4.0.0 *artifact version in a *Maven *project.
>- The fields to be updated are *dynamicFields*.
>- I´m using Java jdk6.
>
> Alternatives:
>
>- I´m doing something wrong and I´m so stupid that I can´t see it, :-(
>- The way I update fields is not the correct one.
>- There is a general bug with atomic updates via SolrJ.
>
>
> Regards,
>
>
> - Luis Cappa.
>
>
> 2012/11/15 Luis Cappa Banda 
>
> > I´ll have a look to Solr source code and try to fix the bug. If I succeed
> > I´ll update JIRA issue with it, :-)
> >
> >
> > 2012/11/15 Sami Siren 
> >
> >> Actually it seems that xml/binary request writers only behave
> differently
> >> when using array[] as the value. if I use ArrayList it also works with
> the
> >> xml format (4.1 branch). Still it's annoying that the two request
> writers
> >> behave differently so I guess it's worth adding the jira anyway.
> >>
> >> The Affects version should be 4.0.
> >>
> >>
> >> On Thu, Nov 15, 2012 at 1:42 PM, Luis Cappa Banda  >> >wrote:
> >>
> >> > Hello, Sami.
> >> >
> >> > It will be the first issue that I open so, should I create it under
> Solr
> >> > 4.0 version or in Solr 4.1.0 one?
> >> >
> >> > Thanks,
> >> >
> >> > - Luis Cappa.
> >> >
> >> > 2012/11/15 Sami Siren 
> >> >
> >> > > On Thu, Nov 15, 2012 at 11:51 AM, Luis Cappa Banda <
> >> luisca...@gmail.com
> >> > > >wrote:
> >> > >
> >> > > > Thread update:
> >> > > >
> >> > > > When I use a simple:
> >> > > >
> >> > > > *Map operation = new HashMap();*
> >> > > >
> >> > > >
> >> > > > Instead of:
> >> > > >
> >> > > > *Map> operation = new HashMap >> > > > List>();*
> >> > > >
> >> > > >
> >> > > > The result looks better, but it´s still wrong:
> >> > > >
> >> > > > fieldName: [
> >> > > > "[Value1, Value2]"
> >> > > > ],
> >> > > >
> >> > > >
> >> > > > However,  List value is received as a simple String
> >> "[Value1,
> >> > > > Value2]". In other words, SolrJ is internally executing a
> toString()
> >> > > > operation to the List. Is impossible to update atomically a
> >> > > > multivalued field with a List of values in just one atomic update
> >> > > > operation?
> >> > > >
> >> > >
> >> > > Seems to be working fine here with HttpSolrServer /
> >>  BinaryRequestWriter;
> >> > >
> >> > > HashMap editTags = new HashMap();
> >> > > editTags.put("set", new String[]{"tag1","tag2","tag3"});
> >> > > doc = new SolrInputDocument();
> >> > > doc.addField("id", "unique");
> >> > > doc.addField("tags_ss", editTags);
> >> > > server.add(doc);
> >> > > server.commit(true, true);
> >> > > resp = server.query(q);
> >> > >
> >> > System.out.println(resp.getResults().get(0).getFirstValue("tags_ss"));
> >> > >
> >> > > prints "tag1"
> >> > >
> >> > > ArrayList as a value works the same way as String[].
> >> > >
> >> > > When using xml (RequestWriter) I can see the problem that you are
> >> > > describing, can you add a jira for that?
> >> > >
> >> > > --
> >> > >  Sami SIren
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > >
> >> > > > Regards,
> >> > > >
> >> > > >
> >> > > > - Luis Cappa.
> >> > > >
> >> > > > 2012/11/15 Luis Cappa Banda 
> >> > > >
> >> > > > > Hello everyone,
> >> > > > >
> >> > > > > I´ve tested atomic updates via Ajax calls and now I´m starting
> >> with
> >> > > > atomic
> >> > > > > updates via SolrJ... but the way I´m proceeding doesn´t seem to
> >> work
> >> > > > well.
> >> > > > > Here is the snippet:
> >> > > > >
> >> > > > > *SolrInputDocument do = ne SolrInputDocument();*
> >> > > > > *doc.addField("id", "myId");*
> >> > > > > *
> >> > > > > *
> >> > > > > *Map> operation = new HashMap >> > > > > List>();*
> >> > > > > *operation.put("set", 

Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Hi, Sami.

Doing some tests I´ve used the same code as you and did a quick execution:


*HttpSolrServer server = new HttpSolrServer("
http://localhost:8080/solrserver/core1
");*

* *
* try {*
* *
* HashMap editTags = new HashMap();*
* editTags.put("set", new String[]{"tag1","tag2","tag3"});*
* doc.addField("myField", editTags);*
* server.add(doc);*
* server.commit(true, true);*
* *
* } catch (SolrServerException e) {*
* e.printStackTrace();*
* } catch (IOException e) {*
* e.printStackTrace();*
* }*
*
*
*
*
And the resultant doc is wrong:





0

1



*:*









50a0f1f90cf226fb5677fe13



*[Ljava.lang.String;@7d5e90cb   ---> toString() from String[]
array.*



1418710023780958208









So is definetely executing toString() method in both *HttpSolrServer *and *
CloudServer*. Tips:


   - I´m using* Solrj 4.0.0 *artifact version in a *Maven *project.
   - The fields to be updated are *dynamicFields*.
   - I´m using Java jdk6.

Alternatives:

   - I´m doing something wrong and I´m so stupid that I can´t see it, :-(
   - The way I update fields is not the correct one.
   - There is a general bug with atomic updates via SolrJ.


Regards,


- Luis Cappa.


2012/11/15 Luis Cappa Banda 

> I´ll have a look to Solr source code and try to fix the bug. If I succeed
> I´ll update JIRA issue with it, :-)
>
>
> 2012/11/15 Sami Siren 
>
>> Actually it seems that xml/binary request writers only behave differently
>> when using array[] as the value. if I use ArrayList it also works with the
>> xml format (4.1 branch). Still it's annoying that the two request writers
>> behave differently so I guess it's worth adding the jira anyway.
>>
>> The Affects version should be 4.0.
>>
>>
>> On Thu, Nov 15, 2012 at 1:42 PM, Luis Cappa Banda > >wrote:
>>
>> > Hello, Sami.
>> >
>> > It will be the first issue that I open so, should I create it under Solr
>> > 4.0 version or in Solr 4.1.0 one?
>> >
>> > Thanks,
>> >
>> > - Luis Cappa.
>> >
>> > 2012/11/15 Sami Siren 
>> >
>> > > On Thu, Nov 15, 2012 at 11:51 AM, Luis Cappa Banda <
>> luisca...@gmail.com
>> > > >wrote:
>> > >
>> > > > Thread update:
>> > > >
>> > > > When I use a simple:
>> > > >
>> > > > *Map operation = new HashMap();*
>> > > >
>> > > >
>> > > > Instead of:
>> > > >
>> > > > *Map> operation = new HashMap> > > > List>();*
>> > > >
>> > > >
>> > > > The result looks better, but it´s still wrong:
>> > > >
>> > > > fieldName: [
>> > > > "[Value1, Value2]"
>> > > > ],
>> > > >
>> > > >
>> > > > However,  List value is received as a simple String
>> "[Value1,
>> > > > Value2]". In other words, SolrJ is internally executing a toString()
>> > > > operation to the List. Is impossible to update atomically a
>> > > > multivalued field with a List of values in just one atomic update
>> > > > operation?
>> > > >
>> > >
>> > > Seems to be working fine here with HttpSolrServer /
>>  BinaryRequestWriter;
>> > >
>> > > HashMap editTags = new HashMap();
>> > > editTags.put("set", new String[]{"tag1","tag2","tag3"});
>> > > doc = new SolrInputDocument();
>> > > doc.addField("id", "unique");
>> > > doc.addField("tags_ss", editTags);
>> > > server.add(doc);
>> > > server.commit(true, true);
>> > > resp = server.query(q);
>> > >
>> > System.out.println(resp.getResults().get(0).getFirstValue("tags_ss"));
>> > >
>> > > prints "tag1"
>> > >
>> > > ArrayList as a value works the same way as String[].
>> > >
>> > > When using xml (RequestWriter) I can see the problem that you are
>> > > describing, can you add a jira for that?
>> > >
>> > > --
>> > >  Sami SIren
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > >
>> > > > Regards,
>> > > >
>> > > >
>> > > > - Luis Cappa.
>> > > >
>> > > > 2012/11/15 Luis Cappa Banda 
>> > > >
>> > > > > Hello everyone,
>> > > > >
>> > > > > I´ve tested atomic updates via Ajax calls and now I´m starting
>> with
>> > > > atomic
>> > > > > updates via SolrJ... but the way I´m proceeding doesn´t seem to
>> work
>> > > > well.
>> > > > > Here is the snippet:
>> > > > >
>> > > > > *SolrInputDocument do = ne SolrInputDocument();*
>> > > > > *doc.addField("id", "myId");*
>> > > > > *
>> > > > > *
>> > > > > *Map> operation = new HashMap> > > > > List>();*
>> > > > > *operation.put("set", [[a list of String elements]]);  // I want a
>> > set
>> > > > > operation to override field values.*
>> > > > > *doc.addField("fieldName", operation);*
>> > > > > *
>> > > > > *
>> > > > > *cloudSolrServer.add(doc); // Atomic update operation.*
>> > > > >
>> > > > >
>> > > > > And after updating the resultant doc is as follows:
>> > > > >
>> > > > > *doc: {*
>> > > > > *
>> > > > > *
>> > > > > *...*
>> > > > > *
>> > > > > *
>> > > > > *fieldName: [ "{set=values}"*
>> > > > > *],*
>> > > > > *
>> > > > > *
>> > > > > *...*
>> > > > >
>> > > > > *
>> > > > > *
>> > > > >
>> > > > > *}*
>> > > > >
>> > > > > In other words, the map which includes the "set" operation and the
>> > > field
>> > > > 

Re: consistency in SolrCloud replication

2012-11-15 Thread Bill Au
Thanks for the info, Mark.  By "a request won't return until it's affected
all replicas", are you referring to the update request or the query?

Bill


On Wed, Nov 14, 2012 at 7:57 PM, Mark Miller  wrote:

> It's included as soon as it has been indexed - though a request won't
> return until it's affected all replicas. Low latency eventual consistency.
>
> - Mark
>
> On Nov 14, 2012, at 5:47 PM, Bill Au  wrote:
>
> > Will a newly indexed document included in search result in the shard
> leader
> > as soon as it has been indexed locally or is it included in search result
> > only after it has been forwarded to and indexed in all the replicas?
> >
> > Bill
>
>


Re: Solr defining Schema structure trouble.

2012-11-15 Thread denl0
Yes this is what I'm trying to do. But stuff related to the document like
language/title/... (I got way more fields) is stored many times. Each page
has a part of the data that's the same; is it possible to separate that data?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-defining-Schema-structure-trouble-tp4020305p4020471.html
Sent from the Solr - User mailing list archive at Nabble.com.


PointType multivalued query

2012-11-15 Thread blopez
Hi all,

I'm using a multivalued PointType (6 dimensions) in my Solr schema. Imagine
that I have one doc indexed in Solr:


<doc>
  <field name="id">-1</field>
  <field name="point">1,1,1,1,1,1</field>
  <field name="point">5,5,5,5,5,5</field>
</doc>


Now imagine that I launch some queries:

point:[0,0,0,0,0,0 TO 2,2,2,2,2,2]: Works OK (matches with the first doc
point and returns doc -1)
point:[4,4,4,4,4,4 TO 6,6,6,6,6,6]: Works OK (matches with the second doc
point and returns doc -1)

point:[4,0,0,0,0,0 TO 6,2,2,2,2,2]: Does not work. The first query point
matches with the second doc point, and the rest of query points matches with
the first doc point (returns doc -1, but it must NOT return any doc!). I
only want to retrieve docs which have a point that completely matches with
the query point.

I don't know if my problem is the PointType data type or bad behavior of the
multivalued items. What do you think about that?

Regards,
Borja.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Faceting Question

2012-11-15 Thread Alexey Serba
Seems like pivot faceting is what you're looking for (
http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting
)
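
A hedged sketch of the request (the host is made up; source and date are the field
names from Jamie's description):

curl 'http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.pivot=source,date'

Note that pivoting on a raw date field gives one bucket per distinct value; for
coarser time buckets you would pivot on a precomputed field (e.g. day or month)
or fall back to per-source range facets.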

Note: it currently does not work in distributed mode - see
https://issues.apache.org/jira/browse/SOLR-2894

On Thu, Nov 15, 2012 at 7:46 AM, Jamie Johnson  wrote:
> Sorry some more info. I have a field to store source and another for date.
>  I currently use faceting to get a temporal distribution across all
> sources.  What is the best way to get a temporal distribution per source?
>  Is the only thing I can do to execute 1 query for the list of sources and
> then another query for each source?
>
> On Wednesday, November 14, 2012, Jamie Johnson  wrote:
>> I've recently been asked to be able to display a temporal facet broken
> down by source, so source1 has the following temporal distribution, source
> 2 has the following temporal distribution etc.  I was wondering what the
> best way to accomplish this is?  My current thoughts were that I'd need to
> execute a completely separate query for each, is this right?  Could field
> aliasing some how be used to execute this in a single request to solr?  Any
> thoughts would really be appreciated.


Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
I'll have a look at the Solr source code and try to fix the bug. If I succeed
I'll update the JIRA issue with it, :-)


2012/11/15 Sami Siren 

> Actually it seems that xml/binary request writers only behave differently
> when using array[] as the value. if I use ArrayList it also works with the
> xml format (4.1 branch). Still it's annoying that the two request writers
> behave differently so I guess it's worth adding the jira anyway.
>
> The Affects version should be 4.0.
>
>
> On Thu, Nov 15, 2012 at 1:42 PM, Luis Cappa Banda  >wrote:
>
> > Hello, Sami.
> >
> > It will be the first issue that I open so, should I create it under Solr
> > 4.0 version or in Solr 4.1.0 one?
> >
> > Thanks,
> >
> > - Luis Cappa.
> >
> > 2012/11/15 Sami Siren 
> >
> > > On Thu, Nov 15, 2012 at 11:51 AM, Luis Cappa Banda <
> luisca...@gmail.com
> > > >wrote:
> > >
> > > > Thread update:
> > > >
> > > > When I use a simple:
> > > >
> > > > *Map operation = new HashMap();*
> > > >
> > > >
> > > > Instead of:
> > > >
> > > > *Map> operation = new HashMap > > > List>();*
> > > >
> > > >
> > > > The result looks better, but it´s still wrong:
> > > >
> > > > fieldName: [
> > > > "[Value1, Value2]"
> > > > ],
> > > >
> > > >
> > > > However,  List value is received as a simple String "[Value1,
> > > > Value2]". In other words, SolrJ is internally executing a toString()
> > > > operation to the List. Is impossible to update atomically a
> > > > multivalued field with a List of values in just one atomic update
> > > > operation?
> > > >
> > >
> > > Seems to be working fine here with HttpSolrServer /
>  BinaryRequestWriter;
> > >
> > > HashMap editTags = new HashMap();
> > > editTags.put("set", new String[]{"tag1","tag2","tag3"});
> > > doc = new SolrInputDocument();
> > > doc.addField("id", "unique");
> > > doc.addField("tags_ss", editTags);
> > > server.add(doc);
> > > server.commit(true, true);
> > > resp = server.query(q);
> > >
> > System.out.println(resp.getResults().get(0).getFirstValue("tags_ss"));
> > >
> > > prints "tag1"
> > >
> > > ArrayList as a value works the same way as String[].
> > >
> > > When using xml (RequestWriter) I can see the problem that you are
> > > describing, can you add a jira for that?
> > >
> > > --
> > >  Sami SIren
> > >
> > >
> > >
> > >
> > >
> > > >
> > > > Regards,
> > > >
> > > >
> > > > - Luis Cappa.
> > > >
> > > > 2012/11/15 Luis Cappa Banda 
> > > >
> > > > > Hello everyone,
> > > > >
> > > > > I´ve tested atomic updates via Ajax calls and now I´m starting with
> > > > atomic
> > > > > updates via SolrJ... but the way I´m proceeding doesn´t seem to
> work
> > > > well.
> > > > > Here is the snippet:
> > > > >
> > > > > *SolrInputDocument do = ne SolrInputDocument();*
> > > > > *doc.addField("id", "myId");*
> > > > > *
> > > > > *
> > > > > *Map> operation = new HashMap > > > > List>();*
> > > > > *operation.put("set", [[a list of String elements]]);  // I want a
> > set
> > > > > operation to override field values.*
> > > > > *doc.addField("fieldName", operation);*
> > > > > *
> > > > > *
> > > > > *cloudSolrServer.add(doc); // Atomic update operation.*
> > > > >
> > > > >
> > > > > And after updating the resultant doc is as follows:
> > > > >
> > > > > *doc: {*
> > > > > *
> > > > > *
> > > > > *...*
> > > > > *
> > > > > *
> > > > > *fieldName: [ "{set=values}"*
> > > > > *],*
> > > > > *
> > > > > *
> > > > > *...*
> > > > >
> > > > > *
> > > > > *
> > > > >
> > > > > *}*
> > > > >
> > > > > In other words, the map which includes the "set" operation and the
> > > field
> > > > > values is String formatted and that String is used to update the
> > field,
> > > > :-/
> > > > >
> > > > > What is the correct way to update just one or more fields with
> SolrJ?
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > --
> > > > >
> > > > > - Luis Cappa
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > - Luis Cappa
> > > >
> > >
> >
> >
> >
> > --
> >
> > - Luis Cappa
> >
>



-- 

- Luis Cappa


Re: SolrJ: atomic updates.

2012-11-15 Thread Sami Siren
Actually it seems that the XML/binary request writers only behave differently
when using array[] as the value. If I use ArrayList it also works with the
XML format (4.1 branch). Still, it's annoying that the two request writers
behave differently, so I guess it's worth adding the JIRA anyway.

The Affects version should be 4.0.


On Thu, Nov 15, 2012 at 1:42 PM, Luis Cappa Banda wrote:

> Hello, Sami.
>
> It will be the first issue that I open so, should I create it under Solr
> 4.0 version or in Solr 4.1.0 one?
>
> Thanks,
>
> - Luis Cappa.
>
> 2012/11/15 Sami Siren 
>
> > On Thu, Nov 15, 2012 at 11:51 AM, Luis Cappa Banda  > >wrote:
> >
> > > Thread update:
> > >
> > > When I use a simple:
> > >
> > > *Map operation = new HashMap();*
> > >
> > >
> > > Instead of:
> > >
> > > *Map> operation = new HashMap > > List>();*
> > >
> > >
> > > The result looks better, but it´s still wrong:
> > >
> > > fieldName: [
> > > "[Value1, Value2]"
> > > ],
> > >
> > >
> > > However,  List value is received as a simple String "[Value1,
> > > Value2]". In other words, SolrJ is internally executing a toString()
> > > operation to the List. Is impossible to update atomically a
> > > multivalued field with a List of values in just one atomic update
> > > operation?
> > >
> >
> > Seems to be working fine here with HttpSolrServer /  BinaryRequestWriter;
> >
> > HashMap editTags = new HashMap();
> > editTags.put("set", new String[]{"tag1","tag2","tag3"});
> > doc = new SolrInputDocument();
> > doc.addField("id", "unique");
> > doc.addField("tags_ss", editTags);
> > server.add(doc);
> > server.commit(true, true);
> > resp = server.query(q);
> >
> System.out.println(resp.getResults().get(0).getFirstValue("tags_ss"));
> >
> > prints "tag1"
> >
> > ArrayList as a value works the same way as String[].
> >
> > When using xml (RequestWriter) I can see the problem that you are
> > describing, can you add a jira for that?
> >
> > --
> >  Sami SIren
> >
> >
> >
> >
> >
> > >
> > > Regards,
> > >
> > >
> > > - Luis Cappa.
> > >
> > > 2012/11/15 Luis Cappa Banda 
> > >
> > > > Hello everyone,
> > > >
> > > > I´ve tested atomic updates via Ajax calls and now I´m starting with
> > > atomic
> > > > updates via SolrJ... but the way I´m proceeding doesn´t seem to work
> > > well.
> > > > Here is the snippet:
> > > >
> > > > *SolrInputDocument do = ne SolrInputDocument();*
> > > > *doc.addField("id", "myId");*
> > > > *
> > > > *
> > > > *Map> operation = new HashMap > > > List>();*
> > > > *operation.put("set", [[a list of String elements]]);  // I want a
> set
> > > > operation to override field values.*
> > > > *doc.addField("fieldName", operation);*
> > > > *
> > > > *
> > > > *cloudSolrServer.add(doc); // Atomic update operation.*
> > > >
> > > >
> > > > And after updating the resultant doc is as follows:
> > > >
> > > > *doc: {*
> > > > *
> > > > *
> > > > *...*
> > > > *
> > > > *
> > > > *fieldName: [ "{set=values}"*
> > > > *],*
> > > > *
> > > > *
> > > > *...*
> > > >
> > > > *
> > > > *
> > > >
> > > > *}*
> > > >
> > > > In other words, the map which includes the "set" operation and the
> > field
> > > > values is String formatted and that String is used to update the
> field,
> > > :-/
> > > >
> > > > What is the correct way to update just one or more fields with SolrJ?
> > > >
> > > >
> > > > Regards,
> > > >
> > > > --
> > > >
> > > > - Luis Cappa
> > > >
> > > >
> > >
> > >
> > > --
> > >
> > > - Luis Cappa
> > >
> >
>
>
>
> --
>
> - Luis Cappa
>


Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Ok, done:

https://issues.apache.org/jira/browse/SOLR-4080

Regards,

- Luis Cappa.


2012/11/15 Luis Cappa Banda 

> Hello, Sami.
>
> It will be the first issue that I open so, should I create it under Solr
> 4.0 version or in Solr 4.1.0 one?
>
> Thanks,
>
> - Luis Cappa.
>
>
> 2012/11/15 Sami Siren 
>
>> On Thu, Nov 15, 2012 at 11:51 AM, Luis Cappa Banda > >wrote:
>>
>> > Thread update:
>> >
>> > When I use a simple:
>> >
>> > *Map operation = new HashMap();*
>> >
>> >
>> > Instead of:
>> >
>> > *Map> operation = new HashMap> > List>();*
>> >
>> >
>> > The result looks better, but it´s still wrong:
>> >
>> > fieldName: [
>> > "[Value1, Value2]"
>> > ],
>> >
>> >
>> > However,  List value is received as a simple String "[Value1,
>> > Value2]". In other words, SolrJ is internally executing a toString()
>> > operation to the List. Is impossible to update atomically a
>> > multivalued field with a List of values in just one atomic update
>> > operation?
>> >
>>
>> Seems to be working fine here with HttpSolrServer /  BinaryRequestWriter;
>>
>> HashMap editTags = new HashMap();
>> editTags.put("set", new String[]{"tag1","tag2","tag3"});
>> doc = new SolrInputDocument();
>> doc.addField("id", "unique");
>> doc.addField("tags_ss", editTags);
>> server.add(doc);
>> server.commit(true, true);
>> resp = server.query(q);
>> System.out.println(resp.getResults().get(0).getFirstValue("tags_ss"));
>>
>> prints "tag1"
>>
>> ArrayList as a value works the same way as String[].
>>
>> When using xml (RequestWriter) I can see the problem that you are
>> describing, can you add a jira for that?
>>
>> --
>>  Sami SIren
>>
>>
>>
>>
>>
>> >
>> > Regards,
>> >
>> >
>> > - Luis Cappa.
>> >
>> > 2012/11/15 Luis Cappa Banda 
>> >
>> > > Hello everyone,
>> > >
>> > > I´ve tested atomic updates via Ajax calls and now I´m starting with
>> > atomic
>> > > updates via SolrJ... but the way I´m proceeding doesn´t seem to work
>> > well.
>> > > Here is the snippet:
>> > >
>> > > *SolrInputDocument do = ne SolrInputDocument();*
>> > > *doc.addField("id", "myId");*
>> > > *
>> > > *
>> > > *Map> operation = new HashMap> > > List>();*
>> > > *operation.put("set", [[a list of String elements]]);  // I want a set
>> > > operation to override field values.*
>> > > *doc.addField("fieldName", operation);*
>> > > *
>> > > *
>> > > *cloudSolrServer.add(doc); // Atomic update operation.*
>> > >
>> > >
>> > > And after updating the resultant doc is as follows:
>> > >
>> > > *doc: {*
>> > > *
>> > > *
>> > > *...*
>> > > *
>> > > *
>> > > *fieldName: [ "{set=values}"*
>> > > *],*
>> > > *
>> > > *
>> > > *...*
>> > >
>> > > *
>> > > *
>> > >
>> > > *}*
>> > >
>> > > In other words, the map which includes the "set" operation and the
>> field
>> > > values is String formatted and that String is used to update the
>> field,
>> > :-/
>> > >
>> > > What is the correct way to update just one or more fields with SolrJ?
>> > >
>> > >
>> > > Regards,
>> > >
>> > > --
>> > >
>> > > - Luis Cappa
>> > >
>> > >
>> >
>> >
>> > --
>> >
>> > - Luis Cappa
>> >
>>
>
>
>
> --
>
> - Luis Cappa
>
>


-- 

- Luis Cappa


Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Hello, Sami.

It will be the first issue that I open, so should I create it under the Solr
4.0 version or the Solr 4.1.0 one?

Thanks,

- Luis Cappa.

2012/11/15 Sami Siren 

> On Thu, Nov 15, 2012 at 11:51 AM, Luis Cappa Banda  >wrote:
>
> > Thread update:
> >
> > When I use a simple:
> >
> > *Map operation = new HashMap();*
> >
> >
> > Instead of:
> >
> > *Map> operation = new HashMap > List>();*
> >
> >
> > The result looks better, but it´s still wrong:
> >
> > fieldName: [
> > "[Value1, Value2]"
> > ],
> >
> >
> > However,  List value is received as a simple String "[Value1,
> > Value2]". In other words, SolrJ is internally executing a toString()
> > operation to the List. Is impossible to update atomically a
> > multivalued field with a List of values in just one atomic update
> > operation?
> >
>
> Seems to be working fine here with HttpSolrServer /  BinaryRequestWriter;
>
> HashMap editTags = new HashMap();
> editTags.put("set", new String[]{"tag1","tag2","tag3"});
> doc = new SolrInputDocument();
> doc.addField("id", "unique");
> doc.addField("tags_ss", editTags);
> server.add(doc);
> server.commit(true, true);
> resp = server.query(q);
> System.out.println(resp.getResults().get(0).getFirstValue("tags_ss"));
>
> prints "tag1"
>
> ArrayList as a value works the same way as String[].
>
> When using xml (RequestWriter) I can see the problem that you are
> describing, can you add a jira for that?
>
> --
>  Sami SIren
>
>
>
>
>
> >
> > Regards,
> >
> >
> > - Luis Cappa.
> >
> > 2012/11/15 Luis Cappa Banda 
> >
> > > Hello everyone,
> > >
> > > I´ve tested atomic updates via Ajax calls and now I´m starting with
> > atomic
> > > updates via SolrJ... but the way I´m proceeding doesn´t seem to work
> > well.
> > > Here is the snippet:
> > >
> > > *SolrInputDocument do = ne SolrInputDocument();*
> > > *doc.addField("id", "myId");*
> > > *
> > > *
> > > *Map> operation = new HashMap > > List>();*
> > > *operation.put("set", [[a list of String elements]]);  // I want a set
> > > operation to override field values.*
> > > *doc.addField("fieldName", operation);*
> > > *
> > > *
> > > *cloudSolrServer.add(doc); // Atomic update operation.*
> > >
> > >
> > > And after updating the resultant doc is as follows:
> > >
> > > *doc: {*
> > > *
> > > *
> > > *...*
> > > *
> > > *
> > > *fieldName: [ "{set=values}"*
> > > *],*
> > > *
> > > *
> > > *...*
> > >
> > > *
> > > *
> > >
> > > *}*
> > >
> > > In other words, the map which includes the "set" operation and the
> field
> > > values is String formatted and that String is used to update the field,
> > :-/
> > >
> > > What is the correct way to update just one or more fields with SolrJ?
> > >
> > >
> > > Regards,
> > >
> > > --
> > >
> > > - Luis Cappa
> > >
> > >
> >
> >
> > --
> >
> > - Luis Cappa
> >
>



-- 

- Luis Cappa


Re: SolrJ: atomic updates.

2012-11-15 Thread Sami Siren
On Thu, Nov 15, 2012 at 11:51 AM, Luis Cappa Banda wrote:

> Thread update:
>
> When I use a simple:
>
> *Map operation = new HashMap();*
>
>
> Instead of:
>
> *Map> operation = new HashMap List>();*
>
>
> The result looks better, but it´s still wrong:
>
> fieldName: [
> "[Value1, Value2]"
> ],
>
>
> However,  List value is received as a simple String "[Value1,
> Value2]". In other words, SolrJ is internally executing a toString()
> operation to the List. Is impossible to update atomically a
> multivalued field with a List of values in just one atomic update
> operation?
>

Seems to be working fine here with HttpSolrServer /  BinaryRequestWriter;

HashMap editTags = new HashMap();
editTags.put("set", new String[]{"tag1","tag2","tag3"});
doc = new SolrInputDocument();
doc.addField("id", "unique");
doc.addField("tags_ss", editTags);
server.add(doc);
server.commit(true, true);
resp = server.query(q);
System.out.println(resp.getResults().get(0).getFirstValue("tags_ss"));

prints "tag1"

ArrayList as a value works the same way as String[].

When using xml (RequestWriter) I can see the problem that you are
describing, can you add a jira for that?

--
 Sami SIren





>
> Regards,
>
>
> - Luis Cappa.
>
> 2012/11/15 Luis Cappa Banda 
>
> > Hello everyone,
> >
> > I´ve tested atomic updates via Ajax calls and now I´m starting with
> atomic
> > updates via SolrJ... but the way I´m proceeding doesn´t seem to work
> well.
> > Here is the snippet:
> >
> > *SolrInputDocument do = ne SolrInputDocument();*
> > *doc.addField("id", "myId");*
> > *
> > *
> > *Map> operation = new HashMap > List>();*
> > *operation.put("set", [[a list of String elements]]);  // I want a set
> > operation to override field values.*
> > *doc.addField("fieldName", operation);*
> > *
> > *
> > *cloudSolrServer.add(doc); // Atomic update operation.*
> >
> >
> > And after updating the resultant doc is as follows:
> >
> > *doc: {*
> > *
> > *
> > *...*
> > *
> > *
> > *fieldName: [ "{set=values}"*
> > *],*
> > *
> > *
> > *...*
> >
> > *
> > *
> >
> > *}*
> >
> > In other words, the map which includes the "set" operation and the field
> > values is String formatted and that String is used to update the field,
> :-/
> >
> > What is the correct way to update just one or more fields with SolrJ?
> >
> >
> > Regards,
> >
> > --
> >
> > - Luis Cappa
> >
> >
>
>
> --
>
> - Luis Cappa
>


Re: Solr 4.0 indexing performance

2012-11-15 Thread Nils Weinander
Ah, thanks Markus!

That's a good thing. I tried disabling the transaction log, and the difference
in performance is marginal. So I'll stick with the transaction logging.


On Thu, Nov 15, 2012 at 11:02 AM, Markus Jelsma
wrote:

> Hi - you're likely seeing a drop in performance because of durability
> which is enabled by default via a transaction log. When disabled 4.0 is
> iirc slightly faster than 3.x.
>
>
> -Original message-
> > From:Nils Weinander 
> > Sent: Thu 15-Nov-2012 10:35
> > To: solr-user@lucene.apache.org
> > Subject: Solr 4.0 indexing performance
> >
> > I have just updated from Solr 3.6 to 4.0, using defaults in
> solrconfig.xml
> > for both versions. With 4.0, bulk indexing takes about twice the time it
> > did in 3.6. Is this to be expected, or the result of my lack of
> optimization
> > in the configuration?
> >
> > --
> > 
> > Nils Weinander
> >
>



-- 

Nils Weinander


RE: Solr 4.0 indexing performance

2012-11-15 Thread Markus Jelsma
Hi - you're likely seeing a drop in performance because of durability which is 
enabled by default via a transaction log. When disabled 4.0 is iirc slightly 
faster than 3.x.
 
 
-Original message-
> From:Nils Weinander 
> Sent: Thu 15-Nov-2012 10:35
> To: solr-user@lucene.apache.org
> Subject: Solr 4.0 indexing performance
> 
> I have just updated from Solr 3.6 to 4.0, using defaults in solrconfig.xml
> for both versions. With 4.0, bulk indexing takes about twice the time it
> did in 3.6. Is this to be expected, or the result of my lack of optimization
> in the configuration?
> 
> -- 
> 
> Nils Weinander
> 


Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Thread update:

When I use a simple:

*Map operation = new HashMap();*


Instead of:

*Map<String, List<String>> operation = new HashMap<String, List<String>>();*


The result looks better, but it´s still wrong:

fieldName: [
"[Value1, Value2]"
],


However, the List<String> value is received as a simple String "[Value1,
Value2]". In other words, SolrJ is internally executing a toString()
operation on the List. Is it impossible to update a multivalued
field atomically with a List of values in just one atomic update
operation?

Regards,


- Luis Cappa.

2012/11/15 Luis Cappa Banda 

> Hello everyone,
>
> I´ve tested atomic updates via Ajax calls and now I´m starting with atomic
> updates via SolrJ... but the way I´m proceeding doesn´t seem to work well.
> Here is the snippet:
>
> *SolrInputDocument do = ne SolrInputDocument();*
> *doc.addField("id", "myId");*
> *
> *
> *Map> operation = new HashMap List>();*
> *operation.put("set", [[a list of String elements]]);  // I want a set
> operation to override field values.*
> *doc.addField("fieldName", operation);*
> *
> *
> *cloudSolrServer.add(doc); // Atomic update operation.*
>
>
> And after updating the resultant doc is as follows:
>
> *doc: {*
> *
> *
> *...*
> *
> *
> *fieldName: [ "{set=values}"*
> *],*
> *
> *
> *...*
>
> *
> *
>
> *}*
>
> In other words, the map which includes the "set" operation and the field
> values is String formatted and that String is used to update the field, :-/
>
> What is the correct way to update just one or more fields with SolrJ?
>
>
> Regards,
>
> --
>
> - Luis Cappa
>
>


-- 

- Luis Cappa


Re: Solr 4.0 Spatial Search schema.xml and data-config.xml

2012-11-15 Thread jmlucjav
If you are using DIH, it's just a matter of doing (for a MySQL project I have
around, for example) something like this:

 CONCAT(lat, ',',lon) as latlon





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-schema-xml-and-data-config-xml-tp4020376p4020437.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Hello everyone,

I´ve tested atomic updates via Ajax calls and now I´m starting with atomic
updates via SolrJ... but the way I´m proceeding doesn´t seem to work well.
Here is the snippet:

*SolrInputDocument doc = new SolrInputDocument();*
*doc.addField("id", "myId");*
*
*
*Map<String, List<String>> operation = new HashMap<String, List<String>>();*
*operation.put("set", [[a list of String elements]]);  // I want a set
operation to override field values.*
*doc.addField("fieldName", operation);*
*
*
*cloudSolrServer.add(doc); // Atomic update operation.*


And after updating the resultant doc is as follows:

*doc: {*
*
*
*...*
*
*
*fieldName: [ "{set=values}"*
*],*
*
*
*...*

*
*

*}*

In other words, the map which includes the "set" operation and the field
values is String formatted and that String is used to update the field, :-/

What is the correct way to update just one or more fields with SolrJ?


Regards,

-- 

- Luis Cappa