Re: any difference between using collection vs. shard in URL?

2014-11-06 Thread Ramkumar R. Aiyengar
Do keep one thing in mind though. If you are already doing the work of
figuring out the right shard leader (through SolrJ or otherwise), using
that location with just the collection name might be suboptimal if there
are multiple shard leaders present in the same instance -- the collection
name just goes to *some* shard leader, and not necessarily to the one where
your document is destined. If it chooses the wrong one, it will lead to an
HTTP request from the node to itself.
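
For illustration, a minimal SolrJ sketch of letting the client handle that
routing, assuming a 4.x CloudSolrServer (the ZooKeeper address and ids are
hypothetical) -- the client hashes the id itself and sends each document
straight to the correct shard leader:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RouteToLeader {
    public static void main(String[] args) throws Exception {
        // hypothetical ZooKeeper ensemble address
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181/solr");
        server.setDefaultCollection("alpha");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        server.add(doc);   // routed directly to the leader of the owning shard
        server.commit();
        server.shutdown();
    }
}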
On 5 Nov 2014 15:33, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

 There's no difference between the two. Even if you send updates to a shard
 url, it will still be forwarded to the right shard leader according to the
 hash of the id (assuming you're using the default compositeId router). Of
 course, if you happen to hit the right shard leader then it is just an
 internal forward and not an extra network hop.

 The advantage of using the collection name is that you can hit any
 SolrCloud node (even the ones not hosting this collection) and it will
 still work. So for a non-Java client, a load balancer can be set up in front
 of the entire cluster and things will just work.

 On Wed, Nov 5, 2014 at 8:50 PM, Ian Rose ianr...@fullstory.com wrote:

  If I add some documents to a SolrCloud shard in a collection alpha, I
 can
  post them to /solr/alpha/update.  However I notice that you can also
 post
  them using the shard name, e.g. /solr/alpha_shard4_replica1/update - in
  fact this is what Solr seems to do internally (like if you send documents
  to the wrong node so Solr needs to forward them over to the leader of the
  correct shard).
 
  Assuming you *do* always post your documents to the correct shard, is
 there
  any difference between these two, performance or otherwise?
 
  Thanks!
  - Ian
 



 --
 Regards,
 Shalin Shekhar Mangar.



solr.xml coreRootDirectory relative to solr home

2014-11-06 Thread Andreas Hubold

Hi,

I'm trying to configure a different core discovery root directory in 
solr.xml with the coreRootDirectory setting as described in 
https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml


I'd like to just set it to a subdirectory of Solr home (a "cores" 
directory, to avoid confusion with configsets and other directories). I 
tried


<str name="coreRootDirectory">cores</str>

but that's interpreted relative to the current working directory. Other 
paths such as sharedLib are interpreted relative to Solr Home and I had 
expected this here too.


I do not set Solr home via system property but via JNDI, so I don't think 
I can use ${solr.home}/cores or something like that? It would be nice if 
Solr home were available for property substitution even when set via JNDI.


Is there another way to set a path relative to solr home here?

Regards,
Andreas
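
For illustration, one stopgap that should work until relative paths are
resolved against Solr home: spell the path out absolutely in solr.xml (the
path below is hypothetical):

<str name="coreRootDirectory">/opt/solr/home/cores</str>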


RE: recovery process - node with stale data elected leader

2014-11-06 Thread francois.grollier
Hi all,

Any idea on my issue below?

Thanks
Francois

-Original Message-
From: Grollier, Francois: IT (PRG) 
Sent: Tuesday, November 04, 2014 6:19 PM
To: solr-user@lucene.apache.org
Subject: recovery process - node with stale data elected leader

Hi,

I'm running solrCloud 4.6.0 and I have a question/issue regarding the recovery 
process.

My cluster is made of 2 shards with 2 replicas each. Nodes A1 and B1 are 
leaders, A2 and B2 followers.

I start indexing docs and kill A2. I keep indexing for a while and then kill 
A1. At this point, the cluster stops serving queries as one shard is completely 
unavailable.
Then I restart A2 first, then A1. A2 gets elected leader, waits a bit for more 
replicas to be up and once it sees A1 it starts the recovery process.
My understanding of the recovery process was that at this point A2 would notice 
that A1 has a more up-to-date state and would sync with A1. It seems to 
happen like this, but then I get:

INFO  - 2014-11-04 11:50:43.068; org.apache.solr.cloud.RecoveryStrategy; Attempting to PeerSync from http://a1:8111/solr/executions/ core=executions - recoveringAfterStartup=false
INFO  - 2014-11-04 11:50:43.069; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr START replicas=[http://a1:8111/solr/executions/] nUpdates=100
INFO  - 2014-11-04 11:50:43.076; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr  Received 98 versions from a1:8111/solr/executions/
INFO  - 2014-11-04 11:50:43.076; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr  Our versions are newer. ourLowThreshold=1483859630192852992 otherHigh=1483859633446584320
INFO  - 2014-11-04 11:50:43.077; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr DONE. sync succeeded


And I end up with a different set of documents on each node (actually A1 has 
all the documents but A2 is missing some).

Is my understanding wrong, and is it complete nonsense to start A2 before A1?

If my understanding is right, what could cause the desync? (I can provide more 
logs.) And is there a way to force A2 to index the missing documents? I have tried 
the FORCERECOVERY command, but it produces the same result as shown above.

Thanks
francois



grouping finds result name=doclist numFound=0

2014-11-06 Thread Giovanni Bricconi
Sorry for the basic question

q=*:*&fq=-sku:2471834&fq=FiltroDispo:1&fq=has_image:1&rows=100&fl=descCat3,IDCat3,ranking2&group=true&group.field=IDCat3&group.sort=ranking2+desc&group.ngroups=true

returns some groups with no results. I'm using Solr 4.8.0; the collection
has 3 shards.

Am I missing some parameters?

<lst name="grouped">
  <lst name="IDCat3">
    <int name="matches">297254</int>
    <int name="ngroups">49</int>
    <arr name="groups">
      <lst>
        <int name="groupValue">0</int>
        <result name="doclist" numFound="0" start="0"/>
      </lst>
      ...
      <lst>
        <int name="groupValue">12043</int>
        <result name="doclist" numFound="2" start="0">
          <doc>
            <int name="IDCat3">12043</int>
            <str name="descCat3">SSD</str>
            <int name="ranking2">498</int>
          </doc>
        </result>
      </lst>


Re: EarlyTerminatingCollectorException

2014-11-06 Thread Dirk Högemann
https://issues.apache.org/jira/browse/SOLR-6710

2014-11-05 21:56 GMT+01:00 Mikhail Khludnev mkhlud...@griddynamics.com:

 I wondered too, but it seems it warms up the queryResultCache

 https://github.com/apache/lucene-solr/blob/20f9303f5e2378e2238a5381291414881ddb8172/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L522
 at least these ERRORs break nothing, see

 https://github.com/apache/lucene-solr/blob/20f9303f5e2378e2238a5381291414881ddb8172/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L165

 anyway, here are two usability issues:
  - for key:org.apache.solr.search.QueryResultKey@62340b01, the lack of a readable
 toString()
  - I don't think regeneration exceptions are ERRORs; they seem like WARNs to me,
 or even lower. Also, as a courtesy, EarlyTerminatingCollectorExceptions in
 particular could be recognized, and even ignored, at
 SolrIndexSearcher.java#L522

 Would you mind raising a ticket?

 On Wed, Nov 5, 2014 at 6:51 PM, Dirk Högemann dhoeg...@gmail.com wrote:

  Our production Solr slave cores (we have about 40 cores, each of
  moderate size, about 10K to 90K documents) produce many
  exceptions of this type:
 
  2014-11-05 15:06:06.247 [searcherExecutor-158-thread-1] ERROR
  org.apache.solr.search.SolrCache: Error during auto-warming of
  key:org.apache.solr.search.QueryResultKey@62340b01
  :org.apache.solr.search.EarlyTerminatingCollectorException
 
  Our relevant solrconfig is
 
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>18</maxTime> <!-- in ms -->
    </autoCommit>
  </updateHandler>

  <query>
    <maxWarmingSearchers>2</maxWarmingSearchers>
    <filterCache
      class="solr.FastLRUCache"
      size="8192"
      initialSize="8192"
      autowarmCount="4096"/>

    <!-- queryResultCache caches results of searches - ordered lists of
         document ids (DocList) based on a query, a sort, and the range
         of documents requested. -->
    <queryResultCache
      class="solr.FastLRUCache"
      size="8192"
      initialSize="8192"
      autowarmCount="4096"/>

    <!-- documentCache caches Lucene Document objects (the stored fields for
         each document).
         Since Lucene internal document ids are transient, this cache will
         not be autowarmed. -->
    <documentCache
      class="solr.FastLRUCache"
      size="8192"
      initialSize="8192"
      autowarmCount="4096"/>
  </query>
 
  What exactly does the exception mean?
  Thank you!
 
  -- Dirk --
 



 --
 Sincerely yours
 Mikhail Khludnev



Re: Schemaless configuration using 4.10.2/API returning 404

2014-11-06 Thread Andreas Hubold

Hi,

it might be a silly question, but are you sure that a Solr core 
"collection1" exists? Or does it have a different name?

At least you would get a 404 if no such core exists.

Regards,
Andreas

nbosecker wrote on 11/05/2014 09:12 PM:

Hi all,

I'm working on updating legacy Solr to 4.10.2 to use schemaless
configuration. As such, I have added this snippet to solrconfig.xml per the
docs:

<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

I see that schema.xml is renamed to schema.xml.bak and the managed-schema file
is present after the Solr restart.

My Solr Dashboard is accessible via:
https://myserver:9943/solr/#/

However, I still cannot access the schema via the API - I keep receiving a 404 [The
requested resource (/solr/schema/fields) is not available] error:
https://myserver:9943/solr/collection1/schema/fields


What am I missing to access the schema API?

Much thanks!











Delete data from stored documents

2014-11-06 Thread yriveiro
Hi,

Is it possible to remove stored data from an index by deleting the unwanted fields
from schema.xml and then doing an optimize on the index?

Thanks,

/yago





Re: grouping finds result name=doclist numFound=0

2014-11-06 Thread Timo Schmidt
Hi Giovanni,

afaik grouping is not completely working with SolrCloud. You could maybe check:

https://issues.apache.org/jira/browse/SOLR-5046

In addition, documents that should be grouped need to be in the same shard 
(you can use router.field=IDCat3 to place all of your documents with the same 
IDCat3 in the same shard).
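
For illustration, a hedged SolrJ sketch of the same co-location idea using
the default compositeId router instead of router.field -- prefixing the
unique id with the category value routes all documents sharing that prefix to
one shard (field names are taken from Giovanni's query; the ZooKeeper address,
collection name and values are hypothetical):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CoLocateByCategory {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zkhost:2181/solr");
        server.setDefaultCollection("products");

        SolrInputDocument doc = new SolrInputDocument();
        // "12043!2471834" = shardKey!uniqueId; compositeId hashes the prefix,
        // so every doc with IDCat3=12043 lands on the same shard
        doc.addField("id", "12043!2471834");
        doc.addField("IDCat3", 12043);
        server.add(doc);
        server.commit();
    }
}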

Maybe somebody else can give some more insight; I am also interested in the 
topic.

Cheers

Timo



From: Giovanni Bricconi [giovanni.bricc...@banzai.it]
Sent: Thursday, 6 November 2014 11:43
To: solr-user
Subject: grouping finds result name=doclist numFound=0

Sorry for the basic question

q=*:*&fq=-sku:2471834&fq=FiltroDispo:1&fq=has_image:1&rows=100&fl=descCat3,IDCat3,ranking2&group=true&group.field=IDCat3&group.sort=ranking2+desc&group.ngroups=true

returns some groups with no results. I'm using Solr 4.8.0; the collection
has 3 shards.

Am I missing some parameters?

<lst name="grouped">
  <lst name="IDCat3">
    <int name="matches">297254</int>
    <int name="ngroups">49</int>
    <arr name="groups">
      <lst>
        <int name="groupValue">0</int>
        <result name="doclist" numFound="0" start="0"/>
      </lst>
      ...
      <lst>
        <int name="groupValue">12043</int>
        <result name="doclist" numFound="2" start="0">
          <doc>
            <int name="IDCat3">12043</int>
            <str name="descCat3">SSD</str>
            <int name="ranking2">498</int>
          </doc>
        </result>
      </lst>


How to dynamically create Solr cores with schema

2014-11-06 Thread Andreas Hubold

Hi,

I have a use-case where Java applications need to create Solr indexes 
dynamically. Schema fields of these indexes differ and should be defined 
by the Java application upon creation.


So I'm trying to use the Core Admin API [1] to create new cores and the 
Schema API [2] to define fields. When creating a core, I have to specify 
solrconfig.xml (with ManagedIndexSchemaFactory enabled) and the schema 
to start with. I thought it would be a good idea to use named config 
sets [3] for this purpose:


curl 
'http://localhost:8082/solr/admin/cores?action=CREATE&name=m1&instanceDir=cores/m1&configSet=myconfig&dataDir=data'


But when I add a field to the core "m1", the field actually gets added 
to the config set. Is this a bug or a feature?


curl http://localhost:8082/solr/m1/schema/fields -X POST -H 
'Content-type:application/json' 
  --data-binary '[{
    "name":"foo",
    "type":"tdate",
    "stored":true
}]'

All cores created from the config set "myconfig" will get the new field 
"foo" in their schema. So this obviously does not work for creating cores 
with different schemas.
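
For illustration, one possible workaround, under the assumption that each
core must own a private copy of the config rather than share it: copy the
config-set template into the new core's instanceDir before the CREATE call
(all paths here are hypothetical):

import java.io.File;
import org.apache.commons.io.FileUtils;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreateCoreWithPrivateConfig {
    public static void main(String[] args) throws Exception {
        SolrServer admin = new HttpSolrServer("http://localhost:8082/solr");
        File template = new File("/opt/solr/configsets/myconfig/conf");
        File instanceDir = new File("/opt/solr/cores/m1");

        // give the core its own conf/ so Schema API changes stay local to it
        FileUtils.copyDirectory(template, new File(instanceDir, "conf"));
        CoreAdminRequest.createCore("m1", instanceDir.getAbsolutePath(), admin);
    }
}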


I also tried to use the config/schema parameters of the CREATE core 
command (instead of config sets) to specify some existing 
solrconfig.xml/schema.xml. I tried relative paths here (e.g. some levels 
upward) but I could not get it to work. The documentation [1] tells me 
that relative paths are allowed. Should this work?


The next thing that comes to mind is to use dynamic fields instead 
of a proper managed schema, but that does not sound as nice.
Or maybe I should implement a custom CoreAdminHandler which takes a list 
of field definitions, if that's possible somehow...?


I don't know. What's your recommended approach?

We're using Solr 4.10.1, non-SolrCloud. Would this be simpler or 
different with SolrCloud?


Thank you,
Andreas

[1] 
https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-CREATE
[2] 
https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-Modifytheschema

[3] https://cwiki.apache.org/confluence/display/solr/Config+Sets


Re: Delete data from stored documents

2014-11-06 Thread Mikhail Khludnev
nope.

On Thu, Nov 6, 2014 at 5:19 PM, yriveiro yago.rive...@gmail.com wrote:

 Hi,

  Is it possible to remove stored data from an index by deleting the unwanted
  fields from schema.xml and then doing an optimize on the index?

 Thanks,

 /yago







-- 
Sincerely yours
Mikhail Khludnev


Re: What's the most efficient way to sort by number of terms matched?

2014-11-06 Thread Ahmet Arslan
Hi Trey,

Not exactly the same, but we did something similar with (e)dismax's mm 
parameter, by auto-relaxing it.

In your example, 
try with mm=3; if numFound < 20, then try with mm=2, etc.

Ahmet
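
For illustration, a minimal SolrJ sketch of that relax-and-retry loop (the
threshold of 20, the core URL and the query terms are hypothetical):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AutoRelaxMm {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery query = new SolrQuery("python solr hadoop");
        query.set("defType", "edismax");

        QueryResponse rsp = null;
        for (int mm = 3; mm >= 1; mm--) {
            query.set("mm", String.valueOf(mm)); // require mm of the 3 terms
            rsp = server.query(query);
            if (rsp.getResults().getNumFound() >= 20) break; // enough hits, stop relaxing
        }
        System.out.println(rsp);
    }
}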

On Thursday, November 6, 2014 8:41 AM, Trey Grainger solrt...@gmail.com wrote:



Just curious if there are some suggestions here. The use case is fairly
simple:

Given a query like "python OR solr OR hadoop", I want to sort results by
the number of keywords matched first, and then by relevancy.

I can think of ways to do this, but not efficiently. For example, I could
do:
q=python OR solr OR hadoop
  p1=python
  p2=solr
  p3=hadoop
  sort=sum(if(query($p1,0),1,0),if(query($p2,0),1,0),if(query($p3,0),1,0))
desc, score desc

Other than the obvious downside that this requires me to pre-parse the
user's query, it's also somewhat inefficient to run the query function once
for each term in the original query, since it re-executes multiple
queries and loops through every document in the index during scoring.

Ideally, I would be able to do something like the below, which could just
pull the count of unique matched terms from the main query (q parameter)
execution:
q=python OR solr OR hadoop&sort=uniquematchedterms() desc,score desc.

I don't think anything like this exists, but would love some suggestions if
anyone else has solved this before.

Thanks,

-Trey


Re: SolrCloud shard distribution with Collections API

2014-11-06 Thread ralph tice
I've had a bad enough experience with the default shard placement that I
create a collection with one shard, add the shards where I want them, then
use add/delete replica to move the first one to the right machine/port.

Typically this is in a SolrCloud of dozens or hundreds of shards.  Our
shards are all partitioned by time so there are big performance advantages
to optimal placement across JVMs and machines.

In what sort of situation do you not have trouble with the default shard placement?


On Wed, Nov 5, 2014 at 5:10 PM, Erick Erickson erickerick...@gmail.com wrote:
 They should be pretty well distributed by default, but if you want to
 take manual control, you can use the createNodeSet param on CREATE
 (with replication factor of 1) and then ADDREPLICA with the node param
 to put replicas for shards exactly where you want.

 Best,
 Erick

 On Wed, Nov 5, 2014 at 2:12 PM, CTO직속IsabellePhan ip...@coupang.com wrote:
 Hello,

 I am testing a small SolrCloud cluster on 2 servers. I started 2 nodes on
 each server, so that each collection can have 2 shards with a replication
 factor of 2.

 I am using below command from Collections API to create collection:

 curl '
 http://serveraddress/solr/admin/collections?action=CREATE&name=cp_collection&numShards=2&replicationFactor=2&collection.configName=cp_config
 '

 Is there a way to ensure that for each shard, the leader and replica are on
 different servers?
 This command sometimes puts them on 2 nodes from the same server.


 Thanks a lot for your help,

 Isabelle
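
For illustration, a hedged SolrJ sketch of Erick's createNodeSet-plus-ADDREPLICA
recipe, issued as plain Collections API requests (the ZooKeeper address and
node names are hypothetical):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class PlacedCollection {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zkhost:2181/solr");

        // CREATE with replicationFactor=1, pinning initial replicas to chosen nodes
        ModifiableSolrParams create = new ModifiableSolrParams();
        create.set("action", "CREATE");
        create.set("name", "cp_collection");
        create.set("numShards", "2");
        create.set("replicationFactor", "1");
        create.set("collection.configName", "cp_config");
        create.set("createNodeSet", "serverA:8983_solr,serverB:8983_solr");
        QueryRequest createReq = new QueryRequest(create);
        createReq.setPath("/admin/collections");
        server.request(createReq);

        // then ADDREPLICA, placing shard1's replica on the opposite server
        ModifiableSolrParams add = new ModifiableSolrParams();
        add.set("action", "ADDREPLICA");
        add.set("collection", "cp_collection");
        add.set("shard", "shard1");
        add.set("node", "serverB:8983_solr");
        QueryRequest addReq = new QueryRequest(add);
        addReq.setPath("/admin/collections");
        server.request(addReq);
    }
}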


Updating an index

2014-11-06 Thread phiroc
Hello,

I have [mistakenly] created a SOLR index in which the document IDs contain URIs 
such as file:///Z:/1933/01/1933_01.png .

In a single SOLR update command, how can I:

- copy the contents of each document's id field to a new field called 'url', 
after replacing 'Z:' by 'Y:'

- make SOLR generate a new random Id for each document

Many thanks.

Philippe




Re: Schemaless configuration using 4.10.2/API returning 404

2014-11-06 Thread nbosecker
Thanks for the reply!

My Solr has 2 cores (collection1/collection2); I can access them via the Solr
dashboard with no problem.
https://myserver:9943/solr/#/collection1
https://myserver:9943/solr/#/collection2

I can also verify that the solrconfig.xml for them contains the schemaless config:
https://myserver:9943/solr/collection1/admin/file?file=solrconfig.xml&contentType=text/xml;charset=utf-8

I'm perplexed, as the managed-schema file has been created and seems to be
active, yet the API continues to give 404. Is this the correct format to
access it?
https://myserver:9943/solr/collection1/schema/fields

(I've also tried other variations, removing the collection name, etc. - always
404.)





Re: create new core based on named config set using the admin page

2014-11-06 Thread Erick Erickson
Yeah, please create a JIRA. There are a couple of umbrella JIRAs that
you might want to link it to.
I'm not sure it quite fits in either; if not, just let it hang out there bare:

https://issues.apache.org/jira/browse/SOLR-6703
https://issues.apache.org/jira/browse/SOLR-6084

On Wed, Nov 5, 2014 at 11:57 PM, Andreas Hubold
andreas.hub...@coremedia.com wrote:
 Hi,

 Solr 4.8 introduced named config sets with
 https://issues.apache.org/jira/browse/SOLR-4478. You can create a new core
 based on a config set with the CoreAdmin API as described in
 https://cwiki.apache.org/confluence/display/solr/Config+Sets

 The Solr Admin page allows the creation of new cores as well. There's an Add
 Core button in the Core Admin tab. This will open a dialog where you can
 enter the name, instanceDir, dataDir and the names of solrconfig.xml /
 schema.xml. It would be cool and consistent if one could create a core based
 on a named config set here as well.

 I'm asking because I might have overlooked something, or maybe somebody is
 already working on this. But probably I should just create a JIRA issue,
 right?

 Regards,
 Andreas

 Ramzi Alqrainy wrote on 11/05/2014 08:24 PM:

  Sorry, I did not get your point. Can you please elaborate?





Re: solr.xml coreRootDirectory relative to solr home

2014-11-06 Thread Erick Erickson
An oversight, I think. If you create a patch, let me know and we can
get it committed.

Hmmm, not sure though - this'll change the current behavior that people
might be counting on...

On Thu, Nov 6, 2014 at 1:02 AM, Andreas Hubold
andreas.hub...@coremedia.com wrote:
 Hi,

 I'm trying to configure a different core discovery root directory in
 solr.xml with the coreRootDirectory setting as described in
 https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml

 I'd like to just set it to a subdirectory of Solr home (a "cores" directory,
 to avoid confusion with configsets and other directories). I tried

 <str name="coreRootDirectory">cores</str>

 but that's interpreted relative to the current working directory. Other
 paths such as sharedLib are interpreted relative to Solr Home and I had
 expected this here too.

 I do not set Solr home via system property but via JNDI, so I don't think I
 can use ${solr.home}/cores or something like that? It would be nice if Solr
 home were available for property substitution even when set via JNDI.

 Is there another way to set a path relative to solr home here?

 Regards,
 Andreas


Re: Schemaless configuration using 4.10.2/API returning 404

2014-11-06 Thread Alexandre Rafalovitch
Ok, I just booted a fresh Solr 4.10.2, started example-schemaless and
hit http://localhost:8983/solr/collection1/schema/fields - and it
worked.

So, I suspect the problem is not with Solr but with your setup around
it. For example, is your Solr listening on port 9943 directly (and not
8983), or do you have a proxy in between? Maybe the proxy is not
configured to forward that URL.

Do you have logs? Can you see if that URL is actually being called on
Solr's side? If you see other URLs (like generic admin stuff), but not
this one, then it may not be making it there.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 6 November 2014 13:27, nbosecker nbosec...@gmail.com wrote:
 Thanks for the reply!

 My Solr has 2 cores (collection1/collection2); I can access them via the Solr
 dashboard with no problem.
 https://myserver:9943/solr/#/collection1
 https://myserver:9943/solr/#/collection2

 I can also verify that the solrconfig.xml for them contains the schemaless config:
 https://myserver:9943/solr/collection1/admin/file?file=solrconfig.xml&contentType=text/xml;charset=utf-8

 I'm perplexed, as the managed-schema file has been created and seems to be
 active, yet the API continues to give 404. Is this the correct format to
 access it?
 https://myserver:9943/solr/collection1/schema/fields

 (I've also tried other variations, removing the collection name, etc. - always
 404.)





Re: Updating an index

2014-11-06 Thread Erick Erickson
No way that I know of; re-indexing is in order.

Solr does not update in place; you have to re-add the document. Well,
atomic updates work, but only if all fields are stored. And it still wouldn't
be a single Solr command.
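
For illustration, a rough client-side reindex sketch along those lines,
assuming every field is stored; a real reindex would page through the whole
index, and the core URL and field handling here are hypothetical:

import java.util.UUID;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class RewriteIds {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(1000); // real code would page through all documents

        for (SolrDocument d : server.query(q).getResults()) {
            String oldId = (String) d.getFieldValue("id");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", UUID.randomUUID().toString()); // new random id
            doc.addField("url", oldId.replace("Z:", "Y:"));   // copied and rewritten
            // ... copy the remaining stored fields here ...
            server.deleteById(oldId); // remove the old document
            server.add(doc);
        }
        server.commit();
    }
}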

Best,
Erick

On Thu, Nov 6, 2014 at 8:20 AM,  phi...@free.fr wrote:
 Hello,

 I have [mistakenly] created a SOLR index in which the document IDs contain 
 URIs such as file:///Z:/1933/01/1933_01.png .

 In a single SOLR update command, how can I:

 - copy the contents of each document's id field to a new field called 'url', 
 after replacing 'Z:' by 'Y:'

 - make SOLR generate a new random Id for each document

 Many thanks.

 Philippe




Re: What's the most efficient way to sort by number of terms matched?

2014-11-06 Thread Sujit Pal
Hi Trey,

In an application I built a few years ago, I had a component that rewrote the
input query into a Lucene BooleanQuery, and we would set the
minimumNumberShouldMatch value on the query. It worked well, but lately we
have been trying to move away from writing our own custom components, since
maintaining them across releases becomes a bit of a chore.
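
For illustration, a minimal Lucene 4.x sketch of the rewrite Sujit describes
(the field name is hypothetical):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class MinShouldMatchRewrite {
    public static void main(String[] args) {
        BooleanQuery bq = new BooleanQuery();
        bq.add(new TermQuery(new Term("text", "python")), BooleanClause.Occur.SHOULD);
        bq.add(new TermQuery(new Term("text", "solr")), BooleanClause.Occur.SHOULD);
        bq.add(new TermQuery(new Term("text", "hadoop")), BooleanClause.Occur.SHOULD);
        bq.setMinimumNumberShouldMatch(2); // at least two of the three must match
        System.out.println(bq);
    }
}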

So lately we simulate this behavior in the client by constructing
progressively smaller n-grams and OR'ing them together before sending to Solr.
For your example, it becomes something like this:

(python AND solr AND hadoop) OR (python AND solr) OR (solr AND hadoop) OR
(python AND hadoop) OR (python) OR (solr) OR (hadoop).

-sujit


On Thu, Nov 6, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Trey,

 Not exactly the same, but we did something similar with (e)dismax's mm
 parameter, by auto-relaxing it.

 In your example,
 try with mm=3; if numFound < 20, then try with mm=2, etc.

 Ahmet

 On Thursday, November 6, 2014 8:41 AM, Trey Grainger solrt...@gmail.com
 wrote:



 Just curious if there are some suggestions here. The use case is fairly
 simple:

 Given a query like "python OR solr OR hadoop", I want to sort results by
 the number of keywords matched first, and then by relevancy.

 I can think of ways to do this, but not efficiently. For example, I could
 do:
 q=python OR solr OR hadoop
   p1=python
   p2=solr
   p3=hadoop
   sort=sum(if(query($p1,0),1,0),if(query($p2,0),1,0),if(query($p3,0),1,0))
 desc, score desc

 Other than the obvious downside that this requires me to pre-parse the
 user's query, it's also somewhat inefficient to run the query function once
 for each term in the original query, since it re-executes multiple
 queries and loops through every document in the index during scoring.

 Ideally, I would be able to do something like the below, which could just
 pull the count of unique matched terms from the main query (q parameter)
 execution:
 q=python OR solr OR hadoop&sort=uniquematchedterms() desc,score desc.

 I don't think anything like this exists, but would love some suggestions if
 anyone else has solved this before.

 Thanks,

 -Trey



Re: solr.xml coreRootDirectory relative to solr home

2014-11-06 Thread Shawn Heisey
On 11/6/2014 12:02 PM, Erick Erickson wrote:
 An oversight, I think. If you create a patch, let me know and we can
 get it committed.

 Hmmm, not sure though - this'll change the current behavior that people
 might be counting on...

Relative to the solr home sounds like the best option to me.  It's what
I would expect, since most of the rest of Solr uses directories relative
to other directories that may or may not be explicitly defined.  I
haven't researched in depth, but I think that the solr home itself is
the only thing in Solr that defaults to something relative to the
current working directory ... and that seems like a very good policy to
keep.

Thanks,
Shawn



Re: What's the most efficient way to sort by number of terms matched?

2014-11-06 Thread Mikhail Khludnev
Sadly, it seems it hasn't been done so far. It's either a custom similarity
or a function query.

On Thu, Nov 6, 2014 at 9:40 AM, Trey Grainger solrt...@gmail.com wrote:

 Just curious if there are some suggestions here. The use case is fairly
 simple:

 Given a query like "python OR solr OR hadoop", I want to sort results by
 the number of keywords matched first, and then by relevancy.

 I can think of ways to do this, but not efficiently. For example, I could
 do:
 q=python OR solr OR hadoop
   p1=python
   p2=solr
   p3=hadoop
   sort=sum(if(query($p1,0),1,0),if(query($p2,0),1,0),if(query($p3,0),1,0))
 desc, score desc

 Other than the obvious downside that this requires me to pre-parse the
 user's query, it's also somewhat inefficient to run the query function once
 for each term in the original query, since it re-executes multiple
 queries and loops through every document in the index during scoring.

 Ideally, I would be able to do something like the below, which could just
 pull the count of unique matched terms from the main query (q parameter)
 execution:
 q=python OR solr OR hadoop&sort=uniquematchedterms() desc,score desc.

 I don't think anything like this exists, but would love some suggestions if
 anyone else has solved this before.

 Thanks,

 -Trey




-- 
Sincerely yours
Mikhail Khludnev


Re: Schemaless configuration using 4.10.2/API returning 404

2014-11-06 Thread nbosecker
I have some level of logging in Tomcat, and I can see that SolrDispatchFilter
is being invoked:
2014-11-06 17:23:19,016 [catalina-exec-3] DEBUG SolrDispatchFilter
- Closing out SolrRequest: {}

But that really isn't terribly helpful. Is there more logging that I could
enable to get more info from the Solr side?

Some other logs from admin-type requests look like this:
2014-11-06 17:23:16,547 [catalina-exec-7] INFO  SolrDispatchFilter
- [admin] webapp=null path=/admin/info/logging
params={set=com.scitegic.web.catalog:ALL&wt=json} status=0 QTime=4 
2014-11-06 17:23:16,551 [catalina-exec-7] DEBUG SolrDispatchFilter
- Closing out SolrRequest: {set=com.scitegic.web.catalog:ALL&wt=json}

I don't have a proxy in between.





Re: SolrCloud shard distribution with Collections API

2014-11-06 Thread CTO직속IsabellePhan
When using the Collections API CREATE action, I found that sometimes the
default shard placement is correct (leader and replica on different servers)
and sometimes not. So I was looking for a simple and reliable way to ensure
better placement.
It seems I will have to do it manually for best control, as
recommended by Erick and you.

Thanks,

Isabelle


PS: I deleted emails from the thread history, because my reply keeps being
rejected by the Apache server as spam...


On Thu, Nov 6, 2014 at 8:13 AM, ralph tice ralph.t...@gmail.com wrote:

 I've had a bad enough experience with the default shard placement that I
 create a collection with one shard, add the shards where I want them, then
 use add/delete replica to move the first one to the right machine/port.

 Typically this is in a SolrCloud of dozens or hundreds of shards.  Our
 shards are all partitioned by time so there are big performance advantages
 to optimal placement across JVMs and machines.

  In what sort of situation do you not have trouble with the default shard
  placement?


 On Wed, Nov 5, 2014 at 5:10 PM, Erick Erickson erickerick...@gmail.com
 wrote:
  They should be pretty well distributed by default, but if you want to
  take manual control, you can use the createNodeSet param on CREATE
  (with replication factor of 1) and then ADDREPLICA with the node param
  to put replicas for shards exactly where you want.
 
  Best,
  Erick
 




Re: Best practice to setup schemas for documents having different structures

2014-11-06 Thread Vishal Sharma
Thanks for the response guys! Appreciate it.

*Vishal Sharma* Team Lead, Grazitti Interactive
T: +1 650 641 1754
E: vish...@grazitti.com
www.grazitti.com

On Wed, Nov 5, 2014 at 11:09 PM, Ryan Cooke r...@docurated.com wrote:

 We define all fields as wildcard fields with a suffix indicating the field
 type. Then we can use something like Java annotations to map POJO variables
 to field types and append the correct suffix. This allows us to use one very
 generic schema among all of our collections, and we rarely need to update
 it. Our inspiration for this method comes from the Ruby library Sunspot.

 - Ryan



 ---
 Ryan Cooke
 VP of Engineering
 Docurated
 (646) 535-4595
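
For illustration, a small sketch of the suffix convention Ryan describes,
using SolrJ's @Field bean annotation and assuming *_s / *_i dynamic fields
exist in the schema (the field names are hypothetical):

import org.apache.solr.client.solrj.beans.Field;

public class Student {
    @Field("studentName_s")        String name;      // *_s -> string field
    @Field("registrationNumber_i") int registration; // *_i -> int field
    @Field("courseEnrolled_s")     String course;
}

// indexing is then just: server.addBean(new Student(...)); server.commit();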

 On Wed, Nov 5, 2014 at 9:59 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  It Depends (tm).
 
  You have a lot of options, and it all depends on your data and
  use-case. In general, there is very little cost involved when a doc
  does _not_ use a field you've defined in a schema. That is, if you
  have 100's of fields defined and only use 10, the other 90 don't take
  up space in each doc. There is some overhead with many many fields,
  but probably not so you'd notice.
 
  1> You could have a single schema that contains all your fields and
  use it amongst a bunch of indexes (cores). This is particularly easy
  with the new configset pattern.
 
  2> You could have a single schema that contains all your fields and
  use it in a single index. That index could contain all your different
  docs with, say, a "type" field to let you search subsets easily.
 
  3> You could have a different schema for each index and put each kind of
  doc in its own index.
 
  1> I don't really like at all. If you're going to have different
  indexes, I think it's far easier to maintain if there are individual
  schemas.
 
  Between 2> and 3> it's a tossup. 2> will skew the relevance
  calculations because all the terms are in a single index. So your
  relevance calculations for students will be influenced by the terms in
  courses docs and vice-versa. That said, you may not notice, as it's
  subtle.
 
  I generally prefer 3> but I've seen 2> serve as well.
 
  Best,
  Erick
 
  On Tue, Nov 4, 2014 at 9:34 PM, Vishal Sharma vish...@grazitti.com
  wrote:
   This is something I have been thinking about for a long time now.
  
   What is the best practice for setting up the schemas for documents having
   different fields?
  
   Should we just create one schema with a lot of fields, or multiple schemas
   for different data structures?
  
   Here is an example: I have two objects, students and courses:
  
   Student:
  
  - Student Name
  - Student Registration number
  - Course Enrolled for
  
   Course:
  
  - Course ID
  - Course Name
  - Course duration
  
   What should the ideal schema setup look like?
  
   Any guidance is strongly appreciated.
  
  
  
   *Vishal Sharma* Team Lead, Grazitti Interactive
   T: +1 650 641 1754
   E: vish...@grazitti.com
   www.grazitti.com