Re: SOLR 3.3 DIH and Java 1.6

2012-03-20 Thread Mikhail Khludnev
Hello,

Have you tried jdk 6 from Oracle?

On Tue, Mar 20, 2012 at 8:41 AM, randolf.julian 
randolf.jul...@dominionenterprises.com wrote:

 I am trying to use the data import handler to update the SOLR index with
 Oracle data. In the SOLR schema, a dynamic field called PHOTO_* has been
 defined. I created a script transformer:

  <script>...</script>


 and called it in a query:

   <entity name="photo" transformer="script:pivotPhotos"
           query="select p.path||','||p.photo_barcode||','||p.display_order REC_PHOTO,
                         lpad(p.display_order,3,'0') SEQUENCE_NUMBER
                    from traderadm.photo p
                   where p.realm_id = '${ad.REALM_ID}'
                     and p.ad_id = '${ad.AD_ID}'
                   order by p.display_order"/>

 However, whenever I run a full import, it fails with this error in the
 solr0.log file:

 Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException:
 <script> can be used only in java 6 or above

 Here's the output of my java version:

 $ java -version
 java version "1.6.0_0"
 OpenJDK Runtime Environment (IcedTea6 1.6) (rhel-1.13.b16.el5-x86_64)
 OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

 I believe we are using java 6.

 I am lost with this error and need help on why this is happening.

 Thanks.

 - Randolf


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-3-3-DIH-and-Java-1-6-tp3841355p3841355.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Why my email always been rejected?

2012-03-20 Thread 怪侠
I send email to solr-user@lucene.apache.org, but I always receive a rejection 
email. The message cannot be sent successfully.

RE: SOLR 3.3 DIH and Java 1.6

2012-03-20 Thread Juan Pablo Mora
Some versions of OpenJDK don't include the Rhino engine needed to run JavaScript 
in the data import handler. You have to use the Oracle JDK.

Juampa.

From: randolf.julian [randolf.jul...@dominionenterprises.com]
Sent: Tuesday, March 20, 2012 5:41
To: solr-user@lucene.apache.org
Subject: SOLR 3.3 DIH and Java 1.6

I am trying to use the data import handler to update the SOLR index with Oracle
data. In the SOLR schema, a dynamic field called PHOTO_* has been defined. I
created a script transformer:

  <script>...</script>


and called it in a query:

   <entity name="photo" transformer="script:pivotPhotos"
           query="select p.path||','||p.photo_barcode||','||p.display_order REC_PHOTO,
                         lpad(p.display_order,3,'0') SEQUENCE_NUMBER
                    from traderadm.photo p
                   where p.realm_id = '${ad.REALM_ID}'
                     and p.ad_id = '${ad.AD_ID}'
                   order by p.display_order"/>

However, whenever I run a full import, it fails with this error in the
solr0.log file:

Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException:
<script> can be used only in java 6 or above

Here's the output of my java version:

$ java -version
java version "1.6.0_0"
OpenJDK Runtime Environment (IcedTea6 1.6) (rhel-1.13.b16.el5-x86_64)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

I believe we are using java 6.

I am lost with this error and need help on why this is happening.

Thanks.

- Randolf


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-3-3-DIH-and-Java-1-6-tp3841355p3841355.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is the SolrJ call to add collection of documents a blocking function call ?

2012-03-20 Thread Michael Kuhlmann

Hi Ramdev,

add() is a blocking call. Otherwise it would have to start its own background 
thread, which is not what a library like SolrJ should do (how many 
threads at most? At which priority? In which thread group? How long should 
they stay pooled?)


And, additionally, you might want to know whether the transmission was 
successful, or whether your guinea pig has eaten the network cable just 
in the middle of the transmission.


But it's easy to write your own background task that adds your documents 
to the Solr server. Using Java's ExecutorService class, this is done 
within two minutes.
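
A minimal sketch of such a background task, assuming SolrJ 3.x, a Solr URL of 
http://localhost:8983/solr, and a hypothetical buildDocs() helper that produces 
the documents (an illustration, not the only way to do it):
-
import java.util.ArrayList;
import java.util.Collection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BackgroundIndexer
{
    public static void main(String[] args) throws Exception
    {
        // Assumed URL; point this at your own Solr instance.
        final SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // buildDocs() is a hypothetical helper returning the documents to index.
        final Collection<SolrInputDocument> docs = buildDocs();

        // The blocking add()/commit() now run off the caller's thread.
        executor.submit(new Runnable()
        {
            public void run()
            {
                try
                {
                    solr.add(docs);
                    solr.commit();
                }
                catch (Exception e)
                {
                    e.printStackTrace(); // decide how to handle failed transmissions
                }
            }
        });
        executor.shutdown(); // finish the queued work, then let the thread exit
    }

    private static Collection<SolrInputDocument> buildDocs()
    {
        // Hypothetical: build and return the SolrInputDocuments to send.
        return new ArrayList<SolrInputDocument>();
    }
}
-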


Greetings,
Kuli

On 19.03.2012 16:48, ramdev.wud...@thomsonreuters.com wrote:

Hi:
I am trying to index a collection of SolrInputDocs to a Solr server. I was 
wondering if the call I make to add the documents (the 
add(Collection<SolrInputDocument>) call) is a blocking function call?

I would also like to know if the add call is a call that would take longer for 
a larger collection of documents


Thanks

Ramdev





Why does parameter useCompoundFile not work?

2012-03-20 Thread cheermc
Dear all,

I want to generate a compound-file index instead of separate .fdt, .fdx, etc. files.

I followed the suggestion and changed the useCompoundFile parameter to true
(both in indexDefaults and mainIndex) in solrconfig.xml, but when I use
post.jar to post the example xml files, I find the index is the same as before,
rather than only 3 files (1 .cfs file and 2 segments files).

Could anyone tell me how to generate a .cfs file for the index, and why this
situation happens to me?

Best regards,

Thanks,
Moss

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-does-parameter-useCompoundFile-not-work-tp3841702p3841702.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: PorterStemmer using example schema and data

2012-03-20 Thread Birkmann, Magdalena
I tried that, and it seems like "recharging" and "rechargeable", for example,
actually do stem to the same root ("recharg"). So why is it not working when I'm
searching on my indexed sample docs? The stemming works when I search for
"videos" and it's actually "video" in the document, etc., but not for
rechargeable/recharging or capability/capable, etc., even though they stem to
the same root when I check them on the Admin/analysis page. What am I
overlooking?


On March 16, 2012 at 2:17 PM Erick Erickson erickerick...@gmail.com wrote:

 What you think the results of stemming should be and what they
 actually are sometimes differ G...

 Look at the admin/analysis page, check the verbose boxes
 and try recharging rechargeable and you'll see, step by step,
 the results of each element of the analysis chain. Since
 the Porter stemmer is algorithmic, I'm betting that
 these don't stem to the same root.

 Best
 Erick

 On Thu, Mar 15, 2012 at 7:05 AM, Birkmann, Magdalena
 magdalena.birkm...@open-xchange.com wrote:
 
  Hey there,
  I've been working through the Solr Tutorial
  (http://lucene.apache.org/solr/tutorial.html), using the example schema and
  documents, just working through step by step trying everything out.
  Everything
  worked out the way it should (just using the example queries and stuff),
  except
  for the stemming (A search for features:recharging
  http://localhost:8983/solr/select/?indent=on&q=features:recharging&fl=name,features
  should match "Rechargeable" due to stemming with the EnglishPorterFilter, but
  doesn't). I've been using the example directory exactly the way it was when
  downloading it, without changing anything. Since I'm fairly new to all of
  this
  and don't quite understand yet how all of it works or should work, I don't
  really know where the problem lies or how to configure anything to make it
  work,
  so I just thought I'd ask here, since you all seem so nice :)
  Thanks a lot in advance,
  Magda


Staggering Replication start times

2012-03-20 Thread Eric Pugh
I am playing with an index that is sharded many times, between 64 and 128.  One 
thing I noticed is that with replication set to happen every 5 minutes, it 
means that each slave hits the master at the same moment asking for updates:  
:00:00, :05:00, :10:00, :15:00 etc.   Replication takes very little time, so it 
seems like I may be flooding the network with a burst of traffic requests that 
then goes away.

I tweaked the replication start time code to instead just start 5 minutes after 
a shard starts up, which means instead of all of the slaves hitting at the same 
moment, they are a bit staggered.   :00:00, :00:01, :00:02, :00:04 etcetera.   
Which presumably will use my network pipe more efficiently.  

Any thoughts on this?  I know it means the slaves are more likely to be 
slightly out of sync, but over a 5 minute range will get back in sync.  

Eric

-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Co-Author: Apache Solr 3 Enterprise Search Server available from 
http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.













Re: To truncate or not to truncate (group.truncate vs. facet)

2012-03-20 Thread Erick Erickson
Faceting is orthogonal to grouping, so be careful what you
ask for. Adding faceting would be easy; the only reason
I suggested grouping is your requirement that your brands be
just a count of the number of distinct ones found, not the
number of matching docs.

So a really simple solution would be to forget about grouping
and just facet. Then have your application change the counts
for all the brand entries to 1.

Best
Erick

On Mon, Mar 19, 2012 at 5:23 PM, rasser r...@vertica.dk wrote:
 I see your point.

 If I understand it correctly, it will however mean that I need to return
 10 (brands) x 100 (results to show) = 1000 docs to guarantee that all 100
 results shown can be of the same brand. Correct?

 And tomorrow (or later) the customer will also want facets on 5 new fields,
 e.g. production year. How could this be handled with the above approach?

 Thanks

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3840406.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: is the SolrJ call to add collection of documents a blocking function call ?

2012-03-20 Thread Erick Erickson
Also consider StreamingUpdateSolrServer if you want multiple threads
to operate from your client.

Best
Erick

On Tue, Mar 20, 2012 at 4:12 AM, Michael Kuhlmann k...@solarier.de wrote:
 Hi Ramdev,

 add() is a blocking call. Otherwise it had to start an own background thread
 which is not what a library like Solrj should do (how many threads at most?
 At which priority? Which thread group? How long keep them pooled?)

 And, additionally, you might want to know whether the transmission was
 successful, or whether your guinea pig has eaten the network cable just in
 the middle of the transmission.

 But it's easy to write your own background task that adds your documents to
 the Solr server. Using Java's ExecutionService class, this is done within
 two minutes.

 Greetings,
 Kuli

 On 19.03.2012 16:48, ramdev.wud...@thomsonreuters.com wrote:

 Hi:
    I am trying to index a collection of SolrInputDocs to a Solr server. I
 was wondering if the call I make to add the documents (the
  add(Collection<SolrInputDocument>) call) is a blocking function call?

 I would also like to know if the add call is a call that would take longer
 for a larger collection of documents


 Thanks

 Ramdev




Re: is the SolrJ call to add collection of documents a blocking function call ?

2012-03-20 Thread darul
Hmm nice feature Erik

--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-the-SolrJ-call-to-add-collection-of-documents-a-blocking-function-call-tp3839387p3842232.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why my email always been rejected?

2012-03-20 Thread Travis Low
I received it...sometimes it just needs some time.

2012/3/20 怪侠 87863...@qq.com

 I send email to :solr-user@lucene.apache.org, but I always receive the
 rejected email. It can't send successful.




-- 

Travis Low, Director of Development

t...@4centurion.com

Centurion Research Solutions, LLC

14048 ParkEast Circle • Suite 100 • Chantilly, VA 20151

703-956-6276 • 703-378-4474 (fax)

http://www.centurionresearch.com

The information contained in this email message is confidential and
protected from disclosure.  If you are not the intended recipient, any use
or dissemination of this communication, including attachments, is strictly
prohibited.  If you received this email message in error, please delete it
and immediately notify the sender.

This email message and any attachments have been scanned and are believed
to be free of malicious software and defects that might affect any computer
system in which they are received and opened. No responsibility is accepted
by Centurion Research Solutions, LLC for any loss or damage arising from
the content of this email.


SolrCloud replica and leader out of Sync somehow

2012-03-20 Thread Jamie Johnson
I'm trying to figure out how it's possible for 2 solr instances (1
which is leader 1 is replica) to be out of sync.  I've done commits to
the solr instances, forced replication but still the solr instances
have different info.  The relevant snippet from my clusterstate.json
is listed below.


"shard3":{
  "host2:7577_solr_shard3-core2":{
    "shard":"shard3",
    "leader":"true",
    "state":"active",
    "core":"shard3-core2",
    "collection":"collection1",
    "node_name":"host2:7577_solr",
    "base_url":"http://host2:7577/solr"},
  "host1:7575_solr_shard3-core1":{
    "shard":"shard3",
    "state":"active",
    "core":"shard3-core1",
    "collection":"collection1",
    "node_name":"host1:7575_solr",
    "base_url":"http://host1:7575/solr"}},


Where can I look to see why this is happening?


RE: SOLR 3.3 DIH and Java 1.6

2012-03-20 Thread randolf.julian
Thanks Mikhail and Juampa. How can I prove to our Systems guys that the Rhino
engine is not installed? I need to show them that it is missing and that we
have to have it for the SOLR data import handler script to run.

Thanks again.
- Randolf
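
One quick way to demonstrate it is to run a small check on the same JRE that runs 
Solr. This is only a sketch (the class name is arbitrary); a null engine means no 
JavaScript/Rhino engine is registered with that JRE:
-
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class CheckJavaScriptEngine
{
    public static void main(String[] args)
    {
        // getEngineByName returns null when no engine answers to "JavaScript".
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null)
        {
            System.out.println("No JavaScript (Rhino) engine is installed in this JRE.");
        }
        else
        {
            System.out.println("Found: " + engine.getFactory().getEngineName());
        }
    }
}
-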

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-3-3-DIH-and-Java-1-6-tp3841355p3842520.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: querying on shards

2012-03-20 Thread Shawn Heisey

On 3/19/2012 11:55 PM, Ankita Patil wrote:

Hi,

I wanted to know whether it is feasible to query all the shards even if
the query yields data only from a few shards and not all, or is it better to
mention explicitly those shards from which we get the data and only query
them.

For example:
I have 4 shards. Now I have a query which yields data only from 2 shards.
So should I select those 2 shards only and query them, or is it OK to
query all the shards? Will that affect the performance in any way?


I use a sharded index, but I am not a seasoned Java/Solr/Lucene 
developer.  My clients do not use the shards parameter themselves - they 
talk to a load balancer, which in turn talks to a special core that 
has the shards in its request handler config and has no index of its 
own.  I call it a broker, because that is what our previous search 
product (EasyAsk) called it.


As I understand things, the performance of your slowest shard, whether 
that is because of index size on that shard or the underlying hardware, 
will be a large factor in the performance of the entire index.  A 
distributed query sends an identical query to all the shards it is 
configured for.  It gathers all those results in parallel and builds a 
final result to send to the client.


You MIGHT get better performance by not including the other shards.  If 
the shards with no results respond super-fast, it probably won't 
really make any difference.  If it takes a long time to get the answer 
that there are no results, then removing them would make things go 
faster.  That requires intelligence on the client to know where the data 
is.  If the client does not know where the data is, it is safer to 
simply include all the shards.


Thanks,
Shawn
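
A minimal SolrJ sketch of the two options - letting a preconfigured "broker" core 
fan the query out to every shard versus passing an explicit shards parameter - 
where the host and core names are hypothetical:
-
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardQueryExample
{
    public static void main(String[] args) throws Exception
    {
        // "Broker"-style core whose request handler already lists all shards.
        SolrServer broker = new CommonsHttpSolrServer("http://search-broker:8983/solr/broker");

        SolrQuery q = new SolrQuery("title:foo");

        // Option 1: let the broker core fan the query out to every shard.
        QueryResponse all = broker.query(q);

        // Option 2: restrict the distributed query to the shards known to hold the data.
        q.set("shards", "shard1:8983/solr/core1,shard2:8983/solr/core2");
        QueryResponse some = broker.query(q);

        System.out.println("all shards: " + all.getResults().getNumFound()
                + ", two shards: " + some.getResults().getNumFound());
    }
}
-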



RE: SOLR 3.3 DIH and Java 1.6

2012-03-20 Thread Dyer, James
Taking a quick look at the code, it seems this exception could have been thrown 
for four reasons:  
(see org.apache.solr.handler.dataimport.ScriptTransformer#initEngine)

1. Your JRE doesn't have class javax.script.ScriptEngineManager  (pre 1.6, 
loaded here via reflection)

2. Your JRE doesn't have any installed scripting engines.  This little program 
outputs 1 engine on my JRE with 6 aliases:
[js, rhino, JavaScript, javascript, ECMAScript, ecmascript]
-
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

public class TestScripting
{
    public static void main(String args[])
    {
        ScriptEngineManager sem = new ScriptEngineManager();
        for (ScriptEngineFactory sef : sem.getEngineFactories())
        {
            System.out.println(sef.getNames());
        }
    }
}
-
3. You specified an unsupported scripting engine name in the "language" 
parameter (see http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer)

4. The script you wrote in the <script> tag has errors.

Unfortunately, it looks like all 4 of these things are being checked in the 
same try/catch block.  So you could have any of these problems and are getting 
a potentially misleading error message.

One way to eliminate both #1 & #2 is to run the test 
org.apache.solr.handler.dataimport.TestScriptTransformer on your JRE and see 
if it passes.  (see here for how:  
http://wiki.apache.org/solr/HowToContribute#Unit_Tests)

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: randolf.julian [mailto:randolf.jul...@dominionenterprises.com] 
Sent: Tuesday, March 20, 2012 9:24 AM
To: solr-user@lucene.apache.org
Subject: RE: SOLR 3.3 DIH and Java 1.6

Thanks Mikhail and Juampa. How can I prove to our Systems guys that the Rhino
Engine is not installed? This is the only way that I can prove that it's not
installed and we have to have it for SOLR data importhandler script to run.

Thanks again.
- Randolf

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-3-3-DIH-and-Java-1-6-tp3841355p3842520.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud replica and leader out of Sync somehow

2012-03-20 Thread Mark Miller
Do you have the logs for this? Either around startup or when you are forcing 
replication. Logs around both would be helpful.

Also the doc counts for each shard?

On Mar 20, 2012, at 10:16 AM, Jamie Johnson wrote:

 I'm trying to figure out how it's possible for 2 solr instances (1
 which is leader 1 is replica) to be out of sync.  I've done commits to
 the solr instances, forced replication but still the solr instances
 have different info.  The relevant snippet from my clusterstate.json
 is listed below.
 
 
 "shard3":{
   "host2:7577_solr_shard3-core2":{
     "shard":"shard3",
     "leader":"true",
     "state":"active",
     "core":"shard3-core2",
     "collection":"collection1",
     "node_name":"host2:7577_solr",
     "base_url":"http://host2:7577/solr"},
   "host1:7575_solr_shard3-core1":{
     "shard":"shard3",
     "state":"active",
     "core":"shard3-core1",
     "collection":"collection1",
     "node_name":"host1:7575_solr",
     "base_url":"http://host1:7575/solr"}},
 
 
 Where can I look to see why this is happening?

- Mark Miller
lucidimagination.com













Re: SolrCloud replica and leader out of Sync somehow

2012-03-20 Thread Jamie Johnson
DocCounts are the same.  I am going to disable my custom component to
see if that is mucking with something but it seems to be working
properly.

After looking at the results a little closer (expanding the number of
results coming back) it seems that the same information is in both, but
the order in which the items are being returned is not the same.  When I
sort by score they seem to be in different orders; if I sort by key then
the results look the same.

On Tue, Mar 20, 2012 at 10:52 AM, Mark Miller markrmil...@gmail.com wrote:
 Do you have the logs for this? Either around startup or when you are forcing 
 replication. Logs around both would be helpful.

 Also the doc counts for each shard?

 On Mar 20, 2012, at 10:16 AM, Jamie Johnson wrote:

 I'm trying to figure out how it's possible for 2 solr instances (1
 which is leader 1 is replica) to be out of sync.  I've done commits to
 the solr instances, forced replication but still the solr instances
 have different info.  The relevant snippet from my clusterstate.json
 is listed below.


    "shard3":{
      "host2:7577_solr_shard3-core2":{
        "shard":"shard3",
        "leader":"true",
        "state":"active",
        "core":"shard3-core2",
        "collection":"collection1",
        "node_name":"host2:7577_solr",
        "base_url":"http://host2:7577/solr"},
      "host1:7575_solr_shard3-core1":{
        "shard":"shard3",
        "state":"active",
        "core":"shard3-core1",
        "collection":"collection1",
        "node_name":"host1:7575_solr",
        "base_url":"http://host1:7575/solr"}},


 Where can I look to see why this is happening?

 - Mark Miller
 lucidimagination.com













Re: SolrCloud replica and leader out of Sync somehow

2012-03-20 Thread Jamie Johnson
ok, with my custom component out of the picture I still have the same
issue.  Specifically, when sorting by score on a leader and replica I
am getting different doc orderings.  Is this something anyone has
seen?

On Tue, Mar 20, 2012 at 11:09 AM, Jamie Johnson jej2...@gmail.com wrote:
 DocCounts are the same.  I am going to disable my custom component to
 see if that is mucking with something but it seems to be working
 properly.

 After looking at the results a little closer (expanding the number of
 results coming back) it seems that the same information is in both but
 the order in which the items are being returned is not the same.  I'm
 sorting by score when they seem to be in different orders, if I sort
 by key then the results look the same.

 On Tue, Mar 20, 2012 at 10:52 AM, Mark Miller markrmil...@gmail.com wrote:
 Do you have the logs for this? Either around startup or when you are forcing 
 replication. Logs around both would be helpful.

 Also the doc counts for each shard?

 On Mar 20, 2012, at 10:16 AM, Jamie Johnson wrote:

 I'm trying to figure out how it's possible for 2 solr instances (1
 which is leader 1 is replica) to be out of sync.  I've done commits to
 the solr instances, forced replication but still the solr instances
 have different info.  The relevant snippet from my clusterstate.json
 is listed below.


    "shard3":{
      "host2:7577_solr_shard3-core2":{
        "shard":"shard3",
        "leader":"true",
        "state":"active",
        "core":"shard3-core2",
        "collection":"collection1",
        "node_name":"host2:7577_solr",
        "base_url":"http://host2:7577/solr"},
      "host1:7575_solr_shard3-core1":{
        "shard":"shard3",
        "state":"active",
        "core":"shard3-core1",
        "collection":"collection1",
        "node_name":"host1:7575_solr",
        "base_url":"http://host1:7575/solr"}},


 Where can I look to see why this is happening?

 - Mark Miller
 lucidimagination.com













Re: SolrCloud replica and leader out of Sync somehow

2012-03-20 Thread Yonik Seeley
On Tue, Mar 20, 2012 at 11:17 AM, Jamie Johnson jej2...@gmail.com wrote:
 ok, with my custom component out of the picture I still have the same
 issue.  Specifically, when sorting by score on a leader and replica I
 am getting different doc orderings.  Is this something anyone has
 seen?

This is certainly possible and expected - sorting tiebreakers is by
internal lucene docid, which can change (even on a single node!)
If you need lists that don't shift around due to unrelated changes,
make sure you don't have any ties!

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: SolrCloud replica and leader out of Sync somehow

2012-03-20 Thread Jamie Johnson
Hmmm... OK, I don't see how it's possible for me to ensure that there
are no ties.  If a query were for *:*, everything has a constant score,
and if the user requested 1 page and then requested the next, the results on
the second page could be duplicates of what was on the first page.
I don't remember ever seeing this issue on older versions of
SolrCloud, although from what you're saying I should have.  What could
explain why I never saw this before?

As another possible fix to ensure proper ordering, couldn't we always
specify a sort order which contains the key?  So for instance if the
user asks for "score asc", we'd make this "score asc,key asc" so that
results would be ordered by score and then by key, and the results across
pages would be consistent?
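
A minimal SolrJ 3.x sketch of such a tie-broken sort, assuming the uniqueKey field 
is called id:
-
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;

public class TieBreakSortExample
{
    public static void main(String[] args)
    {
        SolrQuery q = new SolrQuery("*:*");
        // Primary sort (score here, in whichever direction the user asked for),
        // then break ties deterministically on the uniqueKey.
        q.addSortField("score", ORDER.desc);
        q.addSortField("id", ORDER.asc);   // "id" is assumed to be the uniqueKey
        // The request now carries sort=score desc,id asc, so page boundaries stay stable.
    }
}
-
The same tiebreaker could just as easily be appended to the sort parameter in the 
request URL.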


On Tue, Mar 20, 2012 at 11:30 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Tue, Mar 20, 2012 at 11:17 AM, Jamie Johnson jej2...@gmail.com wrote:
 ok, with my custom component out of the picture I still have the same
 issue.  Specifically, when sorting by score on a leader and replica I
 am getting different doc orderings.  Is this something anyone has
 seen?

 This is certainly possible and expected - sorting tiebreakers is by
 internal lucene docid, which can change (even on a single node!)
 If you need lists that don't shift around due to unrelated changes,
 make sure you don't have any ties!

 -Yonik
 lucenerevolution.com - Lucene/Solr Open Source Search Conference.
 Boston May 7-10


Re: SolrCloud replica and leader out of Sync somehow

2012-03-20 Thread Yonik Seeley
On Tue, Mar 20, 2012 at 11:39 AM, Jamie Johnson jej2...@gmail.com wrote:
 HmmmOk, I don't see how it's possible for me to ensure that there
 are no ties.  If a query were for *:* everything has a constant score,
 if the user requested 1 page then requested the next the results on
 the second page could be duplicates from what was on the first page.
 I don't remember ever seeing this issue on older versions of
 SolrCloud, although from what you're saying I should have.  What could
 explain why I never saw this before?

If you use replication only to duplicate an index (and avoid any
merges), then you will have identical docids.

 Another possible fix to ensure proper ordering couldn't we always
 specify a sort order which contained the key?  So for instance the
 user asks for score asc, we'd make this score asc,key asc so that
 results would be order by score and then by key so the results across
 pages would be consistent?

Yep.

And like I said, this is also an issue even on a single node.
docid A can be before docid B, then a segment merge can cause these to
be shuffled.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


org.apache.solr.common.SolrException: Internal Server Error

2012-03-20 Thread qingwei201314
I use SolrJ to index a pdf file:

File file = new File("1.pdf");
String urlString = constant.getUrl();
StreamingUpdateSolrServer solr = new StreamingUpdateSolrServer(urlString, 1, 1);

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(file);
up.setParam("uprefix", "attr_");
up.setParam("fmap.content", "attr_content");
up.setParam("literal.id", file.getPath());
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, false);
solr.request(up);
solr.blockUntilFinished();

When I execute the code, I always get the error:
org.apache.solr.common.SolrException: Internal Server Error.

What's wrong? Could anyone help me?


Thanks very much.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/org-apache-solr-common-SolrException-Internal-Server-Error-tp3842862p3842862.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud replica and leader out of Sync somehow

2012-03-20 Thread Jamie Johnson
I believe we're using replication only to duplicate the index
(standard SolrCloud, nothing special on our end), so I don't see why the
docids wouldn't be the same... am I missing something that is
happening there that I am unaware of?

On Tue, Mar 20, 2012 at 11:50 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Tue, Mar 20, 2012 at 11:39 AM, Jamie Johnson jej2...@gmail.com wrote:
 HmmmOk, I don't see how it's possible for me to ensure that there
 are no ties.  If a query were for *:* everything has a constant score,
 if the user requested 1 page then requested the next the results on
 the second page could be duplicates from what was on the first page.
 I don't remember ever seeing this issue on older versions of
 SolrCloud, although from what you're saying I should have.  What could
 explain why I never saw this before?

 If you use replication only to duplicate an index (and avoid any
 merges), then you will have identical docids.

 Another possible fix to ensure proper ordering couldn't we always
 specify a sort order which contained the key?  So for instance the
 user asks for score asc, we'd make this score asc,key asc so that
 results would be order by score and then by key so the results across
 pages would be consistent?

 Yep.

 And like I said, this is also an issue even on a single node.
 docid A can be before docid B, then a segment merge can cause these to
 be shuffled.

 -Yonik
 lucenerevolution.com - Lucene/Solr Open Source Search Conference.
 Boston May 7-10


Re: Replication with different schema

2012-03-20 Thread in.abdul
Thanks ..
I need to index data from one Solr to another Solr with a different analyzer.
Right now I am able to do this by querying the first Solr and indexing the
results into the second.
NOTE: As the fields I need to reindex are stored this is easy, but since my
index has 31 lakh (3.1 million) records it is taking a lot of time. (Please
suggest something for better performance.)

Thanks and Regards,
S SYED ABDUL KATHER



On Tue, Mar 13, 2012 at 10:05 PM, Erick Erickson [via Lucene] 
ml-node+s472066n3822752...@n3.nabble.com wrote:

 Why would you want to? This seems like an
 XY problem, see:
 http://people.apache.org/~hossman/#xyproblem

 See the confFiles section here:
 http://wiki.apache.org/solr/SolrReplication
 although it mentions solrconfig.xml, it
 might work with schema.xml.

 BUT: This strikes me as really, really
 dangerous. I'm having a hard time
 thinking of a use-case that this makes sense
 for, so be very cautious. Having an index
 created with one schema and searched
 on with another is a recipe for disaster
 IMO unless you're very careful.

 Best
 Erick

 On Tue, Mar 13, 2012 at 3:40 AM, syed kather [hidden email] wrote:
  Team,
   Is it possible to do replication with different Schema  in solr ?
   If not how can i acheive this .
 
  Can any one can give an idea to do this
  advance thanks ..
 
 Thanks and Regards,
 S SYED ABDUL KATHER


 --
  If you reply to this email, your message will be added to the discussion
 below:

 http://lucene.472066.n3.nabble.com/Replication-with-different-schema-tp3821672p3822752.html



-
THANKS AND REGARDS,
SYED ABDUL KATHER
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replication-with-different-schema-tp3821672p3843068.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud replica and leader out of Sync somehow

2012-03-20 Thread Jamie Johnson
Thanks Yonik, I really appreciate the explanation.  It sounds like the
best solution for me to solve this is to add the additional sort
parameter.  That being said is there a significant memory increase to
do this when sorting by score?  I don't see how with SolrCloud I can
avoid doing this, and how others wouldn't need to do the same thing.

On Tue, Mar 20, 2012 at 1:38 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Tue, Mar 20, 2012 at 1:07 PM, Jamie Johnson jej2...@gmail.com wrote:
 I believe we're using replication to only duplicate the index
 (standard SolrCloud nothing special on our end) so I don't see why the
 docids wouldn't be the sameam I missing something that is
 happening there that I am unaware of?

 Each document is pushed to the replicas (i.e. standard whole-index
 replication is only used in recovery scenarios).  If you're using
 multiple threads to index, then docA can be indexed before docB on one
 replica and vice-versa on a different replica (or on the leader).
 Although even if this were not the case, I don't believe Lucene is
 deterministic in this respect anyway (i.e. indexing identically on two
 different boxes is not guaranteed to result in the exact same internal
 document order).

 -Yonik
 lucenerevolution.com - Lucene/Solr Open Source Search Conference.
 Boston May 7-10


RE: SOLR 3.3 DIH and Java 1.6

2012-03-20 Thread Dyer, James
I also applied a fix to both Trunk/4.x and the 3.x branch (will be in 3.6 when 
it is released).  This should give you better error messages when something 
goes wrong when ScriptTransformer is invoked.  It will tell you that you need 
1.6 only if the functionality is absent (case #1 in my last message).  In case 
#2 or #3 it will tell you the language you specified isn't supported.  In 
case #4, it will tell you the script itself is invalid.

See https://issues.apache.org/jira/browse/SOLR-3260 .

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Dyer, James [mailto:james.d...@ingrambook.com] 
Sent: Tuesday, March 20, 2012 9:46 AM
To: solr-user@lucene.apache.org
Subject: RE: SOLR 3.3 DIH and Java 1.6

Taking a quick look at the code, it seems this exception could have been thrown 
for four reasons:  
(see org.apache.solr.handler.dataimport.ScriptTransformer#initEngine)

1. Your JRE doesn't have class javax.script.ScriptEngineManager  (pre 1.6, 
loaded here via reflection)

2. Your JRE doesn't have any installed scripting engines.  This little program 
outputs 1 engine on my JRE with 6 aliases:
[js, rhino, JavaScript, javascript, ECMAScript, ecmascript]
-
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

public class TestScripting
{
    public static void main(String args[])
    {
        ScriptEngineManager sem = new ScriptEngineManager();
        for (ScriptEngineFactory sef : sem.getEngineFactories())
        {
            System.out.println(sef.getNames());
        }
    }
}
-
3. You specified an unsupported scripting engine name in the "language" 
parameter (see http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer)

4. The script you wrote in the <script> tag has errors.

Unfortunately, it looks like all 4 of these things are being checked in the 
same try/catch block.  So you could have any of these problems and are getting 
a potentially misleading error message.

One way to eliminate both #1 & #2 is to run the test 
org.apache.solr.handler.dataimport.TestScriptTransformer on your JRE and see 
if it passes.  (see here for how:  
http://wiki.apache.org/solr/HowToContribute#Unit_Tests)

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: randolf.julian [mailto:randolf.jul...@dominionenterprises.com] 
Sent: Tuesday, March 20, 2012 9:24 AM
To: solr-user@lucene.apache.org
Subject: RE: SOLR 3.3 DIH and Java 1.6

Thanks Mikhail and Juampa. How can I prove to our Systems guys that the Rhino
Engine is not installed? This is the only way that I can prove that it's not
installed and we have to have it for SOLR data importhandler script to run.

Thanks again.
- Randolf

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-3-3-DIH-and-Java-1-6-tp3841355p3842520.html
Sent from the Solr - User mailing list archive at Nabble.com.


Multi-valued polyfields - Do they exist in the wild ?

2012-03-20 Thread ramdev.wudali
Hi:
   We have been keen on using polyfields for a while, but we have been 
restricted from using them because they do not seem to support multi-values 
(yet). I am wondering if there are any custom implementations, or is there an 
ETA on a Solr release that will include multi-valued polyfields.

Thanks for the support

Ramde


SV: To truncate or not to truncate (group.truncate vs. facet)

2012-03-20 Thread rasser
Thanks for taking the time to help me Erick!

Just to clarify my desired behavior from the facets. This is the index, notice 
color is multivalued to represent a model of car that has more than one color:

<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <field name="variant_id">A4_black</field>
  <field name="color">black</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <field name="variant_id">A4_white</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Volvo V50</field>
  <field name="brand">volvo</field>
  <field name="variant_id">Volvo_V50</field>
  <field name="color">black</field>
</doc>
<doc>
  <field name="sku">Audi A5</field>
  <field name="brand">audi</field>
  <field name="variant_id">A5_white</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Audi S8</field>
  <field name="brand">audi</field>
  <field name="variant_id">S8_yellow</field>
  <field name="color">yellow</field>
</doc>
<doc>
  <field name="sku">Audi S8</field>
  <field name="brand">audi</field>
  <field name="variant_id">S8_black</field>
  <field name="color">black</field>
  <field name="color">white</field>
</doc>

My goal is to get this facet:
brand
-
audi (3)  - since there are 3 audi models (A4, A5 and S8)
volvo (1) - since there is only one volvo model (V50)

color
-
black (3) - since all models except A5 are available in black
white (3) - since A4, A5 and S8 are available in white
yellow (1) - since only S8 is available in yellow

Thanks


From: Erick Erickson [via Lucene] [ml-node+s472066n3842071...@n3.nabble.com]
Sent: 20 March 2012 12:42
To: Rasmus Østergård
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

Faceting is orthogonal to grouping, so be careful what you
ask for. So adding faceting would be easy, the only reason
I suggested grouping is your requirement that your brands be
just a count of the number of distinct ones found, not the
number of matching docs.

So a really simple solution would be to forget about grouping
and just facet. Then have your application change the counts
for all the brand entries to 1.

Best
Erick

On Mon, Mar 19, 2012 at 5:23 PM, rasser [hidden email] wrote:

 I see your point.

 If I understand it correct it will however mean that i need to return
 10(brands)x100(resultToShow) = 1000 docs to facilitate that all 100 results
 to show is of the same brand. Correnct?

 And tomorrow (or later) the customer will also want a facet on 5 new fields
 eg. production year. How could this be handled with the above approach?

 Thanks

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3840406.html
 Sent from the Solr - User mailing list archive at Nabble.com.



If you reply to this email, your message will be added to the discussion below:
http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3842071.html


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SV-To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3843321p3843321.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multi-valued polyfields - Do they exist in the wild ?

2012-03-20 Thread Yonik Seeley
On Tue, Mar 20, 2012 at 2:17 PM,  ramdev.wud...@thomsonreuters.com wrote:
 Hi:
   We have been keen on using polyfields for a while. But we have been 
 restricted from using it because they do not seem to support Multi-values 
 (yet).

Poly-fields should support multi-values; it's more that the types which use
them may not. For example LatLon isn't multiValued because it doesn't have a
mechanism to correlate multiple values per document.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Thanks All

2012-03-20 Thread vybe3142
Here is the core of the SOLRJ client that ended up accomplishing what I
wanted:

String fileName2 = "C:\\work\\SolrClient\\data\\worldwartwo.txt";
SolrServer server = new StreamingUpdateSolrServer("http://localhost:8080/solr/", 20, 8);
UpdateRequest req = new UpdateRequest("/update/extract");
ModifiableSolrParams params = new ModifiableSolrParams();
params.add("stream.file", new String[]{fileName2});
params.set("literal.id", fileName2);
params.set("captureAttr", false);

req.setParams(params);
server.request(req);
server.commit();

To get this to work correctly, the following server side config was needed
(I started from a barebones solr config)

1. Add apache-solr-cell-3.5.0.jar to the solrhost/lib directory (or
wherever solr can access jars) as this contains the class
ExtractingRequestHandler
2. Add the appropriate handler for /update/extract in the solrconfig.xml
(this uses the ExtractingRequestHandler class).

I'll blog about this later on for the benefit of the community at large

I'm still puzzled that there are no readily available alternatives to using
the Tika-based ExtractingRequestHandler in the situation where the input
data is plain UTF-8 text files that SOLR needs to ingest and index. I may
need to look into defining a custom request handler if that's the right way
to go.

Thanks again

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-for-SOLR-SOLRJ-to-index-files-directly-bypassing-HTTP-streaming-tp3833419p3843593.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: To truncate or not to truncate (group.truncate vs. facet)

2012-03-20 Thread rasser
Thanks for taking the time to help me Erick!

Just to clarify my desired behavior from the facets. This is the index,
notice color is multivalued to represent a model of car that has more than
one color:

<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <field name="variant_id">A4_black</field>
  <field name="color">black</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <field name="variant_id">A4_white</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Volvo V50</field>
  <field name="brand">volvo</field>
  <field name="variant_id">Volvo_V50</field>
  <field name="color">black</field>
</doc>
<doc>
  <field name="sku">Audi A5</field>
  <field name="brand">audi</field>
  <field name="variant_id">A5_white</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Audi S8</field>
  <field name="brand">audi</field>
  <field name="variant_id">S8_yellow</field>
  <field name="color">yellow</field>
</doc>
<doc>
  <field name="sku">Audi S8</field>
  <field name="brand">audi</field>
  <field name="variant_id">S8_black</field>
  <field name="color">black</field>
  <field name="color">white</field>
</doc>

My goal is to get this facet:
brand
-
audi (3)  - since there are 3 audi models (A4, A5 and S8)
volvo (1) - since there is only one volvo model (V50)

color
-
black (3) - since all models except A5 are available in black
white (3) - since A4, A5 and S8 are available in white
yellow (1) - since only S8 is available in yellow

And these 4 results (when query is *:*)

- Audi A4
- Audi A5
- Audi S8
- Volvo V50


Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3843596.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Any way to get reference to original request object from within Solr component?

2012-03-20 Thread SUJIT PAL
Hi Hoss,

Thanks for the pointers, and sorry, it was a bug in my code (there was some dead 
code which was alphabetizing the facet link text and also, indirectly by 
reference, the parameters themselves).

I actually ended up building a servlet and a component to print out the 
multi-valued parameters using HttpServletRequest.getParameterValues("myparam") 
and ResponseBuilder.req.getParams().getParams("myparam") respectively to 
isolate the problem. Both of them returned the parameters in the correct order.

So I went trolling through the code with a debugger, to observe exactly at what 
point the order got messed up, and found the bug.

FWIW, I am using Tomcat 5.5.

Thanks to everybody for their help, and sorry for the noise, guess I should 
have done the debugger thing before I threw up my hands :-).

-sujit

On Mar 19, 2012, at 6:55 PM, Chris Hostetter wrote:

 
 : I have a custom component which depends on the ordering of a 
 : multi-valued parameter. Unfortunately it looks like the values do not 
 : come back in the same order as they were put in the URL. Here is some 
 : code to explain the behavior:
   ...
 : and I notice that the values are ordered differently than [foo, bar, 
 : baz] that I would have expected. I am guessing its because the 
 : SolrParams is a MultiMap structure, so order is destroyed on its way in.
 
 a) MultiMapSolrParams does not destroy order on the way in
 b) when dealing with HTTP requests, the request params actually use an 
 instance of ServletSolrParams which is backed directly by the 
 ServletRequest.getParameterMap() -- you should get the values returned in 
 the exact order as ServletRequest.getParameterMap().get("myparam")
 
 : 1) is there a setting in Solr can use to enforce ordering of 
 : multi-valued parameters? I suppose I could use a single parameter with 
 : comma-separated values, but its a bit late to do that now...
 
 Should already be enforced in MultiMapSolrParams and ServletSolrParams
 
 : 2) is it possible to use a specific SolrParams object that preserves order? 
 If so how?
 
 see above.
 
 : 3) is it possible to get a reference to the HTTP request object from within 
 a component? If so how?
 
 not out of the box, because there is no guarantee that solr is even running 
 in a servlet container. You can subclass SolrDispatchFilter to do this if 
 you wish (note the comment in the execute() method).
 
 My questions to you...
 
 1) what servlet container are you using? 
 2) have you tested your servlet 
 container with a simple servlet (ie: eliminate solr from the equation) to 
 verify that the ServletRequest.getParameterMap() contains your request 
 values in order?
 
 
 if you debug this and find evidence that something in solr is re-ordering 
 the values in a MultiMapSolrParams or ServletSolrParams *PLEASE* open a 
 jira with a reproducible example .. that would definitely be an annoying 
 bug we should get to the bottom of.
 
 
 -Hoss



Re: Replication with different schema

2012-03-20 Thread Erick Erickson
OK, I was thrown off by your use of "schema"; I thought
you were talking about schema.xml.

Anyway, assuming you have some kind of loop that pages
through the documents via Solr, gets the results and then
sends them to another Solr server... yeah, that'll be slow.
You have the deep paging problem here.

I'd consider dropping into Lucene to spin through the
documents, fetch them and then assemble what you need
into a new SolrInputDocument that you then send to your
new server.

You really aren't moving any interesting data. By that I mean
by the time things go through your intermediate code, they're
pretty much primitive types so the fact that the various Solr
indexes have different schemas really isn't relevant.

Best
Erick
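
A rough sketch of that Lucene-level loop, assuming Lucene/Solr 3.x APIs; the index 
path, field names and target URL below are placeholders:
-
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class LuceneReindexer
{
    public static void main(String[] args) throws Exception
    {
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/old/index")));
        SolrServer target = new StreamingUpdateSolrServer("http://localhost:8983/solr/newcore", 100, 4);

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < reader.maxDoc(); i++)
        {
            if (reader.isDeleted(i)) continue;      // skip deleted docs
            Document src = reader.document(i);      // stored fields only
            SolrInputDocument dest = new SolrInputDocument();
            dest.addField("id", src.get("id"));     // placeholder field names
            dest.addField("text", src.get("text"));
            batch.add(dest);
            if (batch.size() == 1000)               // send in batches
            {
                target.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) target.add(batch);
        target.commit();
        reader.close();
    }
}
-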


On Tue, Mar 20, 2012 at 1:17 PM, in.abdul in.ab...@gmail.com wrote:
 Thanks ..
 i need to index data from one solr  to another solr with different analyser
 ..
 Now i am able to do this by querying from solr which will be index into
 another solr
 NOTE: As the field which i need to reindex is stored so it is easy by as my
 index has 31 lakh record it is taking lot of time .. (suggest me for
 better performance)

            Thanks and Regards,
        S SYED ABDUL KATHER



 On Tue, Mar 13, 2012 at 10:05 PM, Erick Erickson [via Lucene] 
 ml-node+s472066n3822752...@n3.nabble.com wrote:

 Why would you want to? This seems like an
 XY problem, see:
 http://people.apache.org/~hossman/#xyproblem

 See the confFiles section here:
 http://wiki.apache.org/solr/SolrReplication
 although it mentions solrconfig.xml, it
 might work with schema.xml.

 BUT: This strikes me as really, really
 dangerous. I'm having a hard time
 thinking of a use-case that this makes sense
 for, so be very cautious. Having an index
 created with one schema and searched
 on with another is a recipe for disaster
 IMO unless you're very careful.

 Best
 Erick

  On Tue, Mar 13, 2012 at 3:40 AM, syed kather [hidden email] wrote:
  Team,
   Is it possible to do replication with different Schema  in solr ?
   If not how can i acheive this .
 
  Can any one can give an idea to do this
  advance thanks ..
 
             Thanks and Regards,
         S SYED ABDUL KATHER


 --
  If you reply to this email, your message will be added to the discussion
 below:

 http://lucene.472066.n3.nabble.com/Replication-with-different-schema-tp3821672p3822752.html



 -
 THANKS AND REGARDS,
 SYED ABDUL KATHER
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Replication-with-different-schema-tp3821672p3843068.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: To truncate or not to truncate (group.truncate vs. facet)

2012-03-20 Thread Erick Erickson
Ok, assuming sku is an un-tokenized field (and if it isn't, use
a copyField) then just facet on that field. Then, at the app layer,
combine them to get your aggregate counts.

So your raw return would have
Audi A4 (2)
Audi A5 (1)
Audi S8 (2)
Volvo V50 (1)

The app would have to be smart enough to spin through the sku facet
and just know that the three Audi SKUs need to be rolled up into one
Audi entry. This could be simple if the rule were that the SKU always
started with the brand name

And similarly for the other SKUs.


Crude, but it'd work.

Best
Erick
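
A minimal sketch of that app-layer roll-up, assuming SolrJ, an un-tokenized sku 
facet, and the "brand is the first word of the SKU" rule:
-
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BrandRollup
{
    public static void main(String[] args) throws Exception
    {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("sku");        // un-tokenized (string/copyField) version of sku
        q.setFacetMinCount(1);

        QueryResponse rsp = solr.query(q);
        FacetField skuFacet = rsp.getFacetField("sku");

        // Roll the per-SKU buckets up into one count of distinct models per brand.
        Map<String, Integer> modelsPerBrand = new LinkedHashMap<String, Integer>();
        if (skuFacet != null && skuFacet.getValues() != null)
        {
            for (FacetField.Count c : skuFacet.getValues())
            {
                String brand = c.getName().split(" ")[0];   // "Audi A4" -> "Audi"
                Integer n = modelsPerBrand.get(brand);
                modelsPerBrand.put(brand, n == null ? 1 : n + 1);
            }
        }
        System.out.println(modelsPerBrand);   // with the sample docs: {Audi=3, Volvo=1}
    }
}
-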

On Tue, Mar 20, 2012 at 4:01 PM, rasser r...@vertica.dk wrote:
 Thanks for taking the time to help me Erick!

 Just to clarify my desired behavior from the facets. This is the index,
 notice color is multivalued to represent a model of car that has more than
 one color:

 <doc>
   <field name="sku">Audi A4</field>
   <field name="brand">audi</field>
   <field name="variant_id">A4_black</field>
   <field name="color">black</field>
   <field name="color">white</field>
 </doc>
 <doc>
   <field name="sku">Audi A4</field>
   <field name="brand">audi</field>
   <field name="variant_id">A4_white</field>
   <field name="color">white</field>
 </doc>
 <doc>
   <field name="sku">Volvo V50</field>
   <field name="brand">volvo</field>
   <field name="variant_id">Volvo_V50</field>
   <field name="color">black</field>
 </doc>
 <doc>
   <field name="sku">Audi A5</field>
   <field name="brand">audi</field>
   <field name="variant_id">A5_white</field>
   <field name="color">white</field>
 </doc>
 <doc>
   <field name="sku">Audi S8</field>
   <field name="brand">audi</field>
   <field name="variant_id">S8_yellow</field>
   <field name="color">yellow</field>
 </doc>
 <doc>
   <field name="sku">Audi S8</field>
   <field name="brand">audi</field>
   <field name="variant_id">S8_black</field>
   <field name="color">black</field>
   <field name="color">white</field>
 </doc>

 My goal is to get this facet:
 brand
 -
 audi (3)  - since there are 3 audi models (A4, A5 and S8)
 volvo (1) - since there is only one volvo model (V50)

 color
 -
 black (3) - since all models except A5 are available in black
 white (3) - since A4, A5 and S8 are available in white
 yellow (1) - since only S8 is available in yellow

 And these 4 results (when query is *:*)

 - Audi A4
 - Audi A5
 - Audi S8
 - Volvo V50


 Thanks

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3843596.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Thanks All

2012-03-20 Thread Chris Hostetter

: To get this to work correctly, the following server side config was needed
: (I started from a barebones solr config)

: 1. Add apache-solr-cell-3.5.0.jar to the solrhost/lib directory (or
: wherever solr can access jars) as this contains the class
: ExtractingRequestHandler
: 2. Add the appropriate handler for /update/extract in the solrconfig.xml
: (this uses the ExtractingRequestHandler class).

what barebones solr config did you start with?

the example configs that ship with solr have included /update/extract 
since 1.4.0


-Hoss


Re: StreamingUpdateSolrServer - thread exit timeout?

2012-03-20 Thread Chris Hostetter

:  Is there any way to get get the threads within SUSS objects to immediately
:  exit without creating other issues?  Alternatively, if immediate isn't
:  possible, the exit could take 1-2 seconds.  I could not find any kind of
:  method in the API that closes down the object.

you should take a look at this thread...

http://www.lucidimagination.com/search/document/53dc7e3d2102bb51

-Hoss


Re: Thanks All

2012-03-20 Thread Lance Norskog
If you build it, they will come!

On Tue, Mar 20, 2012 at 12:59 PM, vybe3142 vybe3...@gmail.com wrote:

 I'm still puzzled that there are no readily available alternatives to using
 the Tika based ExtractingRequestHandler in the situation where the input
 data is plain UTF-8 text files that SOLR needs to injest and index. I may
 need to look into defining a custom Request Handler  if that's the right way
 to go.




-- 
Lance Norskog
goks...@gmail.com


Re: Staggering Replication start times

2012-03-20 Thread William Bell
For our use case this is a no-no. When the index is updated, we need
all indexes to be updated at the same time.

We put all indexes (slaves) behind a load balancer and the user would
expect the same results from page to page.


On Tue, Mar 20, 2012 at 5:36 AM, Eric Pugh
ep...@opensourceconnections.com wrote:
 I am playing with an index that is sharded many times, between 64 and 128.  
 One thing I noticed is that with replication set to happen every 5 minutes, 
 it means that each slave hits the master at the same moment asking for 
 updates:  :00:00, :05:00, :10:00, :15:00 etc.   Replication takes very little 
 time, so it seems like I may be flooding the network with a bunch of traffic 
 requests, and then goes away.

 I tweaked the replication start time code to instead just start 5 minutes 
 after a shard starts up, which means instead of all of the slaves hitting at 
 the same moment, they are a bit staggered.   :00:00, :00:01, :00:02, :00:04 
 etcetera.   Which presumably will use my network pipe more efficiently.

 Any thoughts on this?  I know it means the slaves are more likely to be 
 slightly out of sync, but over a 5 minute range will get back in sync.

 Eric

 -
 Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
 http://www.opensourceconnections.com
 Co-Author: Apache Solr 3 Enterprise Search Server available from 
 http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 This e-mail and all contents, including attachments, is considered to be 
 Company Confidential unless explicitly stated otherwise, regardless of 
 whether attachments are marked as such.














-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: StreamingUpdateSolrServer - thread exit timeout?

2012-03-20 Thread Shawn Heisey

On 3/20/2012 8:11 PM, Chris Hostetter wrote:

:  Is there any way to get get the threads within SUSS objects to immediately
:  exit without creating other issues?  Alternatively, if immediate isn't
:  possible, the exit could take 1-2 seconds.  I could not find any kind of
:  method in the API that closes down the object.

you should take a look at this thread...

http://www.lucidimagination.com/search/document/53dc7e3d2102bb51


I've got this in a standalone application with a main(), started from 
the commandline.  When I close it and it calls the shutdown hook, there 
is nothing from SolrJ logged to my log4j destination, stdout, or 
stderr.  I'm using SolrJ 3.5.0.


Is the memory leak you have mentioned still something I need to worry about?

Thanks,
Shawn