Re: data/index naming format

2013-09-05 Thread Jason Hellman
The circumstance in which I've most typically seen the index.<timestamp> directory 
show up is when an update is sent to a slave server.  Replication then appears to 
preserve the updated slave index in a separate folder while still respecting the 
correct data from the master.

On Sep 5, 2013, at 8:03 PM, Shawn Heisey  wrote:

> On 9/5/2013 6:48 PM, Aditya Sakhuja wrote:
>> I am running solr 4.1 for now, and am confused about the structure and
>> naming of the contents of the data dir. I do not see the index.properties
>> being generated on a fresh solr node start either.
>> 
>> Can someone clarify when one should expect to see
>> 
>> data/index vs. data/index.<timestamp>, and the index.properties along with
>> the second version.
> 
> I have never seen an index.properties file get created.  I've used
> versions from 1.4.0 through 4.4.0.
> 
> Generally when you have an index.<timestamp> directory, it's because
> you're doing replication.  There may be other circumstances when it
> appears, but I do not know what those are.
> 
> As for the other files in the index directory, here's Lucene's file
> format documentation:
> 
> http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description
> 
> Thanks,
> Shawn
> 



Re: data/index naming format

2013-09-05 Thread Shawn Heisey
On 9/5/2013 6:48 PM, Aditya Sakhuja wrote:
> I am running solr 4.1 for now, and am confused about the structure and
> naming of the contents of the data dir. I do not see the index.properties
> being generated on a fresh solr node start either.
> 
> Can someone clarify when one should expect to see
> 
> data/index vs. data/index.<timestamp>, and the index.properties along with
> the second version.

I have never seen an index.properties file get created.  I've used
versions from 1.4.0 through 4.4.0.

Generally when you have an index.<timestamp> directory, it's because
you're doing replication.  There may be other circumstances when it
appears, but I do not know what those are.

As for the other files in the index directory, here's Lucene's file
format documentation:

http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description

Thanks,
Shawn
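
(For reference: when index.properties does appear, typically next to an
index.<timestamp> directory created by replication, it is a small Java
properties file in the data dir whose "index" entry names the index
directory currently in use. A made-up example; the timestamp-style name
below is illustrative, not from this thread:

#index.properties
index=index.20130905120000000
)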



Re: subindex

2013-09-05 Thread Erick Erickson
Nope. You can do this if you've stored _all_ the fields (with the exception
of _version_ and the destinations of copyField directives). But there's no
way I know of to do what you want if you haven't.

If you have, you'd essentially be spinning through all your docs and
re-indexing just the fields you care about. But if you still have access to
your original docs, this would be slower and more complicated than just
re-indexing from scratch.

Best,
Erick


On Wed, Sep 4, 2013 at 1:51 PM, Peyman Faratin wrote:

> Hi
>
> Is there a way to build a new (smaller) index from an existing (larger)
> index where the smaller index contains a subset of the fields of the larger
> index?
>
> thank you
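
(A minimal SolrJ sketch of the spin-through-and-re-index approach Erick
describes above; the core URLs, field names, and page size are illustrative
assumptions, not from this thread.)

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class CopyFieldSubset {
    public static void main(String[] args) throws Exception {
        HttpSolrServer source = new HttpSolrServer("http://localhost:8983/solr/big");
        HttpSolrServer target = new HttpSolrServer("http://localhost:8983/solr/small");

        SolrQuery q = new SolrQuery("*:*");
        q.setFields("id", "title");   // the subset of stored fields to keep
        q.setRows(500);               // page size

        for (int start = 0; ; start += 500) {
            q.setStart(start);
            List<SolrDocument> page = source.query(q).getResults();
            if (page.isEmpty()) break;
            for (SolrDocument d : page) {
                // re-index only the fields we care about
                SolrInputDocument in = new SolrInputDocument();
                in.addField("id", d.getFieldValue("id"));
                in.addField("title", d.getFieldValue("title"));
                target.add(in);
            }
        }
        target.commit();
    }
}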


data/index naming format

2013-09-05 Thread Aditya Sakhuja
Hello,

I am running solr 4.1 for now, and am confused about the structure and
naming of the contents of the data dir. I do not see the index.properties
being generated on a fresh solr node start either.

Can someone clarify when one should expect to see

data/index vs. data/index.<timestamp>, and the index.properties along with
the second version.

-- 
Regards,
-Aditya Sakhuja


solrcloud shards backup/restoration

2013-09-05 Thread Aditya Sakhuja
Hello,

I was looking for a good backup / recovery solution for the SolrCloud
indexes. I am mostly interested in restoring the indexes from an index
snapshot, which can be taken using the replicationHandler's backup command.

I am looking for something that works with SolrCloud 4.3 eventually, but it
is still relevant if you tested with a previous version.

I haven't been successful in having the restored index replicate across the
new replicas after I restart all the nodes, with one node having the
restored index.

Is restoring the indexes on all the nodes the best way to do it?
-- 
Regards,
-Aditya Sakhuja


Re: unknown _stream_source_info while indexing rich doc in solr

2013-09-05 Thread Chris Hostetter

: yes sir i did restart the tomcat.

When you look at the Schema Browser for your default Solr core (I'm 
guessing it's collection1?), does it list ignored_* as a dynamic field?  
Does the URL below show you that "ignored_*" is using type "ignored"? 
...

http://localhost:8983/solr/#/collection1/schema-browser?dynamic-field=ignored_*

...if not, then you aren't using the schema.xml that you think you are.



-Hoss


Re: SolrCloud 4.x hangs under high update volume

2013-09-05 Thread Tim Vaillancourt
Update: It is a bit too soon to tell, but about 6 hours into testing there
are no crashes with this patch. :)

We are pushing 500 batches of 10 updates per second to the 3-node, 3-shard
cluster I mentioned above: 5000 updates per second total.

More tomorrow after a 24 hr soak!

Tim

On Wednesday, 4 September 2013, Tim Vaillancourt wrote:

> Thanks so much for the explanation Mark, I owe you one (many)!
>
> We have this on our high-TPS cluster and will run it through its paces
> tomorrow. I'll provide any feedback I can; more soon! :D
>
> Cheers,
>
> Tim
>


Re: Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"

2013-09-05 Thread Chris Hostetter

: I currently have Solr 4.3 set up with about 400 cores set to load upon 
: start up.  When starting Solr with an empty index for each core, Solr is 
: able to load all of the cores and start up normally as expected.  
: However, after running a dataimport on all cores and restarting Solr, it 
: hangs at "org.apache.solr.core.CoreContainer; registering core: ..." 
: without any type of error message in the log.  The process still exists 
: at this point, but doesn't make any progress even if left for a period 
: of time.  Prior to the restart, Solr continues to function normally, and 
: is searchable.

When solr gets into this state, can you generate a thread dump, wait 20-30 
seconds, generate another thread dump, and then send both to the list so 
we can see what's going on at this point?

The easiest way to generate a thread dump is with jstack on the same 
machine...

jstack <pid> >> threaddumps.log


: hang at the same spot.  It does appear to be related to files to an 
: extent, since removing the index/"data" directory of half of the cores 
: does allow Solr to start up normally.

wild shot in the dark -- is it possible you have really large transaction 
logs that are being replayed on startup, because you never did a hard 
commit after indexing?

can you also include in your next email a listing of all the files in all 
the data dirs of the affected solr instance, including file sizes?

something along the lines of this command output from your solr home 
dir...

du -ab */data

?


-Hoss


Solr substring search

2013-09-05 Thread Scott Schneider
Hello,

I'm trying to find out how Solr runs a query for "*foo*".  Google tells me that 
you need to use NGramFilterFactory for that kind of substring search, but I 
find that even with very simple fieldTypes, it just works.  (Perhaps because 
I'm testing on very small data sets, Solr is willing to look through all the 
keywords.)  E.g., this works on the tutorial.

Can someone tell me exactly how this works and/or point me to the Lucene code 
that implements this?

Thanks,
Scott



Solr Cell Question

2013-09-05 Thread Jamie Johnson
Is it possible to configure solr cell to only extract and store the body of
a document when indexing?  I'm currently doing the following which I
thought would work

ModifiableSolrParams params = new ModifiableSolrParams();

// index the extracted text into the "content" field
params.set("defaultField", "content");

// extract only the body of the XHTML that Tika produces
params.set("xpath", "/xhtml:html/xhtml:body/descendant::node()");

ContentStreamUpdateRequest up =
    new ContentStreamUpdateRequest("/update/extract");

up.setParams(params);

// FileStream is org.apache.solr.common.util.ContentStreamBase.FileStream
FileStream f = new FileStream(new File(".."));

up.addContentStream(f);

up.setAction(ACTION.COMMIT, true, true);

solrServer.request(up);


But the resulting "content" field contains the stream metadata as well as
the body:

null
ISO-8859-1
text/plain; charset=ISO-8859-1
Just a little test

What I had hoped for was just:

Just a little test



Re: More on topic of Meta-search/Federated Search with Solr

2013-09-05 Thread Paul Libbrecht
Hello list,

A student of a friend of mine wrote his master's thesis on that topic, especially 
on federated ranking.

I have copied his text here:

http://direct.hoplahup.net/tmp/FederatedRanking-Koblischke-2009.pdf

Feel free to contact me to get in touch with Robert Koblischke with questions.

Paul


On 28 Aug 2013, at 20:35, Dan Davis wrote:

> On Mon, Aug 26, 2013 at 9:06 PM, Amit Jha  wrote:
> 
>> Would you like to create something like
>> http://knimbus.com
>> 
> 
> I work at the National Library of Medicine.   We are moving our library
> catalog to a newer platform, and we will probably include articles.   The
> article's content and meta-data are available from a number of web-scale
> discovery services such as PRIMO, Summon, EBSCO's EDS, EBSCO's "traditional
> API".   Most libraries use open source solutions to avoid the cost of
> purchasing an expensive enterprise search platform.   We are big; we
> already have a closed-source enterprise search engine (and our own
> home-grown Entrez search used for PubMed). Since we can already do Federated
> Search with the above, I am evaluating the effort of adding such to Apache
> Solr. Because NLM data is used in the Open Relevance Project, we actually
> have the relevancy decisions to decide whether we have done a good job of
> it.
> 
> I obviously think it would be "Fun" to add Federated Search to Apache Solr.
> 
> *Standard disclosure* - my opinions do not represent the opinions of NIH
> or NLM. "Fun" is no reason to spend tax-payer money. Enhancing Apache
> Solr would reduce the risk of "putting all our eggs in one basket," and
> there may be some other relevant benefits.
> 
> We do use Apache Solr here for more than one other project... so keep up
> the good work even if my working group decides to go with the closed-source
> solution.



Re: charfilter doesn't do anything

2013-09-05 Thread Shawn Heisey
On 9/5/2013 10:03 AM, Andreas Owen wrote:
> I would like to filter / replace a word during indexing, but it doesn't do 
> anything and I don't get an error.
> 
> In schema.xml I have the following:
> 
> <field name="..." type="..." indexed="true" stored="true"
> multiValued="true"/>
> 
> <fieldType name="..." class="solr.TextField">
>   <analyzer>
>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>       pattern="Zahlungsverkehr" replacement="ASDFGHJK" />
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>   </analyzer>
> </fieldType>
> 
> My second question: where can I say that the expression is multiline? In 
> JavaScript I can use /m at the end of the pattern.

I don't know about your second question.  I don't know if that will be
possible, but I'll leave that to someone who's more expert than I.

As for the first question, here's what I have.  Did you reindex?  That
will be required.

http://wiki.apache.org/solr/HowToReindex

Assuming that you did reindex, are you trying to search for ASDFGHJK in
a field that contains more than just "Zahlungsverkehr"?  The keyword
tokenizer might not do what you expect - it tokenizes the entire input
string as a single token, which means that you won't be able to search
for single words in a multi-word field without wildcards, which are
pretty slow.

Note that both the pattern and replacement are case sensitive.  This is
how regex works.  You haven't used a lowercase filter, which means that
you won't be able to search for asdfghjk.

Use the analysis tab in the UI on your core to see what Solr does to
your field text.

Thanks,
Shawn
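
(An addendum on the second question: the charfilter's pattern attribute is
compiled as a Java regular expression, and Java regexes support embedded
flags, so JavaScript's /m corresponds to a (?m) prefix inside the pattern
itself. A small self-contained demonstration:)

import java.util.regex.Pattern;

public class MultilineFlagDemo {
    public static void main(String[] args) {
        // (?m) turns on MULTILINE, the analogue of JavaScript's /m flag,
        // so ^ and $ match at line boundaries, not just string boundaries.
        Pattern p = Pattern.compile("(?m)^Zahlungsverkehr$");
        System.out.println(p.matcher("foo\nZahlungsverkehr\nbar").find()); // true
    }
}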



Re: charfilter doesn't do anything

2013-09-05 Thread Jack Krupansky

And show us an input string and a query that fail.

-- Jack Krupansky

-Original Message- 
From: Shawn Heisey

Sent: Thursday, September 05, 2013 2:41 PM
To: solr-user@lucene.apache.org
Subject: Re: charfilter doesn't do anything

On 9/5/2013 10:03 AM, Andreas Owen wrote:
I would like to filter / replace a word during indexing, but it doesn't do 
anything and I don't get an error.


In schema.xml I have the following:

<field name="..." type="..." indexed="true" stored="true"
multiValued="true"/>

<fieldType name="..." class="solr.TextField">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
      pattern="Zahlungsverkehr" replacement="ASDFGHJK" />
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

My second question: where can I say that the expression is multiline? In 
JavaScript I can use /m at the end of the pattern.


I don't know about your second question.  I don't know if that will be
possible, but I'll leave that to someone who's more expert than I.

As for the first question, here's what I have.  Did you reindex?  That
will be required.

http://wiki.apache.org/solr/HowToReindex

Assuming that you did reindex, are you trying to search for ASDFGHJK in
a field that contains more than just "Zahlungsverkehr"?  The keyword
tokenizer might not do what you expect - it tokenizes the entire input
string as a single token, which means that you won't be able to search
for single words in a multi-word field without wildcards, which are
pretty slow.

Note that both the pattern and replacement are case sensitive.  This is
how regex works.  You haven't used a lowercase filter, which means that
you won't be able to search for asdfghjk.

Use the analysis tab in the UI on your core to see what Solr does to
your field text.

Thanks,
Shawn 



Re: Numeric fields and payload

2013-09-05 Thread Erick Erickson
Peter:

I don't quite get this. Formatting to display is trivial as it's
usually done for just a few docs anyway. You could also
just store the original unaltered value and add an additional
"normalized" field.

Best
Erick


On Wed, Sep 4, 2013 at 2:02 PM, PETER LENAHAN  wrote:

> Chris Hostetter  fucit.org> writes:
>
> >
> >
> > : is it possible to store (text) payload to numeric fields (class
> > : solr.TrieDoubleField)?  My goal is to store measure units to numeric
> > : features - e.g. '1.5 cm' - and to use faceted search with these fields.
> > : But the field type doesn't allow analyzers to add the payload data. I
> > : want to avoid database access to load the units. I'm using Solr 4.2 .
> >
> > I'm not sure if it's possible to add payloads to Trie fields, but even if
> > there is i don't think you really want that for your usecase -- i think
> it
> > would make a lot more sense to normalize your units so you do consistent
> > sorting, range queries, and faceting on the values regardless of whether
> > it's 100cm or 1000mm or 1m.
> >
> > -Hoss
> >
> >
>
> Hoss,  What you suggest may be fine for specific units. But for monetary
> values with formatting it is not realistic. $10,000.00 would require
> formatting the number to display it.  It would be much easier to store the
> string as a payload with the formatted value.
>
>
> Peter Lenahan
>
>
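
(A sketch of the two-field approach Erick suggests, applied to Peter's money
example: a numeric field for sorting, range queries, and faceting, plus a
stored string carrying the pre-formatted display value, so no payload is
needed. The field names are made up:)

import org.apache.solr.common.SolrInputDocument;

public class PriceFieldsExample {
    public static SolrInputDocument buildDoc() {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "item-1");
        // numeric field (e.g. a TrieDoubleField): sort/range/facet on this
        doc.addField("price", 10000.00);
        // stored string with the formatted value, used only for display
        doc.addField("price_display", "$10,000.00");
        return doc;
    }
}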


Odd behavior after adding an additional core.

2013-09-05 Thread mike st. john
Using Solr 4.4, I used the collection admin to create a collection with 4
shards and a replication factor of 1.

I did this so I could index my data, then bring in replicas later by adding
cores via coreadmin.


I added a new core via coreadmin. What I noticed shortly after adding the
core: the leader of the shard where the new replica was placed was marked
active, the new core was marked as the leader, and the routing was now set
to implicit.



I've reproduced this on another Solr setup as well.


Any ideas?


Thanks

msj


Solr documents update on index

2013-09-05 Thread Luis Portela Afonso
Hi,

I'm having a problem when Solr indexes.
It is updating documents already indexed. Is this normal behavior?
If a document with the same key already exists, is it supposed to be updated?
I was thinking it was supposed to update only if the information in the RSS
feed has changed.

Appreciate your help

-- 
Sent from Gmail Mobile


bucket count for facets

2013-09-05 Thread Steven Bower
Is there a way to get the count of buckets (i.e. unique values) for a field
facet? The rudimentary approach, of course, is to get back all buckets, but
in some cases this is a huge amount of data.

thanks,

steve


Loading a SpellCheck dynamically

2013-09-05 Thread Mr Havercamp
I currently have multiple spellcheckers configured in my solrconfig.xml to 
handle a variety of different spelling suggestions in different languages.


In the snippet below, I have a catch-all spellchecker as well as an 
English-only one for more accurate matching (i.e. my schema.xml is set 
up to copy English-only fields to an English-specific textSpell_en 
field, and I also copy to a generic textSpell field):


---solrconfig.xml---

<searchComponent name="spellcheck_en" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell_en</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell_en</str>
    <str name="spellcheckIndexDir">./spellchecker_en</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

My question is: when I query my Solr index, am I able to load, say, just 
spellcheck values from the spellcheck_en spellchecker rather than from 
both? This would be useful if I were to start implementing additional 
language spellcheckers, e.g. spellcheck_ja, spellcheck_fr, etc.


Thanks for any insights.

Cheers


Hayden
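
(If the language-specific spellcheckers are registered as differently named
dictionaries of one spellcheck component - an assumption, since the names
are partly elided above - the per-query choice is usually made with the
spellcheck.dictionary parameter. A SolrJ sketch; "spell_en" is a
hypothetical dictionary name:)

import org.apache.solr.client.solrj.SolrQuery;

public class EnglishOnlySpellcheck {
    public static SolrQuery build(String userInput) {
        SolrQuery q = new SolrQuery(userInput);
        q.set("spellcheck", "true");
        // consult only the English dictionary for suggestions
        q.set("spellcheck.dictionary", "spell_en");
        return q;
    }
}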


Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"

2013-09-05 Thread Austin Rasmussen
Hello,

I currently have Solr 4.3 set up with about 400 cores set to load upon start 
up.  When starting Solr with an empty index for each core, Solr is able to load 
all of the cores and start up normally as expected.  However, after running a 
dataimport on all cores and restarting Solr, it hangs at 
"org.apache.solr.core.CoreContainer; registering core: ..." without any type of 
error message in the log.  The process still exists at this point, but doesn't 
make any progress even if left for a period of time.  Prior to the restart, 
Solr continues to function normally, and is searchable.

Solr is currently running in master-slave replication, and this same, exact 
behavior occurs on the master and both slaves.

I've checked all of the system log files and am also unable to find any errors 
or messages that would point to a particular problem.  Originally, I had 
thought it may have been related to an open file limit, but I also tried 
raising the limit to 65k, and Solr continued to hang at the same spot.  It does 
appear to be related to files to an extent, since removing the index/"data" 
directory of half of the cores does allow Solr to start up normally.

Any help or suggestions are appreciated.

Thanks!


charfilter doesn't do anything

2013-09-05 Thread Andreas Owen
I would like to filter / replace a word during indexing, but it doesn't do 
anything and I don't get an error.

In schema.xml I have the following:

<field name="..." type="..." indexed="true" stored="true"
       multiValued="true"/>

<fieldType name="..." class="solr.TextField">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="Zahlungsverkehr" replacement="ASDFGHJK" />
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

My second question: where can I say that the expression is multiline? In 
JavaScript I can use /m at the end of the pattern.

Re: JSON update request handler & commitWithin

2013-09-05 Thread Ryan, Brent
Ya, looks like this is a bug in DataStax Enterprise 3.1.2.  I'm using
their enterprise cluster search product, which is built on Solr 4.

:(



On 9/5/13 11:24 AM, "Jack Krupansky"  wrote:

>I just tried commitWithin with the standard Solr example in Solr 4.4 and
>it works fine.
>
>Can you reproduce your problem using the standard Solr example in Solr
>4.4?
>
>-- Jack Krupansky
>
>From: Ryan, Brent 
>Sent: Thursday, September 05, 2013 10:39 AM
>To: solr-user@lucene.apache.org
>Subject: JSON update request handler & commitWithin
>
>I'm prototyping a search product for us and I was trying to use the
>"commitWithin" parameter for posting updated JSON documents like so:
>
>curl -v 
>'http://localhost:8983/solr/proposal.solr/update/json?commitWithin=1'
>--data-binary @rfp.json -H 'Content-type:application/json'
>
>However, the commit never seems to happen as you can see below there are
>still 2 docsPending (even 1 hour later).  Is there a trick to getting
>this to work with submitting to the json update request handler?
>



Re: JSON update request handler & commitWithin

2013-09-05 Thread Jason Hellman
They have modified the mechanisms for committing documents; Solr in DSE is not 
stock Solr, so you are likely encountering a boundary where stock Solr 
behavior is not fully supported.

I would definitely reach out to them to find out if they support the request.

On Sep 5, 2013, at 8:27 AM, "Ryan, Brent"  wrote:

> Ya, looks like this is a bug in DataStax Enterprise 3.1.2.  I'm using
> their enterprise cluster search product, which is built on Solr 4.
> 
> :(
> 
> 
> 
> On 9/5/13 11:24 AM, "Jack Krupansky"  wrote:
> 
>> I just tried commitWithin with the standard Solr example in Solr 4.4 and
>> it works fine.
>> 
>> Can you reproduce your problem using the standard Solr example in Solr
>> 4.4?
>> 
>> -- Jack Krupansky
>> 
>> From: Ryan, Brent 
>> Sent: Thursday, September 05, 2013 10:39 AM
>> To: solr-user@lucene.apache.org
>> Subject: JSON update request handler & commitWithin
>> 
>> I'm prototyping a search product for us and I was trying to use the
>> "commitWithin" parameter for posting updated JSON documents like so:
>> 
>> curl -v 
>> 'http://localhost:8983/solr/proposal.solr/update/json?commitWithin=1'
>> --data-binary @rfp.json -H 'Content-type:application/json'
>> 
>> However, the commit never seems to happen as you can see below there are
>> still 2 docsPending (even 1 hour later).  Is there a trick to getting
>> this to work with submitting to the json update request handler?
>> 
> 



Re: JSON update request handler & commitWithin

2013-09-05 Thread Jack Krupansky
I just tried commitWithin with the standard Solr example in Solr 4.4 and it 
works fine.

Can you reproduce your problem using the standard Solr example in Solr 4.4?

-- Jack Krupansky
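
(For reference, the same commitWithin behavior driven from SolrJ against
stock Solr 4.x; the core URL is taken from the curl command quoted below,
and the document contents are made up:)

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server =
            new HttpSolrServer("http://localhost:8983/solr/proposal.solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "rfp-1");

        UpdateRequest req = new UpdateRequest();
        req.add(doc);
        req.setCommitWithin(1000);  // ask Solr to commit within 1000 ms
        req.process(server);
    }
}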

From: Ryan, Brent 
Sent: Thursday, September 05, 2013 10:39 AM
To: solr-user@lucene.apache.org 
Subject: JSON update request handler & commitWithin

I'm prototyping a search product for us and I was trying to use the 
"commitWithin" parameter for posting updated JSON documents like so:

curl -v 
'http://localhost:8983/solr/proposal.solr/update/json?commitWithin=1' 
--data-binary @rfp.json -H 'Content-type:application/json'

However, the commit never seems to happen as you can see below there are still 
2 docsPending (even 1 hour later).  Is there a trick to getting this to work 
with submitting to the json update request handler?



JSON update request handler & commitWithin

2013-09-05 Thread Ryan, Brent
I'm prototyping a search product for us and I was trying to use the 
"commitWithin" parameter for posting updated JSON documents like so:

curl -v 
'http://localhost:8983/solr/proposal.solr/update/json?commitWithin=1' 
--data-binary @rfp.json -H 'Content-type:application/json'

However, the commit never seems to happen as you can see below there are still 
2 docsPending (even 1 hour later).  Is there a trick to getting this to work 
with submitting to the json update request handler?
[attached screenshot: admin UI core stats showing docsPending = 2]



Re: Tweaking boosts for more search results variety

2013-09-05 Thread Jack Krupansky
The grouping (field collapsing) feature somewhat addresses this: group by a 
"site" field, and then if more than one or a few top pages are from the same 
site, they get grouped or collapsed so that you can see more sites within a 
few results.


See:
http://wiki.apache.org/solr/FieldCollapsing
https://cwiki.apache.org/confluence/display/solr/Result+Grouping

-- Jack Krupansky
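
(A sketch of that in SolrJ; the "site" field name comes from the question
quoted below, and the rest are standard result-grouping parameters:)

import org.apache.solr.client.solrj.SolrQuery;

public class GroupBySite {
    public static SolrQuery build(String terms) {
        SolrQuery q = new SolrQuery(terms);
        q.set("group", "true");
        q.set("group.field", "site");  // collapse hits from the same site
        q.set("group.limit", "2");     // at most 2 docs shown per site
        return q;
    }
}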

-Original Message- 
From: Sai Gadde

Sent: Thursday, September 05, 2013 2:27 AM
To: solr-user@lucene.apache.org
Subject: Tweaking boosts for more search results variety

Our index is aggregated content from various sites on the web. We want good
user experience by showing multiple sites in the search results. In our
setup we are seeing most of the results from same site on the top.

Here is some information regarding queries and schema
   site - String field. We have about 1000 sites in index
   sitetype - String field.  we have 3 site types
omitNorms="true" for both the fields

Doc count varies largely based on site and sitetype by a factor of 10 -
1000 times
Total index size is about 5 million docs.
Solr Version: 4.0

In our queries we have a fixed and preferential boost for certain sites.
sitetype has different and fixed boosts for 3 possible values. We turned
off Inverse Document Frequency (IDF) for these boosts to work properly.
Other text fields are boosted based on search keywords only.

With this setup we often see a bunch of hits from a single site, followed by
the next site, and so on.
Is there any solution to see results from a variety of sites and still keep
the preferential boosts in place?



Re: Solr Cloud hangs when replicating updates

2013-09-05 Thread Erick Erickson
If you run into this again, try a jstack trace. You should see
evidence of being stuck in SolrCmdDistributor on a variable
called "semaphore"... On current 4x this is around line 420.

If you're using SolrJ, then SOLR-4816 is another thing to try.

But Mark's patch would be best of all to test, If that doesn't
fix it then the jstack suggestion would at least tell us if it's
the issue we think it is.

FWIW,
Erick


On Wed, Sep 4, 2013 at 12:51 PM, Mark Miller  wrote:

> It would be great if you could give this patch a try:
> http://pastebin.com/raw.php?i=aaRWwSGP
>
> - Mark
>
>
> On Wed, Sep 4, 2013 at 8:31 AM, Kevin Osborn 
> wrote:
>
> > Thanks. If there is anything I can do to help you resolve this issue, let
> > me know.
> >
> > -Kevin
> >
> >
> > On Wed, Sep 4, 2013 at 7:51 AM, Mark Miller 
> wrote:
> >
> > > Ill look at fixing the root issue for 4.5. I've been putting it off for
> > > way to long.
> > >
> > > Mark
> > >
> > > Sent from my iPhone
> > >
> > > On Sep 3, 2013, at 2:15 PM, Kevin Osborn 
> wrote:
> > >
> > > > I was having problems updating SolrCloud with a large batch of
> records.
> > > The
> > > > records are coming in bursts with lulls between updates.
> > > >
> > > > At first, I just tried large updates of 100,000 records at a time.
> > > > Eventually, this caused Solr to hang. When hung, I can still query
> > Solr.
> > > > But I cannot do any deletes or other updates to the index.
> > > >
> > > > At first, my updates were going as SolrJ CSV posts. I have also tried
> > > local
> > > > file updates and had similar results. I finally slowed things down to
> > > just
> > > > use SolrJ's Update feature, which is basically just JavaBin. I am
> also
> > > > sending over just 100 at a time in 10 threads. Again, it eventually
> > hung.
> > > >
> > > > Sometimes, Solr hangs in the first couple of chunks. Other times, it
> > > hangs
> > > > right away.
> > > >
> > > > These are my commit settings:
> > > >
> > > > <autoCommit>
> > > >   <maxTime>15000</maxTime>
> > > >   <maxDocs>5000</maxDocs>
> > > >   <openSearcher>false</openSearcher>
> > > > </autoCommit>
> > > > <autoSoftCommit>
> > > >   <maxTime>3</maxTime>
> > > > </autoSoftCommit>
> > > >
> > > > I have tried quite a few variations with the same results. I also
> tried
> > > > various JVM settings with the same results. The only variable seems
> to
> > be
> > > > that reducing the cluster size from 2 to 1 is the only thing that
> > helps.
> > > >
> > > > I also did a jstack trace. I did not see any explicit deadlocks, but
> I
> > > did
> > > > see quite a few threads in WAITING or TIMED_WAITING. It is typically
> > > > something like this:
> > > >
> > > >  java.lang.Thread.State: WAITING (parking)
> > > >at sun.misc.Unsafe.park(Native Method)
> > > >- parking to wait for  <0x00074039a450> (a
> > > > java.util.concurrent.Semaphore$NonfairSync)
> > > >at
> > > java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > > >at
> > > >
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> > > >at
> > > >
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> > > >at
> > > >
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> > > >at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
> > > >at
> > > >
> > >
> >
> org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
> > > >at
> > > >
> > >
> >
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
> > > >at
> > > >
> > >
> >
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
> > > >at
> > > >
> > >
> >
> org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
> > > >at
> > > >
> > >
> >
> org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
> > > >at
> > > >
> > >
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
> > > >at
> > > >
> > >
> >
> org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
> > > >at
> > > >
> > >
> >
> org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
> > > >at
> > > >
> > org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
> > > >at
> > > org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
> > > >at
> > > >
> > >
> >
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> > > >at
> > > >
> > >
> >
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> > > >at
> > > >
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > > >at org.apache.solr.core.SolrCore.exec

Re: Need help on Joining and sorting syntax and limitations between multiple documents in solr-4.4.0

2013-09-05 Thread Erick Erickson
The very first thing I'd do is see if you can _not_ use joins, especially
if you're coming from an RDBMS background. Joins in Solr are
somewhat specialized and are NOT equivalent to DB joins.

First of all, there's no way to get fields from the "from" part
of the join returned in the results. Secondly, there are a number
of cases where the performance isn't stellar. Thirdly...

The first approach is always to explore denormalizing the data so
you can do straight searches rather than joins. Second is to think
about your use case carefully and see if there are clever indexing
schemes that allow you to not use joins.

Only after those avenues are exhausted would I rely on joins.
There's a reason they are sometimes referred to as "pseudo joins"

Best,
Erick
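
(A sketch of the denormalizing Erick recommends, using field names from the
question quoted below: copy the picklist's "en"/"gr" labels onto each asset
document at index time, so searching and sorting need no join. The
picklist_en/picklist_gr field names and the values are made up:)

import org.apache.solr.common.SolrInputDocument;

public class DenormalizedAsset {
    public static SolrInputDocument build() {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "a180894808");
        doc.addField("def14227_picklist", "t1324837");
        // denormalized copies of the joined picklist's labels;
        // sort directly on picklist_en instead of joining at query time
        doc.addField("picklist_en", "New");
        doc.addField("picklist_gr", "Neuo");
        return doc;
    }
}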


On Wed, Sep 4, 2013 at 4:19 AM, Sukanta Dey wrote:

> Hi Team,
>
> In my project I am going to use Apache Solr 4.4.0 for searching.
> While doing that, I need to join between multiple Solr documents within the
> same core on a field that is common across the documents.
> I can successfully join the documents using the Solr 4.4.0 join syntax and
> it returns the expected result, but my next requirement is to sort the
> returned result on the basis of fields from the documents involved in the
> join condition's "from" clause, which I was not able to get.
> Let me explain the problem in detail along with the files I am using ...
>
>
> 1)  Files being used :
>
> a.   Picklist_1.xml
>
> --
>
> 
>
> t1324838
>
> 7
>
> 956
>
> 130712901
>
> Draft
>
> Draoft
>
> 
>
>
>
> b.  Picklist_2.xml
>
> ---
>
> 
>
> t1324837
>
> 7
>
> 87749
>
> 130712901
>
> New
>
> Neuo
>
> 
>
>
>
> c.   AssetID_1.xml
>
> ---
>
> 
>
> t1324837
>
> a180894808
>
> 1
>
> true
>
> 2013-09-02T09:28:18Z
>
> 130713716
>
> 130712901
>
> 
>
>
>
> d.  AssetID_2.xml
>
> 
>
> 
>
>  t1324838
>
>  a171658357
>
> 1
>
> 130713716
>
> 2283961
>
> 2290309
>
> 7
>
> 7
>
> 13503796
> 15485964
>
> 38052
>
> 41133
>
> 130712901
>
> 
>
>
>
> 2)  Requirement:
>
> 
>
> i. It needs to have a join  between the files using
> "def14227_picklist" field from AssetID_1.xml and AssetID_2.xml and
> "describedObjectId" field from Picklist_1.xml and Picklist_2.xml files.
>
> ii.   After joining we need to have all the fields from
> the files AssetID_*.xml and "en","gr" fields from Picklist_*.xml files.
>
> iii.  While joining we also need to sort the result based on
> the "en" field value.
>
>
>
> 3)  I was trying with "q={!join from=inner_id to=outer_id}zzz:vvv"
> syntax but no luck.
>
> Any help/suggestion would be appreciated.
>
> Thanks,
> Sukanta Dey
>
>
>
>
>


Tweaking Edismax on the Phrase Fields

2013-09-05 Thread Bruno René Santos
Hi,

I have a question about the raw query that is parsed from an edismax query.
For example, the query:

_query_:"{!edismax mm=100% bf='log(div(9900,producttier))'
pf='name_synonyms~100^3 name~100^6 heading~100^20' pf2='name_synonyms~100^3
name~100^6 heading~100^20' qf='name_synonyms^3 name^6 heading^20'}hotel
centro lisboa"

is transformed into


(+((DisjunctionMaxQuery((name_synonyms:hotel^3.0 | heading:hotel^20.0
| name:hotel^6.0)) DisjunctionMaxQueryname_synonyms:semtr
name_synonyms:centr)^3.0) | ((heading:semtr heading:centr)^20.0) |
((name:semtr name:centr)^6.0)))
DisjunctionMaxQueryname_synonyms:lisbon name_synonyms:lisbo)^3.0)
| ((heading:lisbon heading:lisbo)^20.0) | ((name:lisbon
name:lisbo)^6.0~3) DisjunctionMaxQuery((name_synonyms:\"hotel
(semtr centr) (lisbon lisbo)\"~100^3.0))
DisjunctionMaxQuery((name:\"hotel (semtr centr) (lisbon
lisbo)\"~100^6.0)) DisjunctionMaxQuery((heading:\"hotel (semtr centr)
(lisbon lisbo)\"~100^20.0))
(DisjunctionMaxQuery((name_synonyms:\"hotel (semtr centr)\"~100^3.0))
DisjunctionMaxQuery((name_synonyms:\"(semtr centr) (lisbon
lisbo)\"~100^3.0))) (DisjunctionMaxQuery((name:\"hotel (semtr
centr)\"~100^6.0)) DisjunctionMaxQuery((name:\"(semtr centr) (lisbon
lisbo)\"~100^6.0))) (DisjunctionMaxQuery((heading:\"hotel (semtr
centr)\"~100^20.0)) DisjunctionMaxQuery((heading:\"(semtr centr)
(lisbon lisbo)\"~100^20.0)))
FunctionQuery(log(div(const(9900),int(producttier)/no_coord

As you can see, for each field on a phrase query a new
DisjunctionMaxQuery is created. Why is the behaviour not the same as for
qf? For qf, the most important field (the max) is what counts; on the
phrase query, all fields participate in the final score. Is there any way
to emulate the qf behaviour (one DisjunctionMaxQuery for each
combination) on the pf? Like one DisjunctionMaxQuery for pf, another for
pf2, etc.

Regards

Bruno

-- 

Bruno René Santos
Lisboa - Portugal


Re: Little XsltResponseWriter documentation bug (Attn: Wiki Admin)

2013-09-05 Thread Stefan Matheis
Dmitri,

I've added you to the https://wiki.apache.org/solr/ContributorsGroup - feel 
free to improve the wiki :)

- Stefan 


On Wednesday, September 4, 2013 at 11:46 PM, Dmitri Popov wrote:

> Upayavira,
> 
> I could edit that page myself, but need to be confirmed human according to
> http://wiki.apache.org/solr/FrontPage#How_to_edit_this_Wiki
> 
> My wiki account name is 'pin' just in case.
> 
On Wed, Sep 4, 2013 at 5:27 PM, Upayavira <u...@odoko.co.uk> wrote:
> 
> > It's a wiki. Can't you correct it?
> > 
> > Upayavira
> > 
> > On Wed, Sep 4, 2013, at 08:25 PM, Dmitri Popov wrote:
> > > Hi,
> > > 
> > > http://wiki.apache.org/solr/XsltResponseWriter (and reference manual PDF
> > > too) become out of date:
> > > 
> > > In the configuration section
> > > 
> > > <queryResponseWriter name="xslt"
> > >   class="org.apache.solr.request.XSLTResponseWriter">
> > >   <int name="xsltCacheLifetimeSeconds">5</int>
> > > </queryResponseWriter>
> > > 
> > > class name
> > > 
> > > org.apache.solr.request.XSLTResponseWriter
> > > 
> > > should be replaced by
> > > 
> > > org.apache.solr.response.XSLTResponseWriter
> > > 
> > > Otherwise ClassNotFoundException happens. Change is result of
> > > https://issues.apache.org/jira/browse/SOLR-1602 as far as I see.
> > > 
> > > Apparently can't update that page myself, please could someone else do
> > > that?
> > > 
> > > Thanks! 



Re: Solr 4.3: Recovering from "Too many values for UnInvertedField faceting on field"

2013-09-05 Thread Dmitry Kan
We had a similar case for multivalued fields with a lot of unique values
per field. Using facet.method=enum instead of facet.method=fc
fixed the problem. It can run slower, though.

Dmitry
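
(For reference, a sketch of switching the facet method from SolrJ; the
field name echoes the thread below, and the rest are standard faceting
parameters:)

import org.apache.solr.client.solrj.SolrQuery;

public class EnumFacetQuery {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("author_exact");
        // walk the term index instead of un-inverting the field,
        // sidestepping the UnInvertedField bucket-size limit
        q.set("facet.method", "enum");
        return q;
    }
}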


On Tue, Sep 3, 2013 at 5:04 PM, Dennis Schafroth wrote:

> We are harvesting and indexing bibliographic data, thus having many
> distinct author names in our index. While testing Solr 4 I believe I had
> pushed a single core to 100 million records (91GB of data) and everything
> was working fine and fast. After adding a little more to the index, then
> following started to happen:
>
> 17328668 [searcherExecutor-4-thread-1] WARN org.apache.solr.core.SolrCore
> – Approaching too many values for UnInvertedField faceting on field
> 'author_exact' : bucket size=16726546
> 17328701 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore
> – UnInverted multi-valued field
> {field=author_exact,memSize=336715415,tindexSize=5001903,time=31595,phase1=31465,nTerms=12048027,bigTerms=0,termInstances=57751332,uses=0}
> 18103757 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore
> – org.apache.solr.common.SolrException: Too many values for UnInvertedField
> faceting on field author_exact
> at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:181)
> at
> org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664)
>
> I can see that we reached a limit of bucket size. Is there a way to adjust
> this? The index also seem to explode in size (217GB).
>
> Thinking that I had reached a limit for what a single core could handle in
> terms of facet, I deleted records in the index, but even now at 1/3 (32
> million) it will still fails with above error. I have optimised with
> expungeDeleted=true. The index is  somewhat larger (76GB) than I would have
> expected.
>
> While we can still use the index and get facets back using enum method on
> that field, I would still like a way to fix the index if possible. Any
> suggestions?
>
> cheers,
> :-Dennis