external file field and fl parameter

2013-07-14 Thread Chris Collins
I am playing with external file field for sorting.  I created a dynamic field 
using the ExternalFileField type.  

I naively assumed that the fl argument would allow me to return the value of the 
external field, but it doesn't seem to do so.

For instance, I have defined a dynamic field:

*_efloat

then I used:

sort=foo_efloat desc
fl=foo_efloat, score, description

I get the score and description but the foo_efloat seems to be missing in 
action.


Thoughts?

C



Getting indexed content of files using ExtractingRequestHandler

2013-07-14 Thread xan
Hi,

I'm using the PHP Solr client (ver: 1.0.2).

I'm indexing the contents through my database. 
Suppose $data is a stdClass object having id, name, title, etc. from a
database entry.

Next, I declare a Solr document and assign fields to it:

$doc = new SolrInputDocument();
$doc->addField('id', $data->id);
$doc->addField('name', $data->name);



I wanted to know how I can store the contents of a pdf file (whose path I've
stored in $data->filepath) in the same solr document, say in a field
('filecontent').

Referring to the wiki, I was unable to figure out the proper cURL request
for achieving this. I was able to create a completely new solr document but
how do I get the contents of the pdf file in the same solr document so that
I can store that in a field?


$doc = new SolrInputDocument();
$doc->addField('id', $data->id);
$doc->addField('name', $data->name);


// fire the curl request here, referring to the file at $data->filepath,
// and put the extracted text into $filecontent (placeholder variable)
$doc->addField('filecontent', $filecontent); // content of the pdf file

Also, instead of firing the raw cURL request, is there a better way? I don't
know if the current PECL SOLR Client 1.0.2 has the feature of indexing pdf
files.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-indexed-content-of-files-using-ExtractingRequestHandler-tp4077856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem using Term Component in solr

2013-07-14 Thread Parul Gupta(Knimbus)
Hi,

Vocabulary is not known, that's the main issue, else I would implement synonyms
instead.
What do you mean by 'regularizing the title'?

So let me know some solution...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-using-Term-Component-in-solr-tp4077200p4077865.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to exclude a specific tag from a Solr facet?

2013-07-14 Thread 张智
solr 4.3

These are my query request params:

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">15</int>
  <lst name="params">
    <str name="facet">true</str>
    <str name="indent">true</str>
    <str name="q">*:*</str>
    <str name="_">1373713374569</str>
    <arr name="facet.field">
      <str>{!ex=city}CityId</str>
      <str>{!ex=company}CompanyId</str>
    </arr>
    <str name="wt">xml</str>
    <str name="fq">{!tag=city}CityId:729 AND {!tag=company}CompanyId:16122</str>
  </lst>
</lst>

This is the query response Facet content:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="CityId">
      <int name="11">100171</int>
      <int name="1404">89406</int>
      <int name="0">77477</int>
      <int name="1366">65780</int>
      <int name="1362">58092</int>
      <int name="729">29213</int>
      <int name="798">28975</int>
      ...
      <int name="7262">808</int>
      <int name="432">776</int>
      <int name="1146">772</int>
      <int name="1653">765</int>
      <int name="1078">668</int>
      <int name="814">667</int>
      <int name="2049">402</int>
      <int name="456">401</int>
      <int name="401">390</int>
    </lst>
    <lst name="CompanyId">
      <int name="16122">971</int>
      <int name="69">0</int>
      <int name="71">0</int>
      <int name="72">0</int>
      <int name="79">0</int>
      <int name="80">0</int>
      <int name="85">0</int>
      <int name="88">0</int>
      <int name="94">0</int>
      ...
      <int name="98">0</int>
      <int name="104">0</int>
      <int name="112">0</int>
      <int name="113">0</int>
      <int name="118">0</int>
      <int name="123">0</int>
      <int name="126">0</int>
      <int name="131">0</int>
      <int name="136">0</int>
      <int name="139">0</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

You can see that the CityId facet is correct: it excludes the {!tag=city}CityId:729 
filter. But the CompanyId facet is not correct: it did not exclude the 
{!tag=company}CompanyId:16122 filter. How can I solve this?

Re: Does Solrj Batch Processing Querying May Confuse?

2013-07-14 Thread Erick Erickson
Well, if you can find one of the docs, or you know one of the IDs
that's missing, try explainOther, see:
http://wiki.apache.org/solr/CommonQueryParameters#explainOther
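
For example, something along these lines (the parameter values are just
placeholders based on your query; URL-encode the explainOther value as needed):

q=lang:tr&fl=url&sort=url desc&rows=1000&debugQuery=true&explainOther=url:"http://example.com/missing-page"

The debug section of the response will then include an explanation for that
document relative to the main query, even if it isn't among the rows returned.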

Best
Erick

On Fri, Jul 12, 2013 at 8:29 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 I've crawled some webpages and indexed them at Solr. I've queried data at
 Solr via Solrj. url is my unique field, and I've defined my query like this:

 ModifiableSolrParams params = new ModifiableSolrParams();
 params.set("q", "lang:tr");
 params.set("fl", "url");
 params.set("sort", "url desc");

 I've run my program to query 1000 rows at each query and wrote them to a
 file. However, I realized that there are some documents that are indexed at
 Solr (I query them from the admin page, not from Solrj as a 1000-row batch
 process) but are not in my file. What may be the cause of that?


Re: Multiple queries or Filtering Queries in Solr

2013-07-14 Thread Erick Erickson
Isn't this just a filter query? (fq=)?

Something like
q=query2&fq=query1

Although I don't quite understand the 500 -> 50, you can always
tack on additional fq clauses; it's basically set intersection.

As for limiting the results a user sees, that's what the rows parameter is for.

So another way of looking at this is can you form a query that expresses
the use-case and just show the top N (in this case 50)? Does that work?
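
For example, using the query1/query2/query3 naming from your update (just a
sketch; substitute the real field clauses):

q=query3&fq=query1&fq=query2&rows=50

Each fq narrows the candidate set (set intersection, as above), and rows=50
limits what the end user sees.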

Best
Erick

On Fri, Jul 12, 2013 at 10:44 AM, dcode darshan.bengal...@gmail.com wrote:


 My problem is I have n fields (say around 10) in Solr that are searchable,
 they all are indexed and stored. I would like to run a query first on my
 whole index of say 5000 docs which will hit around an average of 500 docs.
 Next I would like to query using a different set of keywords on these 500
 docs and NOT on the whole index.

 So the first time I send a query a score will be generated, the second time
 I run a query the new score generated should be based on the 500 documents
 of the previous query, or in other words Solr should consider only these 500
 docs as the whole index.

 To summarise this, an index of 5000 will be filtered to 500 and then 50
 (5000 -> 500 -> 50). It's basically filtering, but I would like to do this in Solr.

 I have reasonable basic knowledge and still learning.

 Update: If represented mathematically it would look like this:
 results1=f(query1)
 results2=f(query2, results1)
 final_results=f(query3, results2)

 I would like this to be accomplished using a program, and the end-user will only
 see 50 results. So faceting is not an option.




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Multiple-queries-or-Filtering-Queries-in-Solr-tp4077574.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: add to ContributorsGroup - Instructions for setting up SolrCloud on jboss

2013-07-14 Thread Erick Erickson
Done, sorry it took so long, hadn't looked at the list in a couple of days.


Erick

On Fri, Jul 12, 2013 at 5:46 PM, Ali, Saqib docbook@gmail.com wrote:
 username: saqib


 On Fri, Jul 12, 2013 at 2:35 PM, Ali, Saqib docbook@gmail.com wrote:

 Hello,

 Can you please add me to the ContributorsGroup? I would like to add
 instructions for setting up SolrCloud using Jboss.

 thanks.




Re: Custom processing in Solr Request Handler plugin and its debugging ?

2013-07-14 Thread Erick Erickson
Not sure how to do the pass to another request handler thing, but
the debugging part is pretty straightforward. I use IntelliJ, but as far
as I know Eclipse has very similar capabilities.

First, I cheat and path to the jar that's the output from my IDE, that
saves copying the jar around. So my solrconfig.xml file has  a lib
directive like
../../../../../eoe/project/out/artifact/jardir
where this is wherever your IDE wants to put it. It can sometimes be
tricky to get enough ../../../ in there.

Second, edit config, select remote and a form comes up. Fill
in host and port, something like localhost and 5900 (this latter
is whatever you want). In IntelliJ that'll give you the specific command
to use to start Solr so you can attach. This looks like the following
for my setup:
java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5900
-jar start.jar

Now just fire up Solr as above. Fire up your remote debugging
session in IntelliJ. Set breakpoints as you wish. NOTE: the suspend=y
bit above means that Solr will do _nothing_ until you attach the
debugger and hit go.

HTH
Erick

On Sat, Jul 13, 2013 at 6:57 AM, Tony Mullins tonymullins...@gmail.com wrote:
 Please, any help on how to pass the search request to a different
 RequestHandler from within the custom RequestHandler, and how to debug the
 custom RequestHandler plugin?

 Thanks,
 Tony


 On Fri, Jul 12, 2013 at 4:41 PM, Tony Mullins tonymullins...@gmail.comwrote:

 Hi,

 I have defined my new Solr RequestHandler plugin like this in
 SolrConfig.xml

 <requestHandler name="/myendpoint" class="com.abc.MyRequestPlugin">
 </requestHandler>

 And its working fine.

 Now I want to do some custom processing from this plugin by making a
 search query to the regular '/select' handler.
  <requestHandler name="/select" class="solr.SearchHandler">
    ...
  </requestHandler>

 And then receive the results back from '/select' handler and perform some
 custom processing on those results and send the response back to my custom
 /myendpoint handler.

 And for this I need help on how to make a call to '/select' handler from
 within the .MyRequestPlugin class and perform some calculation on the
 results.

 I also need some help on how to debug my plugin. As its .jar has been
 deployed to solr_home/lib ... how can I attach my plugin's code in Eclipse
 to the Solr process so I can debug it when a user sends a request to my
 plugin?

 Thanks,
 Tony



Re: external file field and fl parameter

2013-07-14 Thread Erick Erickson
Did you store the field? I.e. set stored=true? And does the EFF contain
values for the docs you're returning?

Best
Erick

On Sun, Jul 14, 2013 at 3:32 AM, Chris Collins ch...@geekychris.com wrote:
 I am playing with external file field for sorting.  I created a dynamic field 
 using the ExternalFileField type.

 I naively assumed that the fl argument would allow me to return the value of
 the external field, but it doesn't seem to do so.

 For instance, I have defined a dynamic field:

 *_efloat

 then I used:

 sort=foo_efloat desc
 fl=foo_efloat, score, description

 I get the score and description but the foo_efloat seems to be missing in 
 action.


 Thoughts?

 C



Re: Getting indexed content of files using ExtractingRequestHandler

2013-07-14 Thread Erick Erickson
Well, cURL is generally not what people use for production. What I'd consider
is using SolrJ (which you can access Tika from) and then store the raw pdf
(or whatever) document as a binary data type in Solr.

Here's an example (with DB indexing mixed in, but you should be able
to pull that part out).
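
In short, the SolrJ + Tika part looks roughly like this (an untested sketch;
the Solr URL and the PDF path are placeholders, and the field names just
mirror the ones from your message):

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class IndexPdfWithTika {
  public static void main(String[] args) throws Exception {
    // 1) Extract the plain text of the PDF with Tika.
    AutoDetectParser parser = new AutoDetectParser();
    BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
    Metadata metadata = new Metadata();
    try (InputStream in = new FileInputStream("/path/to/file.pdf")) {
      parser.parse(in, handler, metadata);
    }

    // 2) Build the Solr document: database fields plus the extracted text.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "some-id");
    doc.addField("name", "some name");
    doc.addField("filecontent", handler.toString());

    // 3) Send it to Solr.
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    solr.add(doc);
    solr.commit();
    solr.shutdown();
  }
}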

Best
Erick

On Sun, Jul 14, 2013 at 4:05 AM, xan p...@prateeksachan.com wrote:
 Hi,

 I'm using the PHP Solr client (ver: 1.0.2).

 I'm indexing the contents through my database.
 Suppose $data is a stdClass object having id, name, title, etc. from a
 database entry.

 Next, I declare a solr Document and assign fields to it.:

 $doc = new SolrInputDocument();
 $doc->addField('id', $data->id);
 $doc->addField('name', $data->name);
 
 

 I wanted to know how can I store the contents of a pdf file (whose path I've
 stored in $data-filepath), in the same solr document, say in a field
 ('filecontent').

 Referring to the wiki, I was unable to figure out the proper cURL request
 for achieving this. I was able to create a completely new solr document but
 how do I get the contents of the pdf file in the same solr document so that
 I can store that in a field?


 $doc = new SolrInputDocument();
 $doc->addField('id', $data->id);
 $doc->addField('name', $data->name);


 // fire the curl request here, referring to the file at $data->filepath,
 // and put the extracted text into $filecontent (placeholder variable)
 $doc->addField('filecontent', $filecontent); // content of the pdf file

 Also, instead of firing the raw cURL request, is there a better way? I don't
 know if the current PECL SOLR Client 1.0.2 has the feature of indexing pdf
 files.




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Getting-indexed-content-of-files-using-ExtractingRequestHandler-tp4077856.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem using Term Component in solr

2013-07-14 Thread Erick Erickson
by regularizing the title I meant either indexing and
searching exactly:
Medical Engineering and Physics
or
Medical Eng. and Phys.

Or you could remove the stopwords yourself at both
index and query time, which would fix your Physics
of Fluids example.

The problem here is that you'll be forever fiddling with
this and getting it _almost_ right, then the next
anomaly will happen Siiigh

You might actually be much better off with an ngram
or edgeNgram approach. You'd probably want to
tokenize the titles, and perhaps auto-generate phrase
queries...
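
For instance, a field type along these lines in schema.xml (a sketch only; the
type name and gram sizes are arbitrary):

<fieldType name="text_title_edge" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With something like that, a query for "Medical Eng" matches the indexed
prefixes of "Medical Engineering and Physics" without any synonym or
stopword fiddling.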

Best
Erick


On Sun, Jul 14, 2013 at 7:30 AM, Parul Gupta(Knimbus)
parulgp...@gmail.com wrote:
 Hi,

 Vocabulary is not known that's the main issue else I will implement synonyms
 instead.
  what do u mean by 'regularizing the title'.

 so let me know some solution...



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Problem-using-Term-Component-in-solr-tp4077200p4077865.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting indexed content of files using ExtractingRequestHandler

2013-07-14 Thread xan
Sorry, but did you forget to send me the example's link?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-indexed-content-of-files-using-ExtractingRequestHandler-tp4077856p4077877.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud leader

2013-07-14 Thread kowish.adamosh
The problem is that I don't want to invoke data import on 8 server nodes but
to choose only one for scheduling. Of course, if this server shuts down
then another one needs to take the scheduler role. I can see that there is
a task for scheduling, https://issues.apache.org/jira/browse/SOLR-2305 . I hope
they will take SolrCloud into account. And that's why I wanted to know if
the current node is *currently* elected as the leader. The leader would be the
scheduler.

In the meanwhile, any ideas of how to solve data import scheduling on
SolrCloud architecture?

Kowish



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-leader-tp4077759p4077878.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: HTTP Status 503 - Server is shutting down

2013-07-14 Thread PeterKerk
Hi Shawn,

I'm also getting the HTTP Status 503 - Server is shutting down error when
navigating to http://localhost:8080/solr-4.3.1/

I already copied the logging.properties file from
C:\Dropbox\Databases\solr-4.3.1\example\etc to
C:\Dropbox\Databases\solr-4.3.1\example\lib

Here's my Tomcat console log:

14-jul-2013 14:21:57 org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal
performanc
e in production environments was not found on the java.library.path:
C:\Program
Files\Apache Software Foundation\Tomcat
6.0\bin;C:\Windows\Sun\Java\bin;C:\Windo
ws\system32;C:\Windows;C:\Program Files\Common Files\Microsoft
Shared\Windows Li
ve;C:\Program Files (x86)\Common Files\Microsoft Shared\Windows
Live;C:\Windows\
system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShe
ll\v1.0\;C:\Program Files\TortoiseSVN\bin;c:\msxsl;C:\Program Files
(x86)\Window
s Live\Shared;C:\Program Files\Microsoft\Web Platform Installer\;C:\Program
File
s (x86)\Microsoft ASP.NET\ASP.NET Web Pages\v1.0\;C:\Program Files
(x86)\Windows
 Kits\8.0\Windows Performance Toolkit\;C:\Program Files\Microsoft SQL
Server\110
\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL
Server\110\Tools\Binn\;C:\Prog
ram Files\Microsoft SQL Server\110\DTS\Binn\;C:\Program Files
(x86)\Microsoft SQ
L Server\110\Tools\Binn\ManagementStudio\;C:\Program Files (x86)\Microsoft
SQL S
erver\110\DTS\Binn\;C:\Program Files (x86)\Java\jre6\bin;C:\Program
Files\Java\j
re631\bin;.
14-jul-2013 14:21:57 org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
14-jul-2013 14:21:57 org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 283 ms
14-jul-2013 14:21:57 org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
14-jul-2013 14:21:57 org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.37
14-jul-2013 14:21:57 org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor manager.xml
14-jul-2013 14:21:57 org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive solr-4.3.1.war
log4j:WARN No appenders could be found for logger
(org.apache.solr.servlet.SolrD
ispatchFilter).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more in
fo.
14-jul-2013 14:21:58 org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory ROOT
14-jul-2013 14:21:58 org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
14-jul-2013 14:21:58 org.apache.jk.common.ChannelSocket init
INFO: JK: ajp13 listening on /0.0.0.0:8009
14-jul-2013 14:21:58 org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=0/55  config=null
14-jul-2013 14:21:58 org.apache.catalina.startup.Catalina start
INFO: Server startup in 719 ms



--
View this message in context: 
http://lucene.472066.n3.nabble.com/HTTP-Status-503-Server-is-shutting-down-tp4065958p4077879.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: external file field and fl parameter

2013-07-14 Thread Chris Collins
Yep, I did switch on stored=true in the field type.  I was able to confirm that 
there are values for the EFF by two methods:

1) changing desc to asc produced drastically different results.

2) debugging FileFloatSource, the following was getting triggered, filling the 
vals array:

    while ((doc = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS)
    {
        vals[doc] = fval;
    }

At least by you asking these questions I guess it should work.  I will continue 
dissecting. 

Thanks Erick.

C
On Jul 14, 2013, at 5:16 AM, Erick Erickson erickerick...@gmail.com wrote:

 Did you store the field? I.e. set stored=true? And does the EFF contain
 values for the docs you're returning?
 
 Best
 Erick
 
 On Sun, Jul 14, 2013 at 3:32 AM, Chris Collins ch...@geekychris.com wrote:
 I am playing with external file field for sorting.  I created a dynamic 
 field using the ExternalFileField type.
 
 I naively assumed that the fl argument would allow me to return the value 
 the external field but doesnt seem to do so.
 
 For instance I have a defined a dynamic field:
 
 *_efloat
 
 then I used:
 
 sort=foo_efloat desc
 fl=foo_efloat, score, description
 
 I get the score and description but the foo_efloat seems to be missing in 
 action.
 
 
 Thoughts?
 
 C
 
 



Re: SolrCloud leader

2013-07-14 Thread Jack Krupansky
In theory, each of the nodes uses the same configuration, right? So, in 
theory, ANY of the nodes can do a DIH import. It is only way down low in the 
update processing chain that an individual Solr input document needs to have 
its key hashed and then the request is routed to the leader of the 
appropriate shard.


In short, YOU decide whatever node that YOU want the DIH import to run on, 
and Solr will automatically take care of actual distribution of individual 
document update requests.


If you want to pick a leader node, fine, but there is no requirement or need 
that you do so.


Scheduling is currently outside of the scope of Solr and SolrCloud.

-- Jack Krupansky

-Original Message- 
From: kowish.adamosh

Sent: Sunday, July 14, 2013 8:42 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud leader

The problem is that I don't want to invoke data import on 8 server nodes but
to choose only one for scheduling. Of course if this server will shut down
then another one needs to take the scheduler role. I can see that there is
task for sheduling https://issues.apache.org/jira/browse/SOLR-2305 . I hope
they will take into account SolrCloud. And that's why I wanted to know if
current node is *currently* elected as the leader. The leader would be the
scheduler.

In the meanwhile, any ideas of how to solve data import scheduling on
SolrCloud architecture?

Kowish



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-leader-tp4077759p4077878.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Getting indexed content of files using ExtractingRequestHandler

2013-07-14 Thread Erick Erickson
Right, sorry...
http://searchhub.org/dev/2012/02/14/indexing-with-solrj/


On Sun, Jul 14, 2013 at 8:31 AM, xan p...@prateeksachan.com wrote:
 Sorry, but did you forget to send me the example's link?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Getting-indexed-content-of-files-using-ExtractingRequestHandler-tp4077856p4077877.html
 Sent from the Solr - User mailing list archive at Nabble.com.


ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-14 Thread Oleg Burlaca
Hello all,

Situation:
We have a collection of files in SOLR with ACL applied: each file has a
multi-valued field that contains the list of userID's that can read it:

here is sample data:
Id | content  | userId
1  | text text | 4,5,6,2
2  | text text | 4,5,9
3  | text text | 4,2

Problem:
when ACL is changed for a big folder, we compute the ACL for all child
items and reindex in SOLR using atomic updates (updating only 'userIds'
column), but because it deletes/reindexes the record, the performance is
very poor.

Question:
I suppose the delete/reindex approach will not change soon (probably it's
due to actual SOLR architecture), ?

Possible solution: assuming atomic updates will be super fast on an index
without fulltext, keep a separate ACLIndex and FullTextIndex and use
Pseudo-Joins:

Example: searching 'foo' as user '999'
/solr/FullTextIndex/select/?q=foo&fq={!join fromIndex=ACLIndex from=Id to=Id}userId:999

Question: what about performance here? what if the index is 100,000
records?
notice that the worst situation is when everyone has access to all the
files, it means the first filter will be the full index.

Would be happy to get any links that deal with the issue of Pseudo-join
performance for large datasets (i.e. initial filtered set of IDs).

Regards,
Oleg

P.S. we found that having the list of all users that have access for each
record is better overall, because there are many more read requests (people
accessing the library) than write requests (a new user is added/removed).


Re: Getting indexed content of files using ExtractingRequestHandler

2013-07-14 Thread xan
Thanks for the link. Also, having gone quite far with my work using the PHP
Solr client, isn't there anything that could be done using the PHP Solr
client only?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-indexed-content-of-files-using-ExtractingRequestHandler-tp4077856p4077893.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-14 Thread Jack Krupansky

Take a look at LucidWorks Search and its access control:
http://docs.lucidworks.com/display/help/Search+Filters+for+Access+Control

Role-based security is an easier nut to crack.

Karl Wright of ManifoldCF had a Solr patch for document access control at 
one point:
SOLR-1895 - ManifoldCF SearchComponent plugin for enforcing ManifoldCF 
security at search time

https://issues.apache.org/jira/browse/SOLR-1895

http://www.slideshare.net/lucenerevolution/wright-nokia-manifoldcfeurocon-2011

For some other thoughts:
http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security

I'm not sure if external file fields will be of any value in this situation.

There is also a proposal for bitwise operations:
SOLR-1913 - QParserPlugin plugin for Search Results Filtering Based on 
Bitwise Operations on Integer Fields

https://issues.apache.org/jira/browse/SOLR-1913

But the bottom line is that clearly updating all documents in the index is a 
non-starter.


-- Jack Krupansky

-Original Message- 
From: Oleg Burlaca

Sent: Sunday, July 14, 2013 11:02 AM
To: solr-user@lucene.apache.org
Subject: ACL implementation: Pseudo-join performance  Atomic Updates

Hello all,

Situation:
We have a collection of files in SOLR with ACL applied: each file has a
multi-valued field that contains the list of userID's that can read it:

here is sample data:
Id | content  | userId
1  | text text | 4,5,6,2
2  | text text | 4,5,9
3  | text text | 4,2

Problem:
when ACL is changed for a big folder, we compute the ACL for all child
items and reindex in SOLR using atomic updates (updating only 'userIds'
column), but because it deletes/reindexes the record, the performance is
very poor.

Question:
I suppose the delete/reindex approach will not change soon (probably it's
due to actual SOLR architecture), ?

Possible solution: assuming atomic updates will be super fast on an index
without fulltext, keep a separate ACLIndex and FullTextIndex and use
Pseudo-Joins:

Example: searching 'foo' as user '999'
/solr/FullTextIndex/select/?q=foo&fq={!join fromIndex=ACLIndex from=Id to=Id}userId:999

Question: what about performance here? what if the index is 100,000
records?
notice that the worst situation is when everyone has access to all the
files, it means the first filter will be the full index.

Would be happy to get any links that deal with the issue of Pseudo-join
performance for large datasets (i.e. initial filtered set of IDs).

Regards,
Oleg

P.S. we found that having the list of all users that have access for each
record is better overall, because there are much more read requests (people
accessing the library) then write requests (a new user is added/removed). 



Re: Getting indexed content of files using ExtractingRequestHandler

2013-07-14 Thread Erick Erickson
I'm completely ignorant of all things PHP, including the
state of any Solr client code, so I'm afraid I can't
help with that...

Best
Erick

On Sun, Jul 14, 2013 at 11:03 AM, xan p...@prateeksachan.com wrote:
 Thanks for the link. Also, having gone quite far with my work using the PHP
 Solr client, isn't there anything that could be done using the PHP Solr
 client only?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Getting-indexed-content-of-files-using-ExtractingRequestHandler-tp4077856p4077893.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-14 Thread Erick Erickson
Join performance is most sensitive to the number of values
in the field being joined on. So if you have lots and lots of
distinct values in the corpus, join performance will be affected.

bq: I suppose the delete/reindex approach will not change soon

There is ongoing work (search the JIRA for Stacked Segments)
on actually doing something about this, but it's been under consideration
for at least 3 years so your guess is as good as mine.

bq: notice that the worst situation is when everyone has access to all the
files, it means the first filter will be the full index.

One way to deal with this is to implement a post filter, sometimes called
a "no cache" filter. The distinction here is that
1) it is not cached (duh!)
2) it is only called for documents that have made it through all the
   other lower-cost filters (and the main query of course).
3) "lower cost" means the standard, cached filters plus
   any no-cache filters with a cost (explicitly stated in the query)
   lower than this one's.

Critically, and unlike normal filter queries, the result set is NOT
calculated for all documents ahead of time

You _still_ have to deal with the sysadmin doing a *:* query as you
are well aware. But one can mitigate that by having the post-filter
fail all documents after some arbitrary N, and display a message in the
app like "too many documents, man. Please refine your query. Partial
results below". Of course this may not be acceptable, but...
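
A rough, untested sketch of such a post filter against the Solr 4.x API (the
class name and the hard limit are made up; in practice you'd create this query
from a custom QParserPlugin registered in solrconfig.xml):

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

// A "no cache" filter that only sees documents which survived the main query
// and all cheaper filters, and stops passing documents after maxDocs matches.
public class LimitingPostFilter extends ExtendedQueryBase implements PostFilter {

  private final int maxDocs;

  public LimitingPostFilter(int maxDocs) {
    this.maxDocs = maxDocs;
  }

  @Override
  public boolean getCache() {
    return false; // post filters are never cached
  }

  @Override
  public int getCost() {
    return Math.max(super.getCost(), 100); // cost >= 100 marks this as a post filter
  }

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector() {
      private int seen = 0;

      @Override
      public void collect(int doc) throws IOException {
        // The per-document check goes here (ACL lookup, counters, etc.).
        if (seen++ < maxDocs) {
          super.collect(doc); // pass the document on to the delegate collector
        }
      }
    };
  }
}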

HTH
Erick

On Sun, Jul 14, 2013 at 12:05 PM, Jack Krupansky
j...@basetechnology.com wrote:
 Take a look at LucidWorks Search and its access control:
 http://docs.lucidworks.com/display/help/Search+Filters+for+Access+Control

 Role-based security is an easier nut to crack.

 Karl Wright of ManifoldCF had a Solr patch for document access control at
 one point:
 SOLR-1895 - ManifoldCF SearchComponent plugin for enforcing ManifoldCF
 security at search time
 https://issues.apache.org/jira/browse/SOLR-1895

 http://www.slideshare.net/lucenerevolution/wright-nokia-manifoldcfeurocon-2011

 For some other thoughts:
 http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security

 I'm not sure if external file fields will be of any value in this situation.

 There is also a proposal for bitwise operations:
 SOLR-1913 - QParserPlugin plugin for Search Results Filtering Based on
 Bitwise Operations on Integer Fields
 https://issues.apache.org/jira/browse/SOLR-1913

 But the bottom line is that clearly updating all documents in the index is a
 non-starter.

 -- Jack Krupansky

 -Original Message- From: Oleg Burlaca
 Sent: Sunday, July 14, 2013 11:02 AM
 To: solr-user@lucene.apache.org
 Subject: ACL implementation: Pseudo-join performance  Atomic Updates


 Hello all,

 Situation:
 We have a collection of files in SOLR with ACL applied: each file has a
 multi-valued field that contains the list of userID's that can read it:

 here is sample data:
 Id | content  | userId
 1  | text text | 4,5,6,2
 2  | text text | 4,5,9
 3  | text text | 4,2

 Problem:
 when ACL is changed for a big folder, we compute the ACL for all child
 items and reindex in SOLR using atomic updates (updating only 'userIds'
 column), but because it deletes/reindexes the record, the performance is
 very poor.

 Question:
 I suppose the delete/reindex approach will not change soon (probably it's
 due to actual SOLR architecture), ?

 Possible solution: assuming atomic updates will be super fast on an index
 without fulltext, keep a separate ACLIndex and FullTextIndex and use
 Pseudo-Joins:

 Example: searching 'foo' as user '999'
 /solr/FullTextIndex/select/?q=foofq{!join fromIndex=ACLIndex from=Id to=Id
 }userId:999

 Question: what about performance here? what if the index is 100,000
 records?
 notice that the worst situation is when everyone has access to all the
 files, it means the first filter will be the full index.

 Would be happy to get any links that deal with the issue of Pseudo-join
 performance for large datasets (i.e. initial filtered set of IDs).

 Regards,
 Oleg

 P.S. we found that having the list of all users that have access for each
 record is better overall, because there are much more read requests (people
 accessing the library) then write requests (a new user is added/removed).


Re: HTTP Status 503 - Server is shutting down

2013-07-14 Thread Shawn Heisey
On 7/14/2013 6:43 AM, PeterKerk wrote:
 Hi Shawn,
 
 I'm also getting the HTTP Status 503 - Server is shutting down error when
 navigating to http://localhost:8080/solr-4.3.1/

snip

 INFO: Deploying web application archive solr-4.3.1.war
 log4j:WARN No appenders could be found for logger
 (org.apache.solr.servlet.SolrD
 ispatchFilter).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
 more in
 fo.

The logging.properties file is used for JDK logging, which was the
default in Solr prior to version 4.3.0.  In older versions, jarfiles
were embedded in the .war file that set up slf4j to use
java.util.logging, also known as JDK logging because this logging
framework comes with Java.

Solr 4.3.0 and later does not have ANY slf4j jarfiles in the .war file,
so you need to put them in your classpath.  Jarfiles are included in the
example, in example/lib/ext, and those jarfiles set up logging to use
log4j, a much more flexible logging framework than JDK logging.

JDK logging is typically set up with a file called logging.properties,
which I think you must use a system property to configure.  You aren't
using JDK logging, you are using log4j, which uses a file called
log4j.properties.

http://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty

It appears that you have followed part of the instructions above and
copied jars from example/lib/ext to a lib directory on your classpath.
Now if you copy example/resources/log4j.properties to the same place,
logging should work.  It will not log to the tomcat log, it will log to
the location specified in log4j.properties, which by default is
logs/solr.log relative to the current working directory.
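
For reference, a minimal log4j.properties in that spirit looks roughly like
this (a sketch, not the exact file shipped with Solr):

log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/solr.log
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m%n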

As I already said on this thread, if you want Tomcat to be in control of
the logging, you must switch back to java.util.logging as described in
the wiki:

http://wiki.apache.org/solr/SolrLogging#Switching_from_Log4J_back_to_JUL_.28java.util.logging.29

Thanks,
Shawn



Re: ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-14 Thread Oleg Burlaca
Hello Jack,

Thanks for so many links; my comments are below. I've found a way to
rephrase all my questions in one:
How to implement a DAC (Discretionary Access Control) similar to Windows OS
using SOLR?

What we have: a hierarchical filesystem, users and groups, permissions
applied at the level of a file/folder.
What we need: full-text search & restricting access based on ACL.
How to deal with a change in permissions for a big folder?
How to check if the user can delete a folder?  (it means he should have
write access to all files/sub-folders)


 Role-based security is an easier nut to crack
yep, but we need DAC :(

 http://docs.lucidworks.com/display/help/Search+Filters+for+Access+Control
The documentation doesn't reveal what happens when content should be
reindexed, although the last chapter Document-based Authorization shows
the same approach: user list specified at the level of the document.

 Karl Wright of ManifoldCF had a Solr patch for document access control at
one point:
 SOLR-1895 - ManifoldCF SearchComponent plugin for enforcing ManifoldCF
security at search time
 https://issues.apache.org/jira/browse/SOLR-1895
It states "LCF SearchComponent which filters returned results based on
access tokens provided by LCF's authority service".
That means filtering is applied on the results only.
Issues: faceting doesn't work correctly (i.e. counting), because the filter
isn't applied yet.
Even worse: you have to scroll through the result set until you find
records accessible by the user (what if the user has access to 10 from
100,000 files?)

 http://www.slideshare.net/lucenerevolution/wright-nokia-manifoldcfeurocon-2011
Page 9 says "docs and access tokens. Separate bins for allow tokens, deny tokens for file".
It's similar to the approach we use: each record in SOLR has two fields,
readAccess and WriteAccess; both are multivalued fields with userIds.
It allows us to quickly delete a bunch of items the user has access to, for
example (or to check a hierarchical delete).

 http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security
"It works by adding security tokens from the source repositories as
metadata on the indexed documents."
Again, the permission info is stored within the record itself, and if we
change access for a big folder, it means reindexing.

 https://issues.apache.org/jira/browse/SOLR-1913
Thanks for the link, need to meditate if I can find a way to use it.

 But the bottom line is that clearly updating all documents in the index
is a non-starter.
I have scratched my head and been monitoring SOLR features for a long time,
trying to find something I can use. Today I watched Yonik Seeley's video:
http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55387447
and found PSEUDO-JOINS, nice! This seems a perfect solution: I can have
two indexes, one with full-text and another one with objId and userId's;
the second one should be fast to update, I hope.

But the question is: what about performance?

Regards




On Sun, Jul 14, 2013 at 7:05 PM, Jack Krupansky j...@basetechnology.comwrote:

 Take a look at LucidWorks Search and its access control:
 http://docs.lucidworks.com/display/help/Search+Filters+for+Access+Control

 Role-based security is an easier nut to crack.

 Karl Wright of ManifoldCF had a Solr patch for document access control at
 one point:
 SOLR-1895 - ManifoldCF SearchComponent plugin for enforcing ManifoldCF
 security at search time
 https://issues.apache.org/jira/browse/SOLR-1895

 http://www.slideshare.net/lucenerevolution/wright-nokia-manifoldcfeurocon-2011

 For some other thoughts:
 http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security

 I'm not sure if external file fields will be of any value in this
 situation.

 There is also a proposal for bitwise operations:
 SOLR-1913 - QParserPlugin plugin for Search Results Filtering Based on
 Bitwise Operations on Integer Fields
 https://issues.apache.org/jira/browse/SOLR-1913

 But the bottom line is that clearly updating all documents in the index is
 a non-starter.

 -- Jack Krupansky

 -Original Message- From: Oleg Burlaca
 Sent: Sunday, July 14, 2013 11:02 AM
 To: solr-user@lucene.apache.org
 Subject: ACL implementation: Pseudo-join performance  Atomic Updates


 Hello all,

 Situation:
 We have a collection of files in SOLR with ACL applied: each file has a
 multi-valued field that contains the list of userID's that can read it:

 here is 

Re: SolrCloud leader

2013-07-14 Thread Shawn Heisey
On 7/14/2013 6:42 AM, kowish.adamosh wrote:
 The problem is that I don't want to invoke data import on 8 server nodes but
 to choose only one for scheduling. Of course if this server will shut down
 then another one needs to take the scheduler role. I can see that there is
 task for sheduling https://issues.apache.org/jira/browse/SOLR-2305 . I hope
 they will take into account SolrCloud. And that's why I wanted to know if
 current node is *currently* elected as the leader. The leader would be the
 scheduler.
 
 In the meanwhile, any ideas of how to solve data import scheduling on
 SolrCloud architecture?

As Jack already replied, this is outside the scope of Solr.

SOLR-2305 has been around for a VERY long time.  Adding scheduling
capability to the dataimport handler is not very hard, but nobody has
done so because we do not believe this is something Solr should be
handling.  Also, it's easy to get something wrong, so users can run into
bugs that would break their scheduling.

Every operating system has scheduling capability.  Windows has the task
scheduler.  On virtually all other operating systems, you'll find cron.
 These systems have had years of operation for their authors to work out
the bugs, and they are VERY solid.
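
For example, on whichever machine you designate, a crontab entry hitting the
DataImportHandler over HTTP (URL, core name and schedule are illustrative)
could be as simple as:

0 2 * * * curl -s "http://localhost:8983/solr/collection1/dataimport?command=full-import&clean=false" > /dev/null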

We would not be able to make the same robustness guarantee if we
included scheduling in Solr.  Additionally, we really want to be sure
that Solr never does anything on its own that has not been specifically
requested by a user or program, or through certain external events such
as a hardware or software failure.

For my own multi-server Linux Solr installation, which doesn't use
SolrCloud even though it's got two complete copies of the index and uses
shards, I have worked out how to do clustered scheduling.  I have a
corosync/pacemaker cluster set up on my servers, which ensures that only
one copy of my cronjobs is running on the cluster.  If a server dies, it
will start up the cronjobs on another server.

Thanks,
Shawn



Re: external file field and fl parameter

2013-07-14 Thread Shawn Heisey
On 7/14/2013 7:05 AM, Chris Collins wrote:
 Yep I did switch on stored=true in the field type.  I was able to confirm a 
 few ways that there are values for the eff by two methods:
 
 1) changing desc to asc produced drastically different results.
 
 2) debugging FileFloatSource the following was getting triggered filling the 
 vals array:
   while ((doc = docsEnum.nextDoc()) != 
 DocIdSetIterator.NO_MORE_DOCS)
 {
 vals[doc] = fval;
 }
 
 At least by you asking these questions I guess it should work.  I will 
 continue dissecting. 

Did you reindex when you changed the schema?  Sorting uses indexed
values, not stored values.  The fl parameter requires the stored values.
 These are separate within the index, and one cannot substitute for the
other.  If you didn't reindex, then you won't have the stored values for
existing documents.

http://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn



Apache Solr 4 - after 1st commit the index does not grow

2013-07-14 Thread glumet
I have written my own plugin for Apache Nutch 2.2.1 to crawl images, videos
and podcasts from selected sites (I have 180 urls in my seed). I put this
metadata into an HBase store and now I want to save it to the index (Solr). I
have a lot of metadata to save (webpages + images + videos + podcasts).

I am using the Nutch script bin/crawl for the whole process (inject, generate,
fetch, parse... and finally solrindex and dedup) but I have one problem.
When I run this script for the first time, approximately 6000 documents are
stored to the index (let's say 3700 docs for images, 1700 for webpages and the
rest for videos and podcasts). It is ok...

but...

When I run the script for a second time, third time and so on... the index
does not increase the number of documents (there are still 6000 documents)
but the count of rows stored in the HBase table grows (there are 97383 rows now)...

Do you know where the problem is, please? I have been fighting with this problem
for a really long time and I don't know... If it could be helpful, this is my
solrconfig.xml configuration: http://pastebin.com/uxMW2nuq and this is my
nutch-site.xml: http://pastebin.com/4bj1wdmT



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Apache-Solr-4-after-1st-commit-the-index-does-not-grow-tp4077913.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: external file field and fl parameter

2013-07-14 Thread Chris Collins
Why would I be re-indexing an external file field? The whole purpose is that 
it's brought in at runtime and is not part of the index?

C
On Jul 14, 2013, at 10:13 AM, Shawn Heisey s...@elyograg.org wrote:

 On 7/14/2013 7:05 AM, Chris Collins wrote:
 Yep I did switch on stored=true in the field type.  I was able to confirm a 
 few ways that there are values for the eff by two methods:
 
 1) changing desc to asc produced drastically different results.
 
 2) debugging FileFloatSource the following was getting triggered filling the 
 vals array:
  while ((doc = docsEnum.nextDoc()) != 
 DocIdSetIterator.NO_MORE_DOCS)
{
vals[doc] = fval;
}
 
 At least by you asking these questions I guess it should work.  I will 
 continue dissecting. 
 
 Did you reindex when you changed the schema?  Sorting uses indexed
 values, not stored values.  The fl parameter requires the stored values.
 These are separate within the index, and one cannot substitute for the
 other.  If you didn't reindex, then you won't have the stored values for
 existing documents.
 
 http://wiki.apache.org/solr/HowToReindex
 
 Thanks,
 Shawn
 
 



Re: ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-14 Thread Oleg Burlaca
Hello Erick,

 Join performance is most sensitive to the number of values
 in the field being joined on. So if you have lots and lots of
 distinct values in the corpus, join performance will be affected.
Yep, we have a list of unique IDs that we get by first searching for
records where loggedInUser IS IN (userIDs).
This corpus is stored in memory, I suppose (not a problem), and then the
bottleneck is to match this huge set with the core where I'm searching?

Somewhere in the mailing list archive people were talking about an external list of
Solr unique IDs, but I didn't find whether there is a solution.
Back in 2010 Yonik posted a comment:
http://find.searchhub.org/document/363a4952446b3cd#363a4952446b3cd


 bq: I suppose the delete/reindex approach will not change soon
 There is ongoing work (search the JIRA for Stacked Segments)
Ah, ok, I was feeling it affects the architecture, ok, now the only hope is
Pseudo-Joins ))

 One way to deal with this is to implement a post filter, sometimes
called
 a no cache filter.
thanks, will have a look, but as you describe it, it's not the best option.

The approach of
"too many documents, man. Please refine your query. Partial results below"
means faceting will not work correctly?

... I have in mind a hybrid approach, comments welcome:
Most of the time users are not searching, but browsing content, so our
virtual filesystem stored in SOLR will use only the index with the Id of
the file and the list of users that have access to it. i.e. not touching
the fulltext index at all.

Files may have metadata (EXIF info for images for ex) that we'd like to
filter by, calculate facets.
Meta will be stored in both indexes.

In case of a fulltext query:
1. search FT index (the fulltext index), get only the number of search
results, let it be Rf
2. search DAC index (the index with permissions), get number of search
results, let it be Rd

let maxR be the maximum size of the corpus for the pseudo-join.
*That was actually my question: what is a reasonable number? 10, 100, 1000 ?
*

if (Rf < maxR) or (Rd < maxR) then use the smaller corpus to join onto the
second one.
This happens when (only a few documents contain the search query) OR (the user
has access to a small number of files).

In case none of these happens, we can use the
"too many documents, man. Please refine your query. Partial results below"
approach, but first searching the FT index, because we want relevant results first.

What do you think?

Regards,
Oleg




On Sun, Jul 14, 2013 at 7:42 PM, Erick Erickson erickerick...@gmail.comwrote:

 Join performance is most sensitive to the number of values
 in the field being joined on. So if you have lots and lots of
 distinct values in the corpus, join performance will be affected.

 bq: I suppose the delete/reindex approach will not change soon

 There is ongoing work (search the JIRA for Stacked Segments)
 on actually doing something about this, but it's been under consideration
 for at least 3 years so your guess is as good as mine.

 bq: notice that the worst situation is when everyone has access to all the
 files, it means the first filter will be the full index.

 One way to deal with this is to implement a post filter, sometimes called
 a no cache filter. The distinction here is that
 1 it is not cached (duh!)
 2 it is only called for documents that have made it through all the
  other lower cost filters (and the main query of course).
 3 lower cost means the filter is either a standard, cached filters
 and any no cache filters with a cost (explicitly stated in the query)
 lower than this one's.

 Critically, and unlike normal filter queries, the result set is NOT
 calculated for all documents ahead of time

 You _still_ have to deal with the sysadmin doing a *:* query as you
 are well aware. But one can mitigate that by having the post-filter
 fail all documents after some arbitrary N, and display a message in the
 app like too many documents, man. Please refine your query. Partial
 results below. Of course this may not be acceptable, but

 HTH
 Erick

 On Sun, Jul 14, 2013 at 12:05 PM, Jack Krupansky
 j...@basetechnology.com wrote:
  Take a look at LucidWorks Search and its access control:
 
 http://docs.lucidworks.com/display/help/Search+Filters+for+Access+Control
 
  Role-based security is an easier nut to crack.
 
  Karl Wright of ManifoldCF had a Solr patch for document access control at
  one point:
  SOLR-1895 - ManifoldCF SearchComponent plugin for enforcing ManifoldCF
  security at search time
  https://issues.apache.org/jira/browse/SOLR-1895
 
 
 http://www.slideshare.net/lucenerevolution/wright-nokia-manifoldcfeurocon-2011
 
  For some other thoughts:
  http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security
 
  I'm not sure if external file fields will be of any value in this
 situation.
 
  There is also a proposal for bitwise operations:
  SOLR-1913 - QParserPlugin plugin for Search Results Filtering Based on
  Bitwise Operations on Integer Fields
  

Re: external file field and fl parameter

2013-07-14 Thread Alan Woodward
Hi Chris,

Try wrapping the field name in a field() function in your fl parameter list, 
like so:
fl=field(eff_field_name)
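
With the field name from earlier in this thread, that would be something like:

sort=foo_efloat desc
fl=field(foo_efloat),score,description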

Alan Woodward
www.flax.co.uk


On 14 Jul 2013, at 18:41, Chris Collins wrote:

 Why would I be re-indexing an external file field? The whole purpose is that 
 its brought in at runtime and not part of the index?
 
 C
 On Jul 14, 2013, at 10:13 AM, Shawn Heisey s...@elyograg.org wrote:
 
 On 7/14/2013 7:05 AM, Chris Collins wrote:
 Yep I did switch on stored=true in the field type.  I was able to confirm a 
 few ways that there are values for the eff by two methods:
 
 1) changing desc to asc produced drastically different results.
 
 2) debugging FileFloatSource the following was getting triggered filling 
 the vals array:
 while ((doc = docsEnum.nextDoc()) != 
 DocIdSetIterator.NO_MORE_DOCS)
   {
   vals[doc] = fval;
   }
 
 At least by you asking these questions I guess it should work.  I will 
 continue dissecting. 
 
 Did you reindex when you changed the schema?  Sorting uses indexed
 values, not stored values.  The fl parameter requires the stored values.
 These are separate within the index, and one cannot substitute for the
 other.  If you didn't reindex, then you won't have the stored values for
 existing documents.
 
 http://wiki.apache.org/solr/HowToReindex
 
 Thanks,
 Shawn
 
 
 



Re: external file field and fl parameter

2013-07-14 Thread Chris Collins
Yes that worked, thanks Alan.  The consistency of this api is challenging.

C
On Jul 14, 2013, at 11:03 AM, Alan Woodward a...@flax.co.uk wrote:

 Hi Chris,
 
 Try wrapping the field name in a field() function in your fl parameter list, 
 like so:
 fl=field(eff_field_name)
 
 Alan Woodward
 www.flax.co.uk
 
 
 On 14 Jul 2013, at 18:41, Chris Collins wrote:
 
 Why would I be re-indexing an external file field? The whole purpose is that 
 its brought in at runtime and not part of the index?
 
 C
 On Jul 14, 2013, at 10:13 AM, Shawn Heisey s...@elyograg.org wrote:
 
 On 7/14/2013 7:05 AM, Chris Collins wrote:
 Yep I did switch on stored=true in the field type.  I was able to confirm 
 a few ways that there are values for the eff by two methods:
 
 1) changing desc to asc produced drastically different results.
 
 2) debugging FileFloatSource the following was getting triggered filling 
 the vals array:
while ((doc = docsEnum.nextDoc()) != 
 DocIdSetIterator.NO_MORE_DOCS)
  {
  vals[doc] = fval;
  }
 
 At least by you asking these questions I guess it should work.  I will 
 continue dissecting. 
 
 Did you reindex when you changed the schema?  Sorting uses indexed
 values, not stored values.  The fl parameter requires the stored values.
 These are separate within the index, and one cannot substitute for the
 other.  If you didn't reindex, then you won't have the stored values for
 existing documents.
 
 http://wiki.apache.org/solr/HowToReindex
 
 Thanks,
 Shawn
 
 
 
 



Re: Apache Solr 4 - after 1st commit the index does not grow

2013-07-14 Thread glumet
When I look into the log, there is:

SEVERE: auto commit error...:java.lang.IllegalStateException: this writer
hit an OutOfMemoryError; cannot commit
at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2668)
at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2834)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2814)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:529)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Apache-Solr-4-after-1st-commit-the-index-does-not-grow-tp4077913p4077924.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: HTTP Status 503 - Server is shutting down

2013-07-14 Thread PeterKerk
Ok, still getting the same error HTTP Status 503 - Server is shutting down,
so here's what I did now:

- reinstalled tomcat
- deployed solr-4.3.1.war in C:\Program Files\Apache Software
Foundation\Tomcat 6.0\webapps
- copied log4j-1.2.16.jar,slf4j-api-1.6.6.jar,slf4j-log4j12-1.6.6.jar to
C:\Program Files\Apache Software Foundation\Tomcat
6.0\webapps\solr-4.3.1\WEB-INF\lib
- copied log4j.properties from
C:\Dropbox\Databases\solr-4.3.1\example\resources to
C:\Dropbox\Databases\solr-4.3.1\example\lib
- restarted tomcat


Now this shows in my Tomcat console:

14-jul-2013 20:54:38 org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal
performanc
e in production environments was not found on the java.library.path:
C:\Program
Files\Apache Software Foundation\Tomcat
6.0\bin;C:\Windows\Sun\Java\bin;C:\Windo
ws\system32;C:\Windows;C:\Program Files\Common Files\Microsoft
Shared\Windows Li
ve;C:\Program Files (x86)\Common Files\Microsoft Shared\Windows
Live;C:\Windows\
system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShe
ll\v1.0\;C:\Program Files\TortoiseSVN\bin;c:\msxsl;C:\Program Files
(x86)\Window
s Live\Shared;C:\Program Files\Microsoft\Web Platform Installer\;C:\Program
File
s (x86)\Microsoft ASP.NET\ASP.NET Web Pages\v1.0\;C:\Program Files
(x86)\Windows
 Kits\8.0\Windows Performance Toolkit\;C:\Program Files\Microsoft SQL
Server\110
\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL
Server\110\Tools\Binn\;C:\Prog
ram Files\Microsoft SQL Server\110\DTS\Binn\;C:\Program Files
(x86)\Microsoft SQ
L Server\110\Tools\Binn\ManagementStudio\;C:\Program Files (x86)\Microsoft
SQL S
erver\110\DTS\Binn\;C:\Program Files (x86)\Java\jre6\bin;C:\Program
Files\Java\j
re631\bin;.
14-jul-2013 20:54:39 org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
14-jul-2013 20:54:39 org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 287 ms
14-jul-2013 20:54:39 org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
14-jul-2013 20:54:39 org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.37
14-jul-2013 20:54:39 org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor manager.xml
14-jul-2013 20:54:39 org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive solr-4.3.1.war
log4j:WARN No appenders could be found for logger
(org.apache.solr.servlet.SolrD
ispatchFilter).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more in
fo.
14-jul-2013 20:54:39 org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory ROOT
14-jul-2013 20:54:39 org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
14-jul-2013 20:54:39 org.apache.jk.common.ChannelSocket init
INFO: JK: ajp13 listening on /0.0.0.0:8009
14-jul-2013 20:54:39 org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=0/55  config=null
14-jul-2013 20:54:39 org.apache.catalina.startup.Catalina start
INFO: Server startup in 732 ms

And this in the catalina.log:

14-jul-2013 20:54:38 org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal
performance in production environments was not found on the
java.library.path: C:\Program Files\Apache Software Foundation\Tomcat
6.0\bin;C:\Windows\Sun\Java\bin;C:\Windows\system32;C:\Windows;C:\Program
Files\Common Files\Microsoft Shared\Windows Live;C:\Program Files
(x86)\Common Files\Microsoft Shared\Windows
Live;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program
Files\TortoiseSVN\bin;c:\msxsl;C:\Program Files (x86)\Windows
Live\Shared;C:\Program Files\Microsoft\Web Platform Installer\;C:\Program
Files (x86)\Microsoft ASP.NET\ASP.NET Web Pages\v1.0\;C:\Program Files
(x86)\Windows Kits\8.0\Windows Performance Toolkit\;C:\Program
Files\Microsoft SQL Server\110\Tools\Binn\;C:\Program Files (x86)\Microsoft
SQL Server\110\Tools\Binn\;C:\Program Files\Microsoft SQL
Server\110\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL
Server\110\Tools\Binn\ManagementStudio\;C:\Program Files (x86)\Microsoft SQL
Server\110\DTS\Binn\;C:\Program Files (x86)\Java\jre6\bin;C:\Program
Files\Java\jre631\bin;.
14-jul-2013 20:54:39 org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
14-jul-2013 20:54:39 org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 287 ms
14-jul-2013 20:54:39 org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
14-jul-2013 20:54:39 org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.37
14-jul-2013 20:54:39 org.apache.catalina.startup.HostConfig deployDescriptor
INFO: 
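
As a side note on the log4j:WARN lines in the console output above: they usually mean that no log4j.properties is visible on the webapp's classpath. A minimal sketch, assuming the file is placed at webapps\solr-4.3.1\WEB-INF\classes\log4j.properties (the path and log levels are placeholders, not taken from this thread):

  log4j.rootLogger=INFO, CONSOLE
  log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
  log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
  log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n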

Re: solr autodetectparser tikaconfig dataimporter error

2013-07-14 Thread Andreas Owen
hi

is there no one with an idea what this error is, or can someone give me a pointer where
to look? If not, is there an alternative way to import documents from an xml-file
with meta-data and the filename to parse?

thanks for any help.


On 12. Jul 2013, at 10:38 PM, Andreas Owen wrote:

 i am using solr 3.5, tika-app-1.4 and tagcloud 1.2.1. when i try to import a
 file via xml i get this error, it doesn't matter what file format i try
 to index txt, cfm, pdf all the same error:

 SEVERE: Exception while processing: rec document : SolrInputDocument[{id=id(1.0)={myTest.txt}, title=title(1.0)={Beratungsseminar kundenbrief}, contents=contents(1.0)={wie kommuniziert man}, author=author(1.0)={Peter Z.}, path=path(1.0)={download/online}}]:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
   at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
   at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
   at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
   at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
   at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
 Caused by: java.lang.NoSuchMethodError: org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
   at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:122)
   at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
   ... 6 more

 Jul 11, 2013 5:23:36 PM org.apache.solr.common.SolrException log
 SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
   at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
   at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
   at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
   at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
   at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
 Caused by: java.lang.NoSuchMethodError: org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
   at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:122)
   at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
   ... 6 more

 Jul 11, 2013 5:23:36 PM org.apache.solr.update.DirectUpdateHandler2 rollback

 data-config.xml:
 <dataConfig>
   <dataSource type="BinURLDataSource" name="data"/>
   <dataSource type="URLDataSource" baseUrl="http://127.0.0.1/tkb/internet/" name="main"/>
   <document>
     <entity name="rec" processor="XPathEntityProcessor" url="docImport.xml"
             forEach="/albums/album" dataSource="main">
       <field column="title" xpath="//title" />
       <field column="id" xpath="//file" />
       <field column="contents" xpath="//description" />
       <field column="path" xpath="//path" />
       <field column="Author" xpath="//author" />
       <entity processor="TikaEntityProcessor"
               url="file:///C:\web\development\tkb\internet\public\download\online\${rec.id}"
               dataSource="data" onerror="skip">
         <field column="contents" name="text" />
       </entity>
     </entity>
   </document>
 </dataConfig>

 the lib are included and declared in the logs, i have also tried tika-app
 1.0 and tagsoup 1.2 with the same result. can someone please help, i don't
 know where to start looking for the error.



Re: solr autodetectparser tikaconfig dataimporter error

2013-07-14 Thread Jack Krupansky

Caused by: java.lang.NoSuchMethodError:

That means you have some out of date jars or some newer jars mixed in with 
the old ones.
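
One place to check, as a sketch (the directory names are assumptions, not from this thread), is the <lib> directives in solrconfig.xml that pull the Tika and DataImportHandler jars onto the core's classpath; make sure only one version of tika-core/tika-parsers (or a single tika-app jar) ends up there:

  <lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />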


-- Jack Krupansky

-Original Message- 
From: Andreas Owen

Sent: Sunday, July 14, 2013 3:07 PM
To: solr-user@lucene.apache.org
Subject: Re: solr autodetectparser tikaconfig dataimporter error

hi

is there no one with an idea what this error is, or can someone give me a pointer
where to look? If not, is there an alternative way to import documents from an
xml-file with meta-data and the filename to parse?


thanks for any help.


On 12. Jul 2013, at 10:38 PM, Andreas Owen wrote:


i am using solr 3.5, tika-app-1.4 and tagcloud 1.2.1. when i try to import a
file via xml i get this error, it doesn't matter what file format i try
to index txt, cfm, pdf all the same error:

SEVERE: Exception while processing: rec document : SolrInputDocument[{id=id(1.0)={myTest.txt}, title=title(1.0)={Beratungsseminar kundenbrief}, contents=contents(1.0)={wie kommuniziert man}, author=author(1.0)={Peter Z.}, path=path(1.0)={download/online}}]:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
  at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
  at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
  at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.NoSuchMethodError: org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
  at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:122)
  at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
  ... 6 more

Jul 11, 2013 5:23:36 PM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
  at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
  at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
  at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.NoSuchMethodError: org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
  at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:122)
  at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
  ... 6 more

Jul 11, 2013 5:23:36 PM org.apache.solr.update.DirectUpdateHandler2 rollback

data-config.xml:
<dataConfig>
  <dataSource type="BinURLDataSource" name="data"/>
  <dataSource type="URLDataSource" baseUrl="http://127.0.0.1/tkb/internet/" name="main"/>
  <document>
    <entity name="rec" processor="XPathEntityProcessor" url="docImport.xml"
            forEach="/albums/album" dataSource="main">
      <field column="title" xpath="//title" />
      <field column="id" xpath="//file" />
      <field column="contents" xpath="//description" />
      <field column="path" xpath="//path" />
      <field column="Author" xpath="//author" />
      <entity processor="TikaEntityProcessor"
              url="file:///C:\web\development\tkb\internet\public\download\online\${rec.id}"
              dataSource="data" onerror="skip">
        <field column="contents" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>

the lib are included and declared in the logs, i have also tried tika-app
1.0 and tagsoup 1.2 with the same result. can someone please help, i don't
know where to start looking for the error.




Re: Apache Solr 4 - after 1st commit the index does not grow

2013-07-14 Thread Erick Erickson
Well, that's one. OutOfMemoryErrors will stop things from happening
for sure; the cure is to give the JVM more memory.

Additionally, multiple updates of a doc with the same uniqueKey
will replace the old copy with a new one; that might be what you're
seeing.

But get rid of the OOM first.
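
A minimal sketch, assuming the bundled Jetty start script and placeholder heap sizes:

  java -Xms512m -Xmx2g -jar start.jar

Under Tomcat, the same -Xms/-Xmx flags would go into CATALINA_OPTS instead.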

Best
Erick

On Sun, Jul 14, 2013 at 2:40 PM, glumet jan.bouch...@gmail.com wrote:
 When I look into the log, there is:

 SEVERE: auto commit error...:java.lang.IllegalStateException: this writer
 hit an OutOfMemoryError; cannot commit
 at
 org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2668)
 at
 org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2834)
 at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2814)
 at
 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:529)
 at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Apache-Solr-4-after-1st-commit-the-index-does-not-grow-tp4077913p4077924.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr caching clarifications

2013-07-14 Thread Manuel Le Normand
Alright, thanks Erick. On the question about memory usage of merges, quoting
from Mike McCandless' blog:

The big thing that stays in RAM is a logical int[] mapping old docIDs to
new docIDs, but in more recent versions of Lucene (4.x) we use a much more
efficient structure than a simple int[] ... see
https://issues.apache.org/jira/browse/LUCENE-2357

How much RAM is required is mostly a function of how many documents (lots
of tiny docs use more RAM than fewer huge docs).


A related clarification:
As my users are not aware of the fq possibility, I was wondering how to make
the best use of this filter cache. Would it be efficient to implicitly
transform their queries into filter queries for clauses that are boolean
conditions (date ranges etc. that do not affect the score of a document)? Is
this a good practice, and is there a query parser plugin that does it?
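
A minimal sketch of such a rewrite, with made-up field names: a request like

  q=text:invoice AND date:[NOW/DAY-1YEAR TO NOW]

could instead be sent as

  q=text:invoice&fq=date:[NOW/DAY-1YEAR TO NOW]

so the non-scoring range clause lands in the filterCache and can be reused across queries.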




 Inline

 On Thu, Jul 11, 2013 at 8:36 AM, Manuel Le Normand
 manuel.lenorm...@gmail.com wrote:
  Hello,
  As a result of frequent java OOM exceptions, I try to investigate more
into
  the solr jvm memory heap usage.
  Please correct me if I am mistaken, this is my understanding of usages
for
  the heap (per replica on a solr instance):
  1. Buffers for indexing - bounded by ramBufferSize
  2. Solr caches
  3. Segment merge
  4. Miscellaneous- buffers for Tlogs, servlet overhead etc.
 
  Particularly I'm concerned by Solr caches and segment merges.
  1. How much memory consuming (bytes per doc) are FilterCaches
(bitDocSet)
  and queryResultCaches (DocList)? I understand it is related to the skip
  spaces between doc id's that match (so it's not saved as a bitmap). But
  basically, is every id saved as a java int?

 Different beasts. filterCache consumes, essentially, maxDoc/8 bytes (you
 can get the maxDoc number from your Solr admin page). Plus some overhead
 for storing the fq text, but that's usually not much. This is for each
 entry up to Size.
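
A worked illustration with assumed numbers: at maxDoc = 100,000,000, a single filterCache entry is about 100,000,000 / 8 = 12,500,000 bytes (~12 MB), so a filterCache allowed to hold 512 entries could grow to roughly 6 GB of heap.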




 queryResultCache is usually trivial unless you've configured it
extravagantly.
 It's the query string length + queryResultWindowSize integers per entry
 (queryResultWindowSize is from solrconfig.xml).

  2. QueryResultMaxDocsCached - (for example = 100) means that any query
  resulting in more than 100 docs will not be cached (at all) in the
  queryResultCache? Or does it have to do with the documentCache?
 It's just a limit on the queryResultCache entry size as far as I can
 tell. But again
 this cache is relatively small, I'd be surprised if it used
 significant resources.

  3. DocumentCache - written on the wiki it should be greater than
  max_results*concurrent_queries. Max result is just the num of rows
  displayed (rows-start) param, right? Not the queryResultWindow.

 Yes. This a cache (I think) for the _contents_ of the documents you'll
 be returning to be manipulated by various components during the life
 of the query.

  4. LazyFieldLoading=true - when quering for id's only (fl=id) will this
  cache be used? (on the expense of eviction of docs that were already
loaded
  with stored fields)

 Not sure, but I don't think this will contribute much to memory pressure.
This
 is about how many fields are loaded to get a single value from a doc in
the
 results list, and since one is usually working with 20 or so docs this
 is usually
 a small amount of memory.

  5. How large is the heap used by mergings? Assuming we have a merge of
10
  segments of 500MB each (half inverted files - *.pos *.doc etc, half non
  inverted files - *.fdt, *.tvd), how much heap should be left unused for
  this merge?

 Again, I don't think this is much of a memory consumer, although I
 confess I don't
 know the internals. Merging is mostly about I/O.

 
  Thanks in advance,
  Manu

 But take a look at the admin page, you can see how much memory various
 caches are using by looking at the plugins/stats section.

 Best
 Erick


Re: How to from solr facet exclude specific “Tag”!

2013-07-14 Thread Upayavira
Make your two fq clauses separate fq params. That would be better for your
caches, and it would mean each tag is cleanly associated with a whole fq
query string, as in the sketch below.
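
A minimal sketch of the separated parameters, reusing the values from the question:

  fq={!tag=city}CityId:729
  fq={!tag=company}CompanyId:16122
  facet.field={!ex=city}CityId
  facet.field={!ex=company}CompanyId

In the combined form, the leading {!tag=city} local params apply to the whole fq string, so only the "city" tag ever gets attached; splitting the filters gives each one its own tag to exclude.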

Upayavira

On Sun, Jul 14, 2013, at 03:14 AM, 张智 wrote:
 solr 4.3
 
 this is my query request params:
 
 <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">15</int>
   <lst name="params">
     <str name="facet">true</str>
     <str name="indent">true</str>
     <str name="q">*:*</str>
     <str name="_">1373713374569</str>
     <arr name="facet.field">
       <str>{!ex=city}CityId</str>
       <str>{!ex=company}CompanyId</str>
     </arr>
     <str name="wt">xml</str>
     <str name="fq">{!tag=city}CityId:729 AND {!tag=company}CompanyId:16122</str>
   </lst>
 </lst>
 
 This is the query response Facet content:
 
 lst name=facet_countslst name=facet_queries/lst
 name=facet_fieldslst name=CityIdint name=11100171/intint
 name=140489406/intint name=077477/intint
 name=136665780/intint name=136258092/intint
 name=72929213/intint name=79828975/int...int
 name=7262808/intint name=432776/intint
 name=1146772/intint name=1653765/intint
 name=1078668/intint name=814667/intint
 name=2049402/intint name=456401/intint
 name=401390/int/lstlst name=CompanyIdint
 name=16122971/intint name=690/intint name=710/intint
 name=720/intint name=790/intint name=800/intint
 name=850/intint name=880/intint name=940/int...int
 name=980/intint name=1040/intint name=1120/intint
 name=1130/intint name=1180/intint name=1230/intint
 name=1260/intint name=1310/intint name=1360/intint
 name=1390/int/lst/lstlst
   name=facet_dates/lst name=facet_ranges//lst
 
 You can see that the CityId facet is correct: it excludes the
 {!tag=city}CityId:729 filter. But the CompanyId facet is not correct; it does
 not exclude the {!tag=company}CompanyId:16122 filter. How can I solve this?


Re: Norms

2013-07-14 Thread Mark Miller

On Jul 10, 2013, at 4:39 AM, Daniel Collins danwcoll...@gmail.com wrote:

 QueryNorm is what I'm still trying to get to the bottom of exactly :) 

If you have not seen it, some reading from the past here…

https://issues.apache.org/jira/browse/LUCENE-1896

- Mark