RE: Trouble making tests with BaseDistributedSearchTestCase

2015-09-04 Thread Markus Jelsma
Strangely enough, the following code gives different errors:

assertQ(
req("q", "*:*", "debug", "true", "indent", "true"), 
"//result/doc[1]/str[@name='id'][.='1']",
"//result/doc[2]/str[@name='id'][.='2']",
"//result/doc[3]/str[@name='id'][.='3']",
"//result/doc[4]/str[@name='id'][.='4']");

[searcherExecutor-47-thread-1] INFO org.apache.solr.core.SolrCore - 
[collection1] Registered new searcher Searcher@765ea6c9[collection1] 
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(5.3.0):C2)))}
[qtp137312325-79] INFO org.apache.solr.update.processor.LogUpdateProcessor - 
[collection1] webapp=/qc_as/x path=/update 
params={waitSearcher=true&commit=true&softCommit=false&wt=javabin&version=2} 
{commit=} 0 7
[TEST-TestComponent.test-seed#[EA2ED1E118114486]] INFO 
org.apache.solr.core.SolrCore.Request - [collection1] webapp=null path=null 
params={q=*:*&debug=true&indent=true&wt=xml} hits=0 status=0 QTime=7 
[TEST-TestComponent.test-seed#[EA2ED1E118114486]] ERROR 
org.apache.solr.SolrTestCaseJ4 - REQUEST FAILED: 
xpath=//result/doc[1]/str[@name='id'][.='1']
xml response was: 

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">7</int>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">*:*</str>
    <str name="querystring">*:*</str>
    <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
    <str name="parsedquery_toString">*:*</str>
  </lst>
</response>
And when I forcefully add distrib=true, I get an NPE in SearchHandler!

[TEST-TestComponent.test-seed#[2BEFD39F89732741]] INFO 
org.apache.solr.core.SolrCore.Request - [collection1] webapp=null path=null 
params={q=*:*&debug=true&indent=true&distrib=true&wt=xml} 
rid=testNode-collection1-1441359642142-0 status=500 QTime=11 
[TEST-TestComponent.test-seed#[2BEFD39F89732741]] ERROR 
org.apache.solr.SolrTestCaseJ4 - REQUEST FAILED: 
q=*:*&debug=true&indent=true&distrib=true&wt=xml:java.lang.NullPointerException
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:341)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
at org.apache.solr.util.TestHarness.query(TestHarness.java:320)
at org.apache.solr.util.TestHarness.query(TestHarness.java:302)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:739)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:732)



Also, this is on Solr 5.3.0.

Many thanks!
Markus

 
 
-Original message-
> From:Markus Jelsma 
> Sent: Friday 4th September 2015 12:07
> To: solr-user 
> Subject: Trouble making tests with BaseDistributedSearchTestCase
> 
> Hello - I am trying to create some tests using BaseDistributedSearchTestCase 
> but two errors usually appear. Consider the following test:
> 
>   @Test
>   @ShardsFixed(num = 3)
>   public void test() throws Exception {
> del("*:*");
> 
> index(id, "1", "lang", "en", "text", "this is some text");
> index(id, "2", "lang", "en", "text", "this is some other text");
> index(id, "3", "lang", "en", "text", "more text");
> index(id, "4", "lang", "en", "text", "just some text");
> commit();
>   
> 
> QueryResponse rsp;
> rsp = query("indent", "true", "q", "*:*");
> assertFieldValues(rsp.getResults(), id, 1, 2, 3, 4); 
>   }
> }
> 
> Executing the above test either results in an "IOException occurred when
> talking to server at: https://127.0.0.1:44761//collection1",
> or it fails with a curious error: .response.maxScore:1.0!=null
> 
> The score correctly changes according to whatever value I set for parameter q.
> 
> I have checked other tests that extend BaseDistributedSearchTestCase but many 
> are just as simple as the test above. Any idea as to what I am missing?
> 
> Many thanks!
> Markus
> 


Re: Stemming words Using Solr

2015-09-04 Thread Ritesh Sinha
This is the code which I have written to get the stemmed word.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;

import org.json.JSONArray;
import org.json.JSONObject;

public class URLConnectionReader {
    public static void main(String[] args) throws Exception {
        URL solr = new URL("http://localhost:8983/solr/" + args[0]
                + "/analysis/field?wt=json&analysis.showmatch=true"
                + "&analysis.query=" + args[1]
                + "&analysis.fieldtype=" + args[2]);
        URLConnection sl = solr.openConnection();
        BufferedReader in = new BufferedReader(new InputStreamReader(
                sl.getInputStream()));
        String inputLine;
        StringBuilder sb = new StringBuilder();
        while ((inputLine = in.readLine()) != null) {
            sb.append(inputLine);
        }

        in.close();

        JSONObject obj = new JSONObject(sb.toString());
        JSONArray analysis = obj.getJSONObject("analysis")
                .getJSONObject("field_types").getJSONObject(args[2])
                .getJSONArray("query");

        JSONArray jsonarray = new JSONArray(analysis.toString());
        ArrayList<String> stemmedWords = new ArrayList<String>();
        for (int i = 0; i < jsonarray.length(); i++) {
            String objF = jsonarray.getString(i);
            stemmedWords.add(objF);
        }

        String lastStemmedGroup = stemmedWords.get(stemmedWords.size() - 1);

        JSONArray finalStemmer = new JSONArray(lastStemmedGroup);

        for (int i = 0; i < finalStemmer.length(); i++) {
            JSONObject jsonobject = finalStemmer.getJSONObject(i);
            String stemmedWord = jsonobject.getString("text");
            System.out.println(stemmedWord);
        }
    }
}


Here, args[0] is the core,
args[1] is the word I'll be sending for stemming, and
args[2] is the field name / field type to analyze.

I am looking into FieldAnalysisRequest.
Do you have some code snippet or something which can guide me ?

Thanks

On Fri, Sep 4, 2015 at 12:44 PM, Upayavira  wrote:

> Yes, look at the one I mentioned further up in this thread, which is a
> part of SolrJ: FieldAnalysisRequest
>
> That uses the same HTTP call in the backend, but formats the result in a
> Java friendly manner.
>
> Upayavira
>
> On Fri, Sep 4, 2015, at 05:52 AM, Ritesh Sinha wrote:
> > Yeah, I got. Thanks.
> >
> > It returns a json which have the stemmed words.I just need to parse it
> > and
> > get the value.
> >
> > But, isn't there any JAVA API available for it ?
> >
> > On Thu, Sep 3, 2015 at 7:58 PM, Upayavira  wrote:
> >
> > > yes, the URL should be something like:
> > >
> > >
> > >
> > > http://localhost:8983/solr/images/analysis/field?wt=json&analysis.showmatch=true&analysis.query=&analysis.fieldtype=
> > >
> > > Upayavira
> > >
> > > On Thu, Sep 3, 2015, at 03:23 PM, Jack Krupansky wrote:
> > > > The # in the URL says to send the request to the admin UI, which of
> > > > course
> > > > returns an HTML web page. Instead, send the analysis URL fragment
> > > > directly
> > > > to the analysis API (not UI) for the Solr core, without the #.
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > On Thu, Sep 3, 2015 at 8:45 AM, Ritesh Sinha <
> > > > kumarriteshranjansi...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I observed the inspect element and wrote code to give back the
> > > > > content. I have included the URL which was getting generated.
> > > > >
> > > > > public class URLConnectionReader {
> > > > > public static void main(String[] args) throws Exception {
> > > > > URL solr = new URL(
> > > > > "http://localhost:8983/solr/#/testcore/analysis?analysis.fieldvalue=holidays&analysis.fieldname=inferlytics_output&verbose_output=1");
> > > > > URLConnection sl = solr.openConnection();
> > > > > BufferedReader in = new BufferedReader(new
> InputStreamReader(
> > > > > sl.getInputStream()));
> > > > > String inputLine;
> > > > >
> > > > > while ((inputLine = in.readLine()) != null)
> > > > > System.out.println(inputLine);
> > > > > in.close();
> > > > > }
> > > > > }
> > > > >
> > > > >
> > > > > But it shows this in the console:
> > > > >
> > > > > [Solr Admin HTML page follows -- tags stripped by the archive: a
> > > > > DOCTYPE, the title "Solr Admin", favicon links, and a long list of
> > > > > css/styles/*.css?_=5.3.0 stylesheet links -- truncated]

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-04 Thread davidphilip cherian
Hi Kevin/Noble,

What is the download link to get the latest? What are the steps to compile
it, test it and use it?
We also have a use case for this feature in Solr. Therefore, we wanted to
test it, and the above info would help a lot in getting started.

Thanks.


On Fri, Sep 4, 2015 at 1:45 PM, Kevin Lee  wrote:

> Thanks, I downloaded the source and compiled it and replaced the jar file
> in the dist and solr-webapp’s WEB-INF/lib directory.  It does seem to be
> protecting the Collections API reload command now as long as I upload the
> security.json after startup of the Solr instances.  If I shutdown and bring
> the instances back up, the security is no longer in place and I have to
> upload the security.json again for it to take effect.
>
> - Kevin
>
> > On Sep 3, 2015, at 10:29 PM, Noble Paul  wrote:
> >
> > Both these are committed. If you could test with the latest 5.3 branch
> > it would be helpful
> >
> > On Wed, Sep 2, 2015 at 5:11 PM, Noble Paul  wrote:
> >> I opened a ticket for the same
> >> https://issues.apache.org/jira/browse/SOLR-8004
> >>
> >> On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee 
> wrote:
> >>> I’ve found that completely exiting Chrome or Firefox and opening it
> back up re-prompts for credentials when they are required.  It was
> re-prompting with the /browse path where authentication was working each
> time I completely exited and started the browser again, however it won’t
> re-prompt unless you exit completely and close all running instances so I
> closed all instances each time to test.
> >>>
> >>> However, to make sure I ran it via the command line via curl as
> suggested and it still does not give any authentication error when trying
> to issue the command via curl.  I get a success response from all the Solr
> instances that the reload was successful.
> >>>
> >>> Not sure why the pre-canned permissions aren’t working, but the one to
> the request handler at the /browse path is.
> >>>
> >>>
>  On Sep 1, 2015, at 11:03 PM, Noble Paul  wrote:
> 
>  " However, after uploading the new security.json and restarting the
>  web browser,"
> 
>  The browser remembers your login , So it is unlikely to prompt for the
>  credentials again.
> 
>  Why don't you try the RELOAD operation using command line (curl) ?
> 
>  On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee 
> wrote:
> > The restart issues aside, I’m trying to lockdown usage of the
> Collections API, but that also does not seem to be working either.
> >
> > Here is my security.json.  I’m using the “collection-admin-edit”
> permission and assigning it to the “adminRole”.  However, after uploading
> the new security.json and restarting the web browser, it doesn’t seem to be
> requiring credentials when calling the RELOAD action on the Collections
> API.  The only thing that seems to work is the custom permission “browse”
> which is requiring authentication before allowing me to pull up the page.
> Am I using the permissions correctly for the RuleBasedAuthorizationPlugin?
> >
> > {
> >   "authentication":{
> >  "class":"solr.BasicAuthPlugin",
> >  "credentials": {
> >   "admin": " ",
> >   "user": " "
> >   }
> >   },
> >   "authorization":{
> >  "class":"solr.RuleBasedAuthorizationPlugin",
> >  "permissions": [
> >   {
> >   "name":"security-edit",
> >   "role":"adminRole"
> >   },
> >   {
> >   "name":"collection-admin-edit",
> >   "role":"adminRole"
> >   },
> >   {
> >   "name":"browse",
> >   "collection": "inventory",
> >   "path": "/browse",
> >   "role":"browseRole"
> >   }
> >   ],
> >  "user-role": {
> >   "admin": [
> >   "adminRole",
> >   "browseRole"
> >   ],
> >   "user": [
> >   "browseRole"
> >   ]
> >   }
> >   }
> > }
> >
> > Also tried adding the permission using the Authorization API, but no
> effect, still isn’t protecting the Collections API from being invoked
> without a username password.  I do see in the Solr logs that it sees the
> updates because it outputs the messages “Updating /security.json …”,
> “Security node changed”, “Initializing 

Re: Merging documents from a distributed search

2015-09-04 Thread Joel Bernstein
It's possible that the ReducerStream's buffer can grow too large if
document groups are very large. But the ReducerStream only needs to hold
one group at a time in memory. The RollupStream, in trunk, has a grouping
implementation that doesn't hang on to all the Tuples from a group. You
could also implement a custom stream that does exactly what you need.

The AnalyticsQuery is much more efficient because the data is left in place.
The Streaming API has streaming overhead which is considerable. But it's
the Stream "shuffling" that gives you the power to do things like fully
distributed grouping.

How many records are processed in a typical query and what type of response
time do you need?

Joel Bernstein
http://joelsolr.blogspot.com/
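
For anyone following along, the basic read loop of the 5.x Streaming API
looks roughly like this. A sketch only - the CloudSolrStream constructor and
params map shown are assumptions about the 5.3 API, and the zkHost,
collection and field names are placeholders:

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
import org.apache.solr.client.solrj.io.stream.TupleStream;

public class StreamReadLoop {
    public static void main(String[] args) throws Exception {
        Map<String, String> params = new HashMap<String, String>();
        params.put("q", "*:*");
        params.put("fl", "id,fieldA");
        params.put("sort", "fieldA asc"); // streams must be sorted to be shuffled/merged

        // CloudSolrStream merges sorted tuples from all shards of the collection.
        TupleStream stream = new CloudSolrStream("localhost:9983", "collection1", params);
        try {
            stream.open();
            while (true) {
                Tuple tuple = stream.read();
                if (tuple.EOF) {
                    break; // a single EOF tuple marks the end of the stream
                }
                System.out.println(tuple.getString("id"));
            }
        } finally {
            stream.close();
        }
    }
}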

On Thu, Sep 3, 2015 at 3:25 PM, tedsolr  wrote:

> Thanks Joel, that link looks promising. The CloudSolrStream bypasses my
> issue
> of multiple shards. Perhaps the ReducerStream would provide what I need. At
> first glance I worry that the buffer would grow too large - if it's
> really holding the values for all the fields in each document
> (Tuple.getMaps()). I use a Map in my DelegatingCollector to store the
> "unique" docs, but I only keep the docId, my stats, and the ordinals for
> each field. Would you expect the new streams API to perform as well as my
> implementation of an AnalyticsQuery and a DelegatingCollector?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Merging-documents-from-a-distributed-search-tp4226802p4227034.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Stemming words Using Solr

2015-09-04 Thread Upayavira
I don't have a code snippet - I just found it in the solrj source code.

As to using JSON, I'm not sure of the structure of the JSON you are
getting back, but you might find it helpful to add json.nl=map, which
changes the way named lists are returned and may make them easier to parse.

Upayavira
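
For what it's worth, a minimal SolrJ sketch of that approach could look like
the following. Untested - it assumes the SolrJ 5.x FieldAnalysisRequest /
FieldAnalysisResponse API and takes the same core / word / field-type
arguments as the URL-based version quoted below:

import java.util.Collections;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.AnalysisPhase;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.TokenInfo;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse;

public class FieldAnalysisStemmer {
    public static void main(String[] args) throws Exception {
        // args[0] = core, args[1] = word to stem, args[2] = field type
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/" + args[0]);

        FieldAnalysisRequest request = new FieldAnalysisRequest();
        request.setFieldTypes(Collections.singletonList(args[2]));
        request.setFieldValue(args[1]); // analyzed through the index-time chain
        request.setQuery(args[1]);      // analyzed through the query-time chain

        FieldAnalysisResponse response = request.process(client);

        // The last token of the last query-time phase is the fully stemmed form.
        TokenInfo last = null;
        for (AnalysisPhase phase : response.getFieldTypeAnalysis(args[2]).getQueryPhases()) {
            for (TokenInfo token : phase.getTokens()) {
                last = token;
            }
        }
        System.out.println(last == null ? "(no tokens)" : last.getText());
        client.close();
    }
}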

On Fri, Sep 4, 2015, at 10:14 AM, Ritesh Sinha wrote:
> This is the code which i have written to get the stemmed word.
> 
> public class URLConnectionReader {
> public static void main(String[] args) throws Exception {
> URL solr = new URL(
> "http://localhost:8983/solr/
> "+args[0]+"/analysis/field?wt=json=true="+args[1]+"="+args[2]+"");
> URLConnection sl = solr.openConnection();
> BufferedReader in = new BufferedReader(new InputStreamReader(
> sl.getInputStream()));
> String inputLine;
> StringBuilder sb = new StringBuilder();
> while ((inputLine = in.readLine()) != null) {
> sb.append(inputLine);
> }
> 
> in.close();
> 
> JSONObject obj = new JSONObject(sb.toString());
> JSONArray analysis = obj.getJSONObject("analysis")
> .getJSONObject("field_types").getJSONObject(args[2])
> .getJSONArray("query");
> 
> JSONArray jsonarray = new JSONArray(analysis.toString());
> ArrayList stemmedWords = new ArrayList();
> for (int i = 0; i < jsonarray.length(); i++) {
> String objF = jsonarray.getString(i);
> stemmedWords.add(objF);
> }
> 
> String lastStemmedGroup = stemmedWords.get(stemmedWords.size() -
> 1).toString();
> 
> JSONArray finalStemmer = new JSONArray(lastStemmedGroup);
> 
> for (int i = 0; i < finalStemmer.length(); i++) {
> JSONObject jsonobject = finalStemmer.getJSONObject(i);
> String stemmedWord = jsonobject.getString("text");
> System.out.println(stemmedWord);
> }
> 
> }
> 
> }
> 
> 
> Here, args[0] is core.
> args[1] is the word i'll be sending for stemming.
> args[2] is the Analyse Fieldname / FieldType.
> 
> I am looking into FieldAnalysisRequest.
> Do you have some code snippet or something which can guide me ?
> 
> Thanks
> 
> On Fri, Sep 4, 2015 at 12:44 PM, Upayavira  wrote:
> 
> > Yes, look at the one I mentioned further up in this thread, which is a
> > part of SolrJ: FieldAnalysisRequest
> >
> > That uses the same HTTP call in the backend, but formats the result in a
> > Java friendly manner.
> >
> > Upayavira
> >
> > On Fri, Sep 4, 2015, at 05:52 AM, Ritesh Sinha wrote:
> > > Yeah, I got. Thanks.
> > >
> > > It returns a json which have the stemmed words.I just need to parse it
> > > and
> > > get the value.
> > >
> > > But, isn't there any JAVA API available for it ?
> > >
> > > On Thu, Sep 3, 2015 at 7:58 PM, Upayavira  wrote:
> > >
> > > > yes, the URL should be something like:
> > > >
> > > >
> > > >
> > > > http://localhost:8983/solr/images/analysis/field?wt=json&analysis.showmatch=true&analysis.query=&analysis.fieldtype=
> > > >
> > > > Upayavira
> > > >
> > > > On Thu, Sep 3, 2015, at 03:23 PM, Jack Krupansky wrote:
> > > > > The # in the URL says to send the request to the admin UI, which of
> > > > > course
> > > > > returns an HTML web page. Instead, send the analysis URL fragment
> > > > > directly
> > > > > to the analysis API (not UI) for the Solr core, without the #.
> > > > >
> > > > > -- Jack Krupansky
> > > > >
> > > > > On Thu, Sep 3, 2015 at 8:45 AM, Ritesh Sinha <
> > > > > kumarriteshranjansi...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I observed the inspect element and wrote code to give back the
> > > > > > content. I have included the URL which was getting generated.
> > > > > >
> > > > > > public class URLConnectionReader {
> > > > > > public static void main(String[] args) throws Exception {
> > > > > > URL solr = new URL(
> > > > > > "
> > > > > >
> > > > > >
> > > >
> > > > > > "http://localhost:8983/solr/#/testcore/analysis?analysis.fieldvalue=holidays&analysis.fieldname=inferlytics_output&verbose_output=1");
> > > > > > URLConnection sl = solr.openConnection();
> > > > > > BufferedReader in = new BufferedReader(new
> > InputStreamReader(
> > > > > > sl.getInputStream()));
> > > > > > String inputLine;
> > > > > >
> > > > > > while ((inputLine = in.readLine()) != null)
> > > > > > System.out.println(inputLine);
> > > > > > in.close();
> > > > > > }
> > > > > > }
> > > > > >
> > > > > >
> > > > > > But it shows this in the console:
> > > > > >
> > > > > > [Solr Admin HTML page follows -- tags stripped by the archive: a
> > > > > > DOCTYPE, the title "Solr Admin", favicon links, and a long list
> > > > > > of css/styles/*.css?_=5.3.0 stylesheet links -- truncated]

Re: Queries on De-Duplication

2015-09-04 Thread Arcadius Ahouansou
You could try using a hash of the content?
On Sep 4, 2015 9:00 AM, "Zheng Lin Edwin Yeo"  wrote:

> Hi,
>
> I'm trying out De-Duplication. I've tried to create a new signature
> field in schema.xml:
> <field name="signature" type="string" stored="true" indexed="true" multiValued="false" />
>
> I've also added the following in solrconfig.xml.
>
> <updateRequestProcessorChain name="dedupe">
>   <processor class="solr.processor.SignatureUpdateProcessorFactory">
>     <bool name="enabled">true</bool>
>     <str name="signatureField">signature</str>
>     <bool name="overwriteDupes">false</bool>
>     <str name="fields">content</str>
>     <str name="signatureClass">solr.processor.Lookup3Signature</str>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
>
> However, I can't do a copyField of content into this signature field as
> some of my contents are more than 32766 characters in length. Previously, I
> tried to point the signatureField directly at content, but that is not
> working either.
>
> Is there anything else I can do to group on a new signatureField?
>
>
> Regards,
> Edwin
>


RE: http client mismatch

2015-09-04 Thread Firas Khasawneh
Hi Shawn,

I tried this

SystemDefaultHttpClient cl = new SystemDefaultHttpClient();
 HttpSolrClient solrSvr = new HttpSolrClient(url, cl);

And it worked. Thanks a lot for your help.

Regards,
Firas
-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Friday, September 04, 2015 1:21 AM
To: solr-user@lucene.apache.org
Subject: Re: http client mismatch

On 9/3/2015 11:04 PM, Shawn Heisey wrote:
> It sounds like your code is trying to assign the result of the 
> createClient method to an object of type SystemDefaultHttpClient.  
> This is a derivative type of CloseableHttpClient.  This would work if 
> the derivation were the other direction.
> 
> The compiler is saying that you cannot make this assignment.  I tried 
> to put these code lines into a SolrJ program:
> 
> SystemDefaultHttpClient sc;
> sc = HttpClientUtil.createClient(someParams);

I thought up an imperfect comparison to help understand why this is a problem.

Think about "SystemDefaultHttpClient" as Infiniti, a brand of car.
Think about "CloseableHttpClient" (the type returned by createClient) as 
Nissan.  The Infiniti brand is owned and built by Nissan.

Taking this idea further, imagine that "sc" is a parking spot that expects to 
hold a car made by Infiniti.  Trying to park a Nissan there isn't going to 
work, because the parking spot owner is very picky about exactly what gets to 
park there.

The reverse works just fine ... an Infiniti can take a spot made for a Nissan, 
and the parking spot owner will be very happy, because an Infiniti *is* a 
Nissan, only fancier.

Similarly, a SystemDefaultHttpClient is a CloseableHttpClient, but a 
CloseableHttpClient is not a SystemDefaultHttpClient.
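
In code form, the rule looks like this (a hedged sketch; the HttpClient class
names are real, and createClient's return type is as described above):

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.SystemDefaultHttpClient;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.common.params.ModifiableSolrParams;

public class AssignmentDemo {
    public static void main(String[] args) {
        ModifiableSolrParams someParams = new ModifiableSolrParams();

        // Fine: createClient returns a CloseableHttpClient ("a Nissan").
        CloseableHttpClient generic = HttpClientUtil.createClient(someParams);

        // Does NOT compile: the compiler can't assume the returned client is
        // the more specific SystemDefaultHttpClient ("an Infiniti").
        // SystemDefaultHttpClient specific = HttpClientUtil.createClient(someParams);

        // Fine the other way around: every SystemDefaultHttpClient IS a CloseableHttpClient.
        SystemDefaultHttpClient infiniti = new SystemDefaultHttpClient();
        CloseableHttpClient nissan = infiniti;
    }
}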

Thanks,
Shawn



Solr DIH sub entity

2015-09-04 Thread vatuska
Hello. I work with Solr 4.10.
I use DIH and some custom java Transformers to synchronize my Solr index
with the database (MySQL is used)
Is there any way to change the fields in root entity from the sub entity?
I mean something like this

public class MySubEntityTransformer extends org.apache.solr.handler.dataimport.Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        Map<String, Object> parentRow = context.getParentContext().? // how do I reach the parent row here?
        Object val = row.get("subEntityColumn");
        if (val != null) {
            // transform this value some way
            Object generalValue = parentRow.get("documentField");
            if (generalValue != null) {
                parentRow.put("documentField", generalValue + "_" + val);
            } else {
                parentRow.put("documentField", val);
            }
        }
        return row;
    }
}

Or is there any way to apply some kind of transformation to the root
entity after all of its sub-entities have been processed?

I can't use the script transformers, because they work many times slower
than Java extensions.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-DIH-sub-entity-tp4227167.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: which solrconfig.xml

2015-09-04 Thread Mark Fenbers

Chris,

The document "Uploading Structured Data Store Data with the Data Import 
Handler" has a number of references to solrconfig.xml, starting on Page 
2 and continuing on page 3 in the section "Configuring solrconfig.xml".  
It also is mentioned on Page 5 in the "Property Writer" and the "Data 
Sources" sections.  And other places in this document as well.


The solrconfig.xml file is also referenced (without a path) in the "Solr 
Quick Start" document, in the Design Overview section and other sections 
as well.  None of these references suggests the location of the 
solrconfig.xml file.  Doing a "find . -name solrconfig.xml" from the 
Solr home directory reveals about a dozen or so of these files in 
various subdirectories.  Thus, my confusion as to which one I need to 
customize...


I feel ready to graduate from the examples in "Solr Quick Start" 
document, e.g., using bin/solr -e dih and feeding in existing files on 
disk.  The tutorial was *excellent* for this part.  But now I want to 
build a "real" index using *my own* data from a database.  In doing 
this, I find the coaching in the tutorial to be rather absent.  For 
example, I haven't read in any of the documents I have found so far an 
explanation of why one might want to use more than one Solr node and 
more than one shard, or what the advantages are of using Solr in cloud 
mode vs stand-alone mode.  As a result, I had to 
improvise/guess/trial-and-error.  I did manage to configure my own data 
source and changed my queries to apply to my own data, but I did 
something wrong somewhere in solrconfig.xml because I get errors when 
running, now.  I solved some of them by copying the *.jar files from the 
./dist directory to the solr/lib directory (a tip I found when I googled 
the error message), but that only helped to a certain point.


I will post more specific questions about my issues when I have a chance 
to re-investigate that (hopefully later today).


I have *not* found specific Java code examples using Solr yet, but I 
haven't exhausted exploring the Solr website yet.  Hopefully, I'll find 
some examples using Solr in Java code...


Mark

On 9/2/2015 9:51 PM, Chris Hostetter wrote:

: various $HOME/solr-5.3.0 subdirectories.  The documents/tutorials say to edit
: the solrconfig.xml file for various configuration details, but they never say
: which one of these dozen to edit.  Moreover, I cannot determine which version

can you please give us a specific examples (ie: urls, page numbers &
version of the ref guide, etc...) of documentation that tell you to edit
the solrconfig.xml w/o being explicit about where to to find it so that we
can fix the docs?

FWIW: The official "Quick Start" tutorial does not mention editing
solrconfig.xml at all...

http://lucene.apache.org/solr/quickstart.html



-Hoss
http://www.lucidworks.com/





Trouble making tests with BaseDistributedSearchTestCase

2015-09-04 Thread Markus Jelsma
Hello - I am trying to create some tests using BaseDistributedSearchTestCase 
but two errors usually appear. Consider the following test:

  @Test
  @ShardsFixed(num = 3)
  public void test() throws Exception {
del("*:*");

index(id, "1", "lang", "en", "text", "this is some text");
index(id, "2", "lang", "en", "text", "this is some other text");
index(id, "3", "lang", "en", "text", "more text");
index(id, "4", "lang", "en", "text", "just some text");
commit();

  
QueryResponse rsp;
rsp = query("indent", "true", "q", "*:*");
assertFieldValues(rsp.getResults(), id, 1, 2, 3, 4); 
  }
}

Executing the above test either results in an "IOException occurred when talking 
to server at: https://127.0.0.1:44761//collection1",
or it fails with a curious error: .response.maxScore:1.0!=null

The score correctly changes according to whatever value I set for parameter q.

I have checked other tests that extend BaseDistributedSearchTestCase but many 
are just as simple as the test above. Any idea as to what I am missing?

Many thanks!
Markus


RE: http client mismatch

2015-09-04 Thread Firas Khasawneh
Hi Shawn,

Thanks for your response. I am not using HttpClient directly. I am using 
HttpSolrClient, which uses it internally, so I have no control over it.
Below is a snippet of my test code:

HttpSolrClient solrSvr = new HttpSolrClient(url);
SolrQuery query=new SolrQuery();
query.setQuery("xyz");
query.setStart(0);
query.setRows(100);
QueryResponse res=solrSvr.query(query);
SolrDocumentList results=res.getResults();

Thanks,
Firas


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Friday, September 04, 2015 1:21 AM
To: solr-user@lucene.apache.org
Subject: Re: http client mismatch

On 9/3/2015 11:04 PM, Shawn Heisey wrote:
> It sounds like your code is trying to assign the result of the 
> createClient method to an object of type SystemDefaultHttpClient.  
> This is a derivative type of CloseableHttpClient.  This would work if 
> the derivation were the other direction.
> 
> The compiler is saying that you cannot make this assignment.  I tried 
> to put these code lines into a SolrJ program:
> 
> SystemDefaultHttpClient sc;
> sc = HttpClientUtil.createClient(someParams);

I thought up an imperfect comparison to help understand why this is a problem.

Think about "SystemDefaultHttpClient" as Infiniti, a brand of car.
Think about "CloseableHttpClient" (the type returned by createClient) as 
Nissan.  The Infiniti brand is owned and built by Nissan.

Taking this idea further, imagine that "sc" is a parking spot that expects to 
hold a car made by Infiniti.  Trying to park a Nissan there isn't going to 
work, because the parking spot owner is very picky about exactly what gets to 
park there.

The reverse works just fine ... an Infiniti can take a spot made for a Nissan, 
and the parking spot owner will be very happy, because an Infiniti *is* a 
Nissan, only fancier.

Similarly, a SystemDefaultHttpClient is a CloseableHttpClient, but a 
CloseableHttpClient is not a SystemDefaultHttpClient.

Thanks,
Shawn



Re: Stemming words Using Solr

2015-09-04 Thread Ritesh Sinha
Adding json.nl=map instead of wt=json returns results in XML format.


   
[XML response follows -- tags stripped by the archive. It begins with a
responseHeader (status 0, QTime 2), then walks the input "playing" through
each stage of the field type's query analysis chain: every tokenizer/filter
stage emits the token "playing" ([70 6c 61 79 69 6e 67], start 0, end 7,
position 1, type ALPHANUM), until the final stemming stage, which emits
"plai" ([70 6c 61 69]). -- truncated]


On Fri, Sep 4, 2015 at 2:51 PM, Upayavira  wrote:

> I don't have a code snippet - I just found it in the solrj source code.
>
> As to using JSON, I'm not sure of the structure of the JSON you are
> getting back, but you might find adding json.nl=map, which changes the
> way it returns named lists, which may be easier to parse.
>
> Upayavira
>
> On Fri, Sep 4, 2015, at 10:14 AM, Ritesh Sinha wrote:
> > This is the code which i have written to get the stemmed word.
> >
> > public class URLConnectionReader {
> > public static void main(String[] args) throws Exception {
> > URL solr = new URL(
> > "http://localhost:8983/solr/
> >
> "+args[0]+"/analysis/field?wt=json=true="+args[1]+"="+args[2]+"");
> > URLConnection sl = solr.openConnection();
> > BufferedReader in = new BufferedReader(new InputStreamReader(
> > sl.getInputStream()));
> > String inputLine;
> > StringBuilder sb = new StringBuilder();
> > while ((inputLine = in.readLine()) != null) {
> > sb.append(inputLine);
> > }
> >
> > in.close();
> >
> > JSONObject obj = new JSONObject(sb.toString());
> > JSONArray analysis = obj.getJSONObject("analysis")
> > .getJSONObject("field_types").getJSONObject(args[2])
> > .getJSONArray("query");
> >
> > JSONArray jsonarray = new JSONArray(analysis.toString());
> > ArrayList stemmedWords = new ArrayList();
> > for (int i = 0; i < jsonarray.length(); i++) {
> > String 

Re: Indexing Fixed length file

2015-09-04 Thread timmsn
Hi Guys,


Thanks for the answers, you helped me a lot. I wrote a PHP script for this
problem.


Thank you




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-Fixed-length-file-tp4225807p4227163.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr DIH sub entity

2015-09-04 Thread Davis, Daniel (NIH/NLM) [C]
vatu...@yandex.ru wrote:
> Hello. I work with Solr 4.10.
> I use DIH and some custom java Transformers to synchronize my Solr index with 
> the database (MySQL is used) 
> Is there any way to change the fields in root entity from the sub entity?

I don't think that works, but if you are writing your own transformer, you 
should be able to associate the transformer with the parent and run your 
sub-entity query from within the transformer.   You may also want to look at 
using a ScriptTransformer to avoid writing Java code that needs to be compiled 
outside of the DIH.

Also, the main benefit of DIH is simplicity - it doesn't multi-thread well, it 
doesn't have a well-behaved cancel, and it cannot choose any SolrCloud node when 
pushing data.   If you are already writing transformers in Java, you may want 
to look at separating your transformations from Java to take advantage of at 
least multi-threading, and even multiple machines (e.g. big data).   

I guess it all depends on how much DIH you have, and how much you've invested 
in it.
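
To illustrate the first suggestion, a sketch only - untested, and it assumes
the sub-entity value can instead be JOINed into the parent entity's query as
a subEntityColumn column, so the derived field is set on the parent row
directly:

import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Attached to the *parent* entity via transformer="MyParentEntityTransformer".
public class MyParentEntityTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        // Assumes the parent query already JOINs in the sub-entity value.
        Object val = row.get("subEntityColumn");
        if (val != null) {
            Object generalValue = row.get("documentField");
            row.put("documentField", generalValue != null ? generalValue + "_" + val : val);
        }
        return row;
    }
}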

-Original Message-
From: vatuska [mailto:vatu...@yandex.ru] 
Sent: Friday, September 04, 2015 6:49 AM
To: solr-user@lucene.apache.org
Subject: Solr DIH sub entity

I mean something like this

public class MySubEntityTransformer extends 
org.apache.solr.handler.dataimport.Transformer {
@Override
public Object transformRow(Map<String, Object> row, Context context) {
Map<String, Object> parentRow = context.getParentContext().? // how do I reach the parent row here?
Object val = row.get("subEnityColumn");
if (val != null) {
//transform this value some way
Object generalValue = parentRow.get("documentField");
if (generalValue != null) {
  parentRow.put("documentField", generalValue + "_" + val);
} else {
  parentRow.put("documentField", val);
}
}
return row;
}
}

Or is there any way to apply some kind of the transformation for the root 
entity after all its subentities has been processed?

I can't use the script transformers, because they work many times slower than 
Java extensions.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-DIH-sub-entity-tp4227167.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-04 Thread Noble Paul
There are no download links for the 5.3.x branch until we do a bug-fix release.

If you wish to download the trunk nightly (which is not the same as 5.3.0)
check here 
https://builds.apache.org/job/Solr-Artifacts-trunk/lastSuccessfulBuild/artifact/solr/package/

If you wish to get the binaries for the 5.3 branch you will have to build them
yourself (you will need to install svn and ant).

Here are the steps

svn checkout 
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_3/
cd lucene_solr_5_3/solr
ant server



On Fri, Sep 4, 2015 at 4:11 PM, davidphilip cherian
 wrote:
> Hi Kevin/Noble,
>
> What is the download link to take the latest? What are the steps to compile
> it, test and use?
> We also have a use case to have this feature in solr too. Therefore, wanted
> to test and above info would help a lot to get started.
>
> Thanks.
>
>
> On Fri, Sep 4, 2015 at 1:45 PM, Kevin Lee  wrote:
>
>> Thanks, I downloaded the source and compiled it and replaced the jar file
>> in the dist and solr-webapp’s WEB-INF/lib directory.  It does seem to be
>> protecting the Collections API reload command now as long as I upload the
>> security.json after startup of the Solr instances.  If I shutdown and bring
>> the instances back up, the security is no longer in place and I have to
>> upload the security.json again for it to take effect.
>>
>> - Kevin
>>
>> > On Sep 3, 2015, at 10:29 PM, Noble Paul  wrote:
>> >
>> > Both these are committed. If you could test with the latest 5.3 branch
>> > it would be helpful
>> >
>> > On Wed, Sep 2, 2015 at 5:11 PM, Noble Paul  wrote:
>> >> I opened a ticket for the same
>> >> https://issues.apache.org/jira/browse/SOLR-8004
>> >>
>> >> On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee 
>> wrote:
>> >>> I’ve found that completely exiting Chrome or Firefox and opening it
>> back up re-prompts for credentials when they are required.  It was
>> re-prompting with the /browse path where authentication was working each
>> time I completely exited and started the browser again, however it won’t
>> re-prompt unless you exit completely and close all running instances so I
>> closed all instances each time to test.
>> >>>
>> >>> However, to make sure I ran it via the command line via curl as
>> suggested and it still does not give any authentication error when trying
>> to issue the command via curl.  I get a success response from all the Solr
>> instances that the reload was successful.
>> >>>
>> >>> Not sure why the pre-canned permissions aren’t working, but the one to
>> the request handler at the /browse path is.
>> >>>
>> >>>
>>  On Sep 1, 2015, at 11:03 PM, Noble Paul  wrote:
>> 
>>  " However, after uploading the new security.json and restarting the
>>  web browser,"
>> 
>>  The browser remembers your login , So it is unlikely to prompt for the
>>  credentials again.
>> 
>>  Why don't you try the RELOAD operation using command line (curl) ?
>> 
>>  On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee 
>> wrote:
>> > The restart issues aside, I’m trying to lockdown usage of the
>> Collections API, but that also does not seem to be working either.
>> >
>> > Here is my security.json.  I’m using the “collection-admin-edit”
>> permission and assigning it to the “adminRole”.  However, after uploading
>> the new security.json and restarting the web browser, it doesn’t seem to be
>> requiring credentials when calling the RELOAD action on the Collections
>> API.  The only thing that seems to work is the custom permission “browse”
>> which is requiring authentication before allowing me to pull up the page.
>> Am I using the permissions correctly for the RuleBasedAuthorizationPlugin?
>> >
>> > {
>> >   "authentication":{
>> >  "class":"solr.BasicAuthPlugin",
>> >  "credentials": {
>> >   "admin": " ",
>> >   "user": " "
>> >   }
>> >   },
>> >   "authorization":{
>> >  "class":"solr.RuleBasedAuthorizationPlugin",
>> >  "permissions": [
>> >   {
>> >   "name":"security-edit",
>> >   "role":"adminRole"
>> >   },
>> >   {
>> >   "name":"collection-admin-edit",
>> >   "role":"adminRole"
>> >   },
>> >   {
>> >   "name":"browse",
>> >   "collection": "inventory",
>> >   "path": "/browse",
>> >   "role":"browseRole"
>> >   }
>> >   ],
>> >  

Re: Cached fq decreases performance

2015-09-04 Thread Alexandre Rafalovitch
Yes please.:
http://www.amazon.com/Solr-Troubleshooting-Maintenance-Alexandre-Rafalovitch/dp/1491920149/

:-)

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 4 September 2015 at 10:30, Yonik Seeley  wrote:
> On Fri, Sep 4, 2015 at 10:18 AM, Alexandre Rafalovitch
>  wrote:
>> Yonik,
>>
>> Is this all visible on query debug level?
>
> Nope, unfortunately not.
>
> This is part of a bigger issue we should work at doing better at for
> Solr 6: debugability / supportability.
> For a specific request, what took up the memory, what cache misses or
> cache instantiations were there, how much request-specific memory was
> allocated, how much shared memory was needed to satisfy the request,
> etc.
>
> -Yonik


Re: Cached fq decreases performance

2015-09-04 Thread Yonik Seeley
>> This is part of a bigger issue we should work at doing better at for
>> Solr 6: debugability / supportability.
>> For a specific request, what took up the memory, what cache misses or
>> cache instantiations were there, how much request-specific memory was
>> allocated, how much shared memory was needed to satisfy the request,
>> etc.

Oh, and if we have the ability to *tell* when a request is going to
allocate a big chunk of memory,
then we should also be able to either prevent it from happening or
terminate the request shortly after.

So one could say, only allow this request to:
- cause 500MB more of shared memory to be allocated (like field cache)
- only allow it to use 5GB of shared memory total (so successive
queries don't keep upping the total amount allocated)
- only allow 100MB of request-specific memory to be allocated

-Yonik


Re: Cached fq decreases performance

2015-09-04 Thread Yonik Seeley
On Fri, Sep 4, 2015 at 10:18 AM, Alexandre Rafalovitch
 wrote:
> Yonik,
>
> Is this all visible on query debug level?

Nope, unfortunately not.

This is part of a bigger issue we should work at doing better at for
Solr 6: debugability / supportability.
For a specific request, what took up the memory, what cache misses or
cache instantiations were there, how much request-specific memory was
allocated, how much shared memory was needed to satisfy the request,
etc.

-Yonik


Re: Cached fq decreases performance

2015-09-04 Thread Alexandre Rafalovitch
Yonik,

Is this all visible on query debug level? Would it be effective to ask
to run both queries with debug enabled and to share the expanded query
value? Would that show up the differences between Lucene
implementations you described?

(Looking for troubleshooting tips to reuse).

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 4 September 2015 at 10:06, Yonik Seeley  wrote:
> On Thu, Sep 3, 2015 at 4:45 PM, Jeff Wartes  wrote:
>>
>> I have a query like:
>>
>> q=<complicated stuff>&fq=enabled:true
>>
>> For purposes of this conversation, "fq=enabled:true" is set for every query, 
>> I never open a new searcher, and this is the only fq I ever use, so the 
>> filter cache size is 1, and the hit ratio is 1.
>> The fq=enabled:true clause matches about 15% of my documents. I have some 
>> 20M documents per shard, in a 5.3 solrcloud cluster.
>>
>> Under these circumstances, this alternate version of the query averages 
>> about 1/3 faster, consumes less CPU, and generates less garbage:
>>
>> q=<complicated stuff> +enabled:true
>>
>> So it appears I have a case where using the cached fq result is more 
>> expensive than just putting the same restriction in the query.
>> Does someone have a clear mental model of how “q” and “fq” interact?
>
> Lucene seems to always be changing its execution model, so it can be
> difficult to keep up.  What version of Solr are you using?
> Lucene also changed how filters work,  so now, a filter is
> incorporated with the query like so:
>
> query = new BooleanQuery.Builder()
> .add(query, Occur.MUST)
> .add(pf.filter, Occur.FILTER)
> .build();
>
> It may be that term queries are no longer worth caching... if this is
> the case, we could automatically not cache them.
>
> It also may be the structure of the query that is making the
> difference.  Solr is creating
>
> (complicated stuff) +(filter(enabled:true))
>
> If you added +enabled:true directly to an existing boolean query, that
> may be more efficient for lucene to process (flatter structure).
>
> If you haven't already, could you try putting parens around your
> (complicated stuff) to see if it makes any difference?
>
> -Yonik


Re: Cached fq decreases performance

2015-09-04 Thread Yonik Seeley
On Thu, Sep 3, 2015 at 4:45 PM, Jeff Wartes  wrote:
>
> I have a query like:
>
> q=<complicated stuff>&fq=enabled:true
>
> For purposes of this conversation, "fq=enabled:true" is set for every query, 
> I never open a new searcher, and this is the only fq I ever use, so the 
> filter cache size is 1, and the hit ratio is 1.
> The fq=enabled:true clause matches about 15% of my documents. I have some 20M 
> documents per shard, in a 5.3 solrcloud cluster.
>
> Under these circumstances, this alternate version of the query averages about 
> 1/3 faster, consumes less CPU, and generates less garbage:
>
> q=<complicated stuff> +enabled:true
>
> So it appears I have a case where using the cached fq result is more 
> expensive than just putting the same restriction in the query.
> Does someone have a clear mental model of how “q” and “fq” interact?

Lucene seems to always be changing its execution model, so it can be
difficult to keep up.  What version of Solr are you using?
Lucene also changed how filters work,  so now, a filter is
incorporated with the query like so:

query = new BooleanQuery.Builder()
.add(query, Occur.MUST)
.add(pf.filter, Occur.FILTER)
.build();

It may be that term queries are no longer worth caching... if this is
the case, we could automatically not cache them.

It also may be the structure of the query that is making the
difference.  Solr is creating

(complicated stuff) +(filter(enabled:true))

If you added +enabled:true directly to an existing boolean query, that
may be more efficient for lucene to process (flatter structure).

If you haven't already, could you try putting parens around your
(complicated stuff) to see if it makes any difference?

-Yonik


how to make suggestion in solr

2015-09-04 Thread Mugeesh Husain
Hi,

I have a requirement that the user gets suggestions in a field: e.g. I enter
"nice" and it suggests "nicer" and "nicest".

But the problem is that I have 40 million records and the same number of Solr
field values.

How is it possible to make suggestions over that many field values?

Thanks
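
A hedged SolrJ sketch of querying a suggester, assuming a /suggest request
handler and a dictionary named mySuggester are already configured in
solrconfig.xml (both names are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SuggestDemo {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");

        SolrQuery query = new SolrQuery();
        query.setRequestHandler("/suggest");            // assumed handler path
        query.set("suggest", true);
        query.set("suggest.dictionary", "mySuggester"); // assumed dictionary name
        query.set("suggest.q", "nice");

        QueryResponse response = client.query(query);
        // Walk the raw named-list response; its exact shape depends on the suggester config.
        System.out.println(response.getResponse().get("suggest"));
        client.close();
    }
}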
 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-make-suggestion-in-solr-tp4227193.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-04 Thread Kevin Lee
Noble,

Does SOLR-8000 need to be re-opened?  Has anyone else been able to test the 
restart fix?  

At startup, these are the log messages that say there is no security 
configuration and the plugins aren’t being used even though security.json is in 
Zookeeper:
2015-09-04 08:06:21.205 INFO  (main) [   ] o.a.s.c.CoreContainer Security conf 
doesn't exist. Skipping setup for authorization module.
2015-09-04 08:06:21.205 INFO  (main) [   ] o.a.s.c.CoreContainer No 
authentication plugin used.

Thanks,
Kevin

> On Sep 4, 2015, at 5:47 AM, Noble Paul  wrote:
> 
> There are no download links for 5.3.x branch  till we do a bug fix release
> 
> If you wish to download the trunk nightly (which is not same as 5.3.0)
> check here 
> https://builds.apache.org/job/Solr-Artifacts-trunk/lastSuccessfulBuild/artifact/solr/package/
> 
> If you wish to get the binaries for 5.3 branch you will have to make it
> (you will need to install svn and ant)
> 
> Here are the steps
> 
> svn checkout 
> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_3/
> cd lucene_solr_5_3/solr
> ant server
> 
> 
> 
> On Fri, Sep 4, 2015 at 4:11 PM, davidphilip cherian
>  wrote:
>> Hi Kevin/Noble,
>> 
>> What is the download link to take the latest? What are the steps to compile
>> it, test and use?
>> We also have a use case to have this feature in solr too. Therefore, wanted
>> to test and above info would help a lot to get started.
>> 
>> Thanks.
>> 
>> 
>> On Fri, Sep 4, 2015 at 1:45 PM, Kevin Lee  wrote:
>> 
>>> Thanks, I downloaded the source and compiled it and replaced the jar file
>>> in the dist and solr-webapp’s WEB-INF/lib directory.  It does seem to be
>>> protecting the Collections API reload command now as long as I upload the
>>> security.json after startup of the Solr instances.  If I shutdown and bring
>>> the instances back up, the security is no longer in place and I have to
>>> upload the security.json again for it to take effect.
>>> 
>>> - Kevin
>>> 
 On Sep 3, 2015, at 10:29 PM, Noble Paul  wrote:
 
 Both these are committed. If you could test with the latest 5.3 branch
 it would be helpful
 
 On Wed, Sep 2, 2015 at 5:11 PM, Noble Paul  wrote:
> I opened a ticket for the same
> https://issues.apache.org/jira/browse/SOLR-8004
> 
> On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee 
>>> wrote:
>> I’ve found that completely exiting Chrome or Firefox and opening it
>>> back up re-prompts for credentials when they are required.  It was
>>> re-prompting with the /browse path where authentication was working each
>>> time I completely exited and started the browser again, however it won’t
>>> re-prompt unless you exit completely and close all running instances so I
>>> closed all instances each time to test.
>> 
>> However, to make sure I ran it via the command line via curl as
>>> suggested and it still does not give any authentication error when trying
>>> to issue the command via curl.  I get a success response from all the Solr
>>> instances that the reload was successful.
>> 
>> Not sure why the pre-canned permissions aren’t working, but the one to
>>> the request handler at the /browse path is.
>> 
>> 
>>> On Sep 1, 2015, at 11:03 PM, Noble Paul  wrote:
>>> 
>>> " However, after uploading the new security.json and restarting the
>>> web browser,"
>>> 
>>> The browser remembers your login , So it is unlikely to prompt for the
>>> credentials again.
>>> 
>>> Why don't you try the RELOAD operation using command line (curl) ?
>>> 
>>> On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee 
>>> wrote:
 The restart issues aside, I’m trying to lockdown usage of the
>>> Collections API, but that also does not seem to be working either.
 
 Here is my security.json.  I’m using the “collection-admin-edit”
>>> permission and assigning it to the “adminRole”.  However, after uploading
>>> the new security.json and restarting the web browser, it doesn’t seem to be
>>> requiring credentials when calling the RELOAD action on the Collections
>>> API.  The only thing that seems to work is the custom permission “browse”
>>> which is requiring authentication before allowing me to pull up the page.
>>> Am I using the permissions correctly for the RuleBasedAuthorizationPlugin?
 
 {
  "authentication":{
 "class":"solr.BasicAuthPlugin",
 "credentials": {
  "admin”:” ",
  "user": ” "
  }
  },
  "authorization":{
 "class":"solr.RuleBasedAuthorizationPlugin",
 "permissions": [
  {
 

Re: /suggest

2015-09-04 Thread Mugeesh Husain
Hi, 

I have a requirement that the user gets suggestions in a field: e.g. I enter
"nice" and it suggests "nicer" and "nicest".

But the problem is that I have 40 million records and the same number of Solr
field values.

How is it possible to make suggestions over that many field values?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/suggest-tp4124963p4227196.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Queries on De-Duplication

2015-09-04 Thread Zheng Lin Edwin Yeo
How do we do hashing of the content?

Regards,
Edwin
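
One common approach, sketched here with only standard JDK classes (not taken
from this thread): compute the hash yourself before indexing and store the
digest in a plain string field, which sidesteps the 32766-character limit:

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class ContentHasher {
    // Returns a 32-character hex MD5 digest of the content; index this into
    // the signature field instead of copying the full content.
    public static String hash(String content) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hash("some very long document content..."));
    }
}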

On 4 September 2015 at 17:37, Arcadius Ahouansou 
wrote:

> You could try using a hash of the content?
> On Sep 4, 2015 9:00 AM, "Zheng Lin Edwin Yeo" 
> wrote:
>
> > Hi,
> >
> > I'm trying out De-Duplication. I've tried to create a new signature
> > field in schema.xml:
> > <field name="signature" type="string" stored="true" indexed="true" multiValued="false" />
> >
> > I've also added the following in solrconfig.xml.
> >
> > <updateRequestProcessorChain name="dedupe">
> >   <processor class="solr.processor.SignatureUpdateProcessorFactory">
> >     <bool name="enabled">true</bool>
> >     <str name="signatureField">signature</str>
> >     <bool name="overwriteDupes">false</bool>
> >     <str name="fields">content</str>
> >     <str name="signatureClass">solr.processor.Lookup3Signature</str>
> >   </processor>
> >   <processor class="solr.LogUpdateProcessorFactory" />
> >   <processor class="solr.RunUpdateProcessorFactory" />
> > </updateRequestProcessorChain>
> >
> >
> > However, I can't do a copyField of content into this signature field as
> > some of my contents are more than 32766 characters in length.
> Previously, I
> > tried to point the signatureField directly to content. but that is not
> > working too.
> >
> > Anything else that I can do to do a group on a new signatureField?
> >
> >
> > Regards,
> > Edwin
> >
>


Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-04 Thread Renee Sun
Shawn, thanks so much, and this user forum is so helpful!

I will start using autocommit, confident that it will greatly help reduce the
spurious commit requests (a lot of them) from processes in our system.

Regarding the solr version, it is actually a big problem we have to resolve
sooner or later.

When we upgraded to Solr 3.5 about 2 years ago, to avoid re-indexing our large
data set, we used:

LUCENE_29

which seems to work fine except a lot of such warnings in catalina.out:

WARNING: StopFilterFactory is using deprecated LUCENE_29 emulation. You
should at some point declare and reindex to at least 3.0, because 2.x
emulation is deprecated and will be removed in 4.0

We have built an infrastructure which scales well using Solr. Is it good
practice to upgrade to Solr 4.x without using SolrCloud, if that is possible at
all?

thanks!
Renee 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227220.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR last modified different than filesystem last modified

2015-09-04 Thread sat
Thank you very much for your reply.

What do you mean by updating the viewer?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-last-modified-different-than-filesystem-last-modified-tp4226894p4227164.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: which solrconfig.xml

2015-09-04 Thread Erick Erickson
Mark:

Right, the problem with Google searches (as you well know) is that you
get random snippets from all over the place, ones that often assume
some background knowledge.

There are several good books around that tend to have things arranged
progressively that might be a good investment, here's a good one:

http://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021

and searching "apache solr" on Amazon turns up a bunch of others.

Best,
Erick



On Fri, Sep 4, 2015 at 4:43 AM, Mark Fenbers  wrote:
> Chris,
>
> The document "Uploading Structured Data Store Data with the Data Import
> Handler" has a number of references to solrconfig.xml, starting on Page 2
> and continuing on page 3 in the section "Configuring solrconfig.xml".  It
> also is mentioned on Page 5 in the "Property Writer" and the "Data Sources"
> sections.  And other places in this document as well.
>
> The solrconfig.xml file is also referenced (without a path) in the "Solr
> Quick Start" document, in the Design Overview section and other sections as
> well.  None of these references suggests the location of the solrconfig.xml
> file.  Doing a "find . -name solrconfig.xml" from the Solr home directory
> reveals about a dozen or so of these files in various subdirectories.  Thus,
> my confusion as to which one I need to customize...
>
> I feel ready to graduate from the examples in "Solr Quick Start" document,
> e.g., using bin/solr -e dih and have fed in existing files on disk.  The
> tutorial was *excellent* for this part.  But now I want to build a "real"
> index using *my own* data from a database.  In doing this, I find the
> coaching in the tutorial to be rather absent.  For example, I haven't read
> in any of the documents I have found so far an explanation of why one might
> want to use more than one Solr node and more than one shard, or what the
> advantages are of using Solr in cloud mode vs stand-alone mode.  As a
> result, I had to improvise/guess/trial-and-error.  I did manage to configure
> my own data source and changed my queries to apply to my own data, but I did
> something wrong somewhere in solrconfig.xml because I get errors when
> running, now.  I solved some of them by copying the *.jar files from the
> ./dist directory to the solr/lib directory (a tip I found when I googled the
> error message), but that only helped to a certain point.
>
> I will post more specific questions about my issues when I have a chance to
> re-investigate that (hopefully later today).
>
> I have *not* found specific Java code examples using Solr yet, but I haven't
> exhausted exploring the Solr website yet.  Hopefully, I'll find some
> examples using Solr in Java code...
>
> Mark
>
> On 9/2/2015 9:51 PM, Chris Hostetter wrote:
>>
>> : various $HOME/solr-5.3.0 subdirectories.  The documents/tutorials say to
>> edit
>> : the solrconfig.xml file for various configuration details, but they
>> never say
>> : which one of these dozen to edit.  Moreover, I cannot determine which
>> version
>>
>> can you please give us a specific examples (ie: urls, page numbers &
>> version of the ref guide, etc...) of documentation that tell you to edit
>> the solrconfig.xml w/o being explicit about where to to find it so that we
>> can fix the docs?
>>
>> FWIW: The official "Quick Start" tutorial does not mention editing
>> solrconfig.xml at all...
>>
>> http://lucene.apache.org/solr/quickstart.html
>>
>>
>>
>> -Hoss
>> http://www.lucidworks.com/
>>
>


Re: Order of hosts in zkHost

2015-09-04 Thread Tomás Fernández Löbbe
I believe Arcadius has a point, but I still think the answer is no.
ZooKeeper clients (Solr/SolrJ)  connect to a single ZooKeeper server
instance at a time, and keep that session open to that same server as long
as they can/need. During this time, all interactions between the client and
the ZK ensemble will be done to the same ZK server instance (yes, some
operations will require that server to talk with the leader, but not all,
reads are served locally for example). They will only switch to a different
ZooKeeper server instance if the connection is lost for some reason. If all
the clients are connected to the same ZK server, the load wouldn't be
evenly distributed.

However, according to ZooKeeper documentation [1] (and I haven't tested
this), ZK clients don't chose the servers from the connection string in
order:
"To create a client session the application code must provide a connection
string containing a comma separated list of host:port pairs, each
corresponding to a ZooKeeper server (e.g. "127.0.0.1:4545" or "
127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002"). The ZooKeeper client
library will pick an arbitrary server and try to connect to it."


Tomás

[1] http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html
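
So on the SolrJ side it should be enough to pass the full list and let the
ZooKeeper client library do the picking. A minimal sketch; the host ports and
collection name are placeholders:

import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class ZkHostDemo {
    public static void main(String[] args) throws Exception {
        // The zkHost string is handed to the ZooKeeper client library, which
        // (per the docs quoted above) picks an arbitrary server from the list.
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
        client.setDefaultCollection("collection1");
        client.connect();
        System.out.println("Connected to the ZooKeeper ensemble");
        client.close();
    }
}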


On Fri, Sep 4, 2015 at 9:12 AM, Erick Erickson 
wrote:

> Arcadius:
>
> Note that one of the more recent changes is "per collection states" in
> ZK. So rather
> than have one huge clusterstate.json that gets passed out to to all
> collection on any
> change, the listeners can now listen only to specific collections.
>
> Reduces the "thundering herd" problem.
>
> Best,
> Erick
>
> On Fri, Sep 4, 2015 at 12:39 AM, Arcadius Ahouansou
>  wrote:
> > Hello Shawn.
> > This question was raised because IMHO, apart from leader election, there
> > are other load-generating activities such as all 10 solrj
> > clients+solrCloudNodes listening to changes on
> clusterstate.json/state.json
> > and downloading the whole file in case there is a change... And this
> would
> > have  happened on zk1 only if we did not shuffle... That's the theory.
> > I could test this and see.
> > On Sep 4, 2015 6:27 AM, "Shawn Heisey"  wrote:
> >
> >> On 9/3/2015 9:47 PM, Arcadius Ahouansou wrote:
> >> > Let's say we have 10 SolrJ clients all configured with
> >> > zkhost=zk1:port,zk2:port,zk3:port
> >> >
> >> > For each of the 10 SolrJ clients, would it make a difference in terms
> >> > of load on zk1 (the first server on the list) if we shuffle around the
> >> > order of the ZK servers in zkHost or is it all the same?
> >> >
> >> > I would have thought that shuffling would lower load on zk1.
> >>
> >> I don't think this is going to make much difference.  Here's why,
> >> assuming that my understanding of how it all works is correct:
> >>
> >> One of the things zookeeper does is manage elections.  It helps figure
> >> out which member of a cluster is the leader.  I think Zookeeper uses
> >> this concept internally, too.  One of the hosts in the ensemble will be
> >> elected to be the leader, which accepts all input and replicates it to
> >> the other members of the cluster.  All of the clients will be talking to
> >> the leader first, no matter what order the hosts are listed.
> >>
> >> If my understanding of how this works is flawed, then what I just said
> >> is probably wrong.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


Re: Order of hosts in zkHost

2015-09-04 Thread Erick Erickson
Arcadius:

Note that one of the more recent changes is "per collection states" in
ZK. So rather
than have one huge clusterstate.json that gets passed out to to all
collection on any
change, the listeners can now listen only to specific collections.

Reduces the "thundering herd" problem.

Best,
Erick

On Fri, Sep 4, 2015 at 12:39 AM, Arcadius Ahouansou
 wrote:
> Hello Shawn.
> This question was raised because IMHO, apart from leader election, there
> are other load-generating activities such as all 10 solrj
> clients+solrCloudNodes listening to changes on clusterstate.json/state.json
> and downloading the whole file in case there is a change... And this would
> have  happened on zk1 only if we did not shuffle... That's the theory.
> I could test this and see.
> On Sep 4, 2015 6:27 AM, "Shawn Heisey"  wrote:
>
>> On 9/3/2015 9:47 PM, Arcadius Ahouansou wrote:
>> > Let's say we have 10 SolrJ clients all configured with
>> > zkhost=zk1:port,zk2:port,zk3:port
>> >
> >> > For each of the 10 SolrJ clients, would it make a difference in terms of
> >> > load on zk1 (the first server on the list) if we shuffle around the order
> >> > of the ZK servers in zkHost or is it all the same?
>> >
>> > I would have thought that shuffling would lower load on zk1.
>>
>> I don't think this is going to make much difference.  Here's why,
>> assuming that my understanding of how it all works is correct:
>>
>> One of the things zookeeper does is manage elections.  It helps figure
>> out which member of a cluster is the leader.  I think Zookeeper uses
>> this concept internally, too.  One of the hosts in the ensemble will be
>> elected to be the leader, which accepts all input and replicates it to
>> the other members of the cluster.  All of the clients will be talking to
>> the leader first, no matter what order the hosts are listed.
>>
>> If my understanding of how this works is flawed, then what I just said
>> is probably wrong.
>>
>> Thanks,
>> Shawn
>>
>>


Config error mystery

2015-09-04 Thread Mark Fenbers

Greetings,

I'm moving on from the tutorials and trying to setup an index for my own 
data (from a database).  All I did was add the following to the 
solrconfig.xml (taken verbatim from the example in Solr documentation, 
except for the name="config" pathname) and I get an error in the 
web-based UI.


  class="org.apache.solr.handler.dataimport.DataImportHandler" >


/localapps/dev/EventLog/data-config.xml

  

Because of this error, no /dataimport page is available in the Admin 
user interface; therefore, I cannot visit the page 
http://localhost:8983/solr/dataimport.  The actual error is:


org.apache.solr.common.SolrException: Error Instantiating 
requestHandler, org.apache.solr.handler.dataimport.DataImportHandler 
failed to instantiate org.apache.solr.request.SolrRequestHandler

at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:727)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:447)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:438)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Error Instantiating 
requestHandler, org.apache.solr.handler.dataimport.DataImportHandler 
failed to instantiate org.apache.solr.request.SolrRequestHandler

at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:588)
at org.apache.solr.core.PluginBag.createPlugin(PluginBag.java:122)
at org.apache.solr.core.PluginBag.init(PluginBag.java:217)
at 
org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:130)

at org.apache.solr.core.SolrCore.<init>(SolrCore.java:773)
... 9 more
Caused by: java.lang.ClassCastException: class 
org.apache.solr.handler.dataimport.DataImportHandler

at java.lang.Class.asSubclass(Class.java:3208)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:475)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:422)

at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:567)
... 13 more


If I remove the <requestHandler> section and restart Solr, the error
goes away.  As best I can tell, the contents of

/localapps/dev/EventLog/data-config.xml look fine, too.  See it here:


url="jdbc:postgresql://dx1f/OHRFC" user="awips" />


deltaQuery="SELECT posttime FROM eventlogtext WHERE 
lastmodtime > '${dataimporter.last_index_time}'">







It seems to me that this problem could be a classpath issue, but I 
copied the appropriate jar file into the solr/lib directory to be sure.  
This made the (slightly different) initial error go away, but now I 
cannot make this one go away.


Any ideas?

Mark





Re: Cached fq decreases performance

2015-09-04 Thread Jeff Wartes


On 9/4/15, 7:06 AM, "Yonik Seeley"  wrote:
>
>Lucene seems to always be changing its execution model, so it can be
>difficult to keep up.  What version of Solr are you using?
>Lucene also changed how filters work,  so now, a filter is
>incorporated with the query like so:
>
>query = new BooleanQuery.Builder()
>    .add(query, Occur.MUST)
>    .add(pf.filter, Occur.FILTER)
>    .build();
>
>It may be that term queries are no longer worth caching... if this is
>the case, we could automatically not cache them.
>
>It also may be the structure of the query that is making the
>difference.  Solr is creating
>
>(complicated stuff) +(filter(enabled:true))
>
>If you added +enabled:true directly to an existing boolean query, that
>may be more efficient for lucene to process (flatter structure).
>
>If you haven't already, could you try putting parens around your
>(complicated stuff) to see if it makes any difference?
>
>-Yonik


I’ll reply at this point in the thread, since it’s addressed to me, but I
strongly agree with some of the later comments in the thread about knowing
what’s going on. The whole point of this post is that this situation
violated my mental heuristics about how to craft a query.

In answer to the question, this is a SolrCloud 5.3 cluster. I can provide
a little more detail on (complicated stuff) too if that’s helpful. I have
not tried putting everything else in parens, but it’s a couple of distinct
paren clauses anyway:

q=+(_query_:"{!dismax (several fields)}") +(_query_:"{!dismax (several
fields)}") +(_query_:"{!dismax (several fields)}") +enabled:true

So to be clear, that query template outperforms this one:
q=+(_query_:"{!dismax (several fields)}") +(_query_:"{!dismax (several
fields)}") +(_query_:"{!dismax (several fields)}")=enabled:true


Your comments remind me that I migrated from 5.2.1 to 5.3 while I’ve been
doing my performance testing, and I thought I noticed a performance
degradation in that transition, but I never followed through to confirm
that. I hadn’t tested moving that FQ clause into the Q on 5.2.1, only 5.3.
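
For concreteness, a hedged Lucene sketch of the two query shapes being
compared here (the wrapped-filter form Solr builds for a cached fq vs. the
flattened clause placed directly in the main query; field/term values are
taken from the thread):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class FilterShapes {
        // Solr's fq path: the cached filter is wrapped around the main query.
        static Query wrapped(Query main, Query filter) {
            return new BooleanQuery.Builder()
                .add(main, Occur.MUST)
                .add(filter, Occur.FILTER)
                .build();
        }

        // The flattened alternative: the clause sits directly in the main
        // BooleanQuery, i.e. q=(...) +enabled:true
        static Query flattened(BooleanQuery.Builder mainBuilder) {
            return mainBuilder
                .add(new TermQuery(new Term("enabled", "true")), Occur.MUST)
                .build();
        }
    }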






Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-04 Thread Shawn Heisey
On 9/4/2015 10:14 AM, Renee Sun wrote:
> I will start using autocommit, confident it will greatly help reduce
> the false commit requests (a lot) from processes in our system.
>
> Regarding the solr version, it is actually a big problem we have to resolve
> sooner or later.
>
> When we upgraded to Solr 3.5 about 2 years ago, to avoid re-indexing our
> large data, we used:
>
> LUCENE_29
>
> which seems to work fine except a lot of such warnings in catalina.out:
>
> WARNING: StopFilterFactory is using deprecated LUCENE_29 emulation. You
> should at some point declare and reindex to at least 3.0, because 2.x
> emulation is deprecated and will be removed in 4.0

Setting luceneMatchVersion to an older version, contrary to most
people's expectations, does NOT change the index format.  It basically
turns on a compatibility mode for analysis components like
StopFilterFactory so that the created terms work like the older version,
if the code has a check for that older version that produces different
behavior.  Basically you use LMV to disable analysis bugfixes that don't
work for you.  In your case, any index segments created since your
upgrade are Lucene 3.5 format, not Lucene 2.9.

> We have built an infrastructure which scales well using Solr; is it good
> practice to upgrade to Solr 4.x without using SolrCloud, if that is
> possible at all?

Almost all of my Solr servers (running 4.x, we have not yet upgraded to
5.x) are NOT running in cloud mode.  Although it would make some aspects
of maintaining my index easier, I would lose some of the functionality
if I upgraded to a fully replicated SolrCloud setup.

Thanks,
Shawn



Re: Config error mystery

2015-09-04 Thread Kevin Lee
Are you using a single instance or cloud?  What version of Solr are you using?
In your solrconfig.xml, is the path to where you copied your library specified
in a <lib> tag?  Do you have a jar file for the Postgres JDBC driver in your
lib directory as well?

For a simple setup in 5.x I copy the jars to a lib directory under the
core/collection directory.  For example, under server/solr/<core>/lib I
would have the solr-dataimporthandler-<version>.jar and the JDBC driver jar
file.  These should be automatically picked up without having to add anything
to the solrconfig.xml in terms of <lib> tags.

For a production cloud deployment, I create a lib directory outside of the 
core/collection directory somewhere else on the file system so that it is easy 
to install without having to wait for a directory to be created by the 
Collections CREATE command and add the appropriate entry to the solrconfig.xml. 
 Then stick both jars in that directory.

Your error may be different, but as long as I have both jars in one of the two 
places mentioned above with the appropriate entry in solrconfig.xml if needed, 
then it has been working in my setups.

- Kevin


> On Sep 4, 2015, at 9:40 AM, Mark Fenbers  wrote:
> 
> Greetings,
> 
> I'm moving on from the tutorials and trying to setup an index for my own data 
> (from a database).  All I did was add the following to the solrconfig.xml 
> (taken verbatim from the example in Solr documentation, except for the 
> name="config" pathname) and I get an error in the web-based UI.
> 
>   class="org.apache.solr.handler.dataimport.DataImportHandler" >
>
>/localapps/dev/EventLog/data-config.xml
>
>  
> 
> Because of this error, no /dataimport page is available in the Admin user 
> interface; therefore, I cannot visit the page 
> http://localhost:8983/solr/dataimport.  The actual error is:
> 
> org.apache.solr.common.SolrException: Error Instantiating requestHandler, 
> org.apache.solr.handler.dataimport.DataImportHandler failed to instantiate 
> org.apache.solr.request.SolrRequestHandler
>at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
>at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659)
>at org.apache.solr.core.CoreContainer.create(CoreContainer.java:727)
>at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:447)
>at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:438)
>at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: Error Instantiating 
> requestHandler, org.apache.solr.handler.dataimport.DataImportHandler failed 
> to instantiate org.apache.solr.request.SolrRequestHandler
>at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:588)
>at org.apache.solr.core.PluginBag.createPlugin(PluginBag.java:122)
>at org.apache.solr.core.PluginBag.init(PluginBag.java:217)
>at 
> org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:130)
>at org.apache.solr.core.SolrCore.<init>(SolrCore.java:773)
>... 9 more
> Caused by: java.lang.ClassCastException: class 
> org.apache.solr.handler.dataimport.DataImportHandler
>at java.lang.Class.asSubclass(Class.java:3208)
>at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:475)
>at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:422)
>at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:567)
>... 13 more
> 
> 
> If I remove the <requestHandler> section and restart Solr, the error goes
> away.  As best I can tell, the contents of
> /localapps/dev/EventLog/data-config.xml look fine, too.  See it here:
> 
> 
> url="jdbc:postgresql://dx1f/OHRFC" user="awips" />
>
>deltaQuery="SELECT posttime FROM eventlogtext WHERE 
> lastmodtime > '${dataimporter.last_index_time}'">
>
>
>
>
> 
> 
> It seems to me that this problem could be a classpath issue, but I copied the 
> appropriate jar file into the solr/lib directory to be sure.  This made the 
> (slightly different) initial error go away, but now I cannot make this one go 
> away.
> 
> Any ideas?
> 
> Mark
> 
> 
> 



Re: Merging documents from a distributed search

2015-09-04 Thread tedsolr
Upayavira ,

The docs are all unique. In my example the two docs are considered to be
dupes because the requested fields all have the same values.
fields:   A      B   C   D    E
Doc 1:    apple, 10, 15, bye, yellow
Doc 2:    apple, 12, 15, by,  green

The two docs are certainly unique. Say they are on different shards in the
same collection. If the search request has fl:A,C then the two are dupes and
the user wants to see them collapsed. If the search request has fl:A,B,C
then the two are unique from the user's perspective and display separately.

Each doc typically has a couple hundred fields. When viewed through the lens
of just 3 or 4 fields, lots of docs, sometimes 1000s will be rolled up and
I'll compute some stats on that group. Bringing all those docs back to the
calling app for processing is too slow. The AnalyticsQuery does a great job
of filtering out the dupes, but it looks like I need another solution for
multi shard collections.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Merging-documents-from-a-distributed-search-tp4226802p4227261.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Config error mystery

2015-09-04 Thread Shawn Heisey
On 9/4/2015 10:40 AM, Mark Fenbers wrote:
> Caused by: java.lang.ClassCastException: class
> org.apache.solr.handler.dataimport.DataImportHandler
> at java.lang.Class.asSubclass(Class.java:3208) 

This is the root cause of your inability to create the dataimport
handler.  This is a different message than you would get if you had not
included the dataimport jar from the contrib directory, but it is related.

Chances are good that you have multiple versions of Solr jars on your
classpath and/or specified with <lib> directives in solrconfig.xml.  It
may be a version mismatch between the dataimport jar and the other Solr
jars in the webapp and/or on the classpath.

Thanks,
Shawn



frequently update field

2015-09-04 Thread sara hajili
Hi,
I am new to Solr and I am facing a problem I need a solution for.
I have a field that needs to be updated frequently.
Imagine I need to index all the posts of the members of a social app.
In this case I need to store and index all the post fields, like caption,
image, title, comments, etc.
But my question is about fields like
"like_count, repost_count, comment_count": these fields change frequently
and I need to update them, while fields like caption and title do not
change the way the like count does.
So what is the best solution to handle these frequent updates?
I found that in Solr 4 people used external file fields, but now in Solr
5.x I see that atomic updates have appeared.
Are atomic updates a substitute for external file fields, and what is the
best approach in this case?
I am really worried about the cost of re-indexing docs when updating the
like count.


Re: which solrconfig.xml

2015-09-04 Thread Mikhail Khludnev
Mark,
Thanks for your feedback. Making Solr handy is important for us.

On Fri, Sep 4, 2015 at 1:43 PM, Mark Fenbers  wrote:

> Chris,
>
> The document "Uploading Structured Data Store Data with the Data Import
> Handler" has a number of references to solrconfig.xml, starting on Page 2
> and continuing on page 3 in the section "Configuring solrconfig.xml".  It
> also is mentioned on Page 5 in the "Property Writer" and the "Data Sources"
> sections.  And other places in this document as well.
>
> The solrconfig.xml file is also referenced (without a path) in the "Solr
> Quick Start" document, in the Design Overview section and other sections as
> well.  None of these references suggests the location of the solrconfig.xml
> file.  Doing a "find . -name solrconfig.xml" from the Solr home directory
> reveals about a dozen or so of these files in various subdirectories.
> Thus, my confusion as to which one I need to customize...
>

Here I can only suggest opening the Solr Admin UI, picking a particular core,
and finding the Instance directory on the Overview tab. That is the directory
you can run find for solrconfig.xml in.
I just wonder what exactly we could add to the guide?

We have
https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml
The solrconfig.xml file is located in the conf/ directory for each
collection.


>
> I feel ready to graduate from the examples in "Solr Quick Start" document,
> e.g., using bin/solr -e dih and have fed in existing files on disk.  The
> tutorial was *excellent* for this part.  But now I want to build a "real"
> index using *my own* data from a database.  In doing this, I find the
> coaching in the tutorial to be rather absent.  For example, I haven't read
> in any of the documents I have found so far an explanation of why one might
> want to use more than one Solr node and more than one shard, or what the
> advantages are of using Solr in cloud mode vs stand-alone mode.

https://cwiki.apache.org/confluence/display/solr/SolrCloud

..., these capabilities provide distributed indexing and search
capabilities, supporting the following features:

   - ...
   - *Automatic load balancing and fail-over for queries*



> As a result, I had to improvise/guess/trial-and-error.  I did manage to
> configure my own data source and changed my queries to apply to my own
> data, but I did something wrong somewhere in solrconfig.xml because I get
> errors when running, now.  I solved some of them by copying the *.jar files
> from the ./dist directory to the solr/lib directory (a tip I found when I
> googled the error message), but that only helped to a certain point.
>
> I will post more specific questions about my issues when I have a chance
> to re-investigate that (hopefully later today).
>
> I have *not* found specific Java code examples using Solr yet, but I
> haven't exhausted exploring the Solr website yet.  Hopefully, I'll find
> some examples using Solr in Java code...
>

https://cwiki.apache.org/confluence/display/solr/Using+SolrJ
I think the essential parts are covered there.
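
For a first taste, a minimal SolrJ sketch (the core name and fields are
hypothetical) that indexes one document and queries it back:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrInputDocument;

    public class SolrJQuickStart {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client =
                new HttpSolrClient("http://localhost:8983/solr/eventlog");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            doc.addField("text", "hello solr");
            client.add(doc);
            client.commit();
            QueryResponse rsp = client.query(new SolrQuery("text:hello"));
            System.out.println(rsp.getResults());
            client.close();
        }
    }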


>
> Mark
>
>
> On 9/2/2015 9:51 PM, Chris Hostetter wrote:
>
>> : various $HOME/solr-5.3.0 subdirectories.  The documents/tutorials say
>> to edit
>> : the solrconfig.xml file for various configuration details, but they
>> never say
>> : which one of these dozen to edit.  Moreover, I cannot determine which
>> version
>>
>> can you please give us a specific examples (ie: urls, page numbers &
>> version of the ref guide, etc...) of documentation that tell you to edit
> >> the solrconfig.xml w/o being explicit about where to find it so that we
>> can fix the docs?
>>
>> FWIW: The official "Quick Start" tutorial does not mention editing
>> solrconfig.xml at all...
>>
>> http://lucene.apache.org/solr/quickstart.html
>>
>>
>>
>> -Hoss
>> http://www.lucidworks.com/
>>
>>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-04 Thread Renee Sun
Thanks a lot Shawn, for the details, it is very helpful !






--
View this message in context: 
http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227274.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr DIH sub entity

2015-09-04 Thread Mikhail Khludnev
Hello Irina,

I looked through the DIH sources, and it seems like what you want exceeds its
design; such cross-entity access is not possible in it. I can only suggest
calling a private method through reflection:
org.apache.solr.handler.dataimport.ContextImpl.getDocument().
Perhaps you can pass some state between Transformers via
Context.setSessionAttribute(String, Object, Context.SCOPE_DOC).
Nevertheless, it sounds like you need a deeper level of customization; see
FieldMutatingUpdateProcessorFactory at
https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors
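
For illustration, a sketch of that session-attribute idea (names mirror the
question below; whether this is enough depends on when the root entity's own
transformers run):

    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class MySubEntityTransformer extends Transformer {
        @Override
        public Object transformRow(Map<String, Object> row, Context context) {
            Object val = row.get("subEntityColumn");
            if (val != null) {
                // Stash the value at document scope so a transformer attached
                // to the root entity can read it via getSessionAttribute(...).
                context.setSessionAttribute("documentField", val, Context.SCOPE_DOC);
            }
            return row;
        }
    }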


On Fri, Sep 4, 2015 at 12:49 PM, vatuska  wrote:

> Hello. I work with Solr 4.10.
> I use DIH and some custom java Transformers to synchronize my Solr index
> with the database (MySQL is used)
> Is there any way to change the fields in root entity from the sub entity?
> I mean something like this
>
> public class MySubEntityTransformer extends
>         org.apache.solr.handler.dataimport.Transformer {
>     @Override
>     public Object transformRow(Map<String, Object> row, Context context) {
>         Map<String, Object> parentRow = context.getParentContext().?
>         Object val = row.get("subEntityColumn");
>         if (val != null) {
>             // transform this value some way
>             Object generalValue = parentRow.get("documentField");
>             if (generalValue != null) {
>                 parentRow.put("documentField", generalValue + "_" + val);
>             } else {
>                 parentRow.put("documentField", val);
>             }
>         }
>         return row;
>     }
> }
>
> Or is there any way to apply some kind of transformation to the root
> entity after all its subentities have been processed?
>
> I can't use the script transformers, because they work many times slower
> than Java extensions.
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-DIH-sub-entity-tp4227167.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





RE: Trouble making tests with BaseDistributedSearchTestCase

2015-09-04 Thread Chris Hostetter
: Strange enough, the following code gives different errors:
: 
: assertQ(

I'm not sure what exactly assertQ will do in a distributed test like this
... probably nothing good.  You'll almost certainly want to stick with
the distributed indexDoc() and query* methods and avoid assertU and
assertQ.


: [TEST-TestComponent.test-seed#[EA2ED1E118114486]] ERROR 
org.apache.solr.SolrTestCaseJ4 - REQUEST FAILED: 
xpath=//result/doc[1]/str[@name='id'][.='1']
: xml response was: 
...
: 
: 

...I'm guessing that's because assertQ is (probably) querying the "local"
core from the TestHarness, not any of the distributed cores set up by
BaseDistributedSearchTestCase, and your docs didn't get indexed there.

: And, when i forcfully add distrib=true, i get a NPE in SearchHandler!

which is probably because, since you (manually) added the debug param but
didn't add a list of shards to query, you triggered some sloppy code in
SearchHandler that should be giving you a nice error about shards not
being specified.  (I bet you can manually reproduce this in a single-node
Solr setup by adding distrib=true to any query that doesn't have a
"shards" param; if so, please file a bug that it should produce a sane
error message.)

If you use something like BaseDistributedSearchTestCase.query, on the other
hand, it takes care of adding the correct distrib-related request
params for the shards it creates under the covers.

(Although at this point, in general, I would strongly suggest that you
consider using AbstractFullDistribZkTestBase instead of
BaseDistributedSearchTestCase -- assuming of course that your goal is good
tests of how some distributed queries behave in a modern SolrCloud setup.
If your goal is to test Solr under manual sharding/distributed queries,
BaseDistributedSearchTestCase still makes sense.)


As to your first question (which applies to both old school and 
cloud/zk related tests)...

: > Executing the above text either results in a: IOException occured when 
talking to server at: https://127.0.0.1:44761//collection1

That might be due to a timing issue of the servers not completely starting
up before you start sending requests to them?  Not really sure ... would
need to see the logs.

: > Or it fails with a curous error: .response.maxScore:1.0!=null
: > 
: > The score correctly changes according to whatever value i set for parameter 
q.

That has to do with the way the BaseDistributedSearchTestCase plumbing
tries to help ensure that a distributed query returns the same results as a
single-shard query by "diffing" the responses (note: this is why
BaseDistributedSearchTestCase.indexDoc adds your doc to both a random
shard *and* to a "control collection").  But there are some legacy quirks
about how things like "maxScore" are handled: notably SOLR-6612.
(Historically, because of the possibility of filter optimizations, Solr
only kept track of the scores if it needed to.  In a single core, this was
if you asked for "fl=score,...", but in a distributed query it might also
compute scores (and maxScore) if you are sorting on scores, which is the
default.)

The way to indicate that you don't want BaseDistributedSearchTestCase's
response diff checking to freak out over the max score is to use the
(horribly undocumented) "handle" feature...

handle.put("maxScore", SKIPVAL);

...that's not the default in all tests because it could hide errors in 
situations where tests *are* expecting the maxScore to be the same.


The same mechanism can be used to ignore things like the _version_
field, or timestamp fields, which are virtually guaranteed not to be the
same between two different collections.  (See uses of the "handle" Map in
existing test cases for examples.)
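
Putting that together, a minimal sketch of a test using the base class
helpers plus the "handle" map (this mirrors the test from the original post;
it is an illustration under those assumptions, not a verified example):

    import org.apache.solr.BaseDistributedSearchTestCase;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.junit.Test;

    public class TestComponentDistribTest extends BaseDistributedSearchTestCase {
      @Test
      @ShardsFixed(num = 3)
      public void test() throws Exception {
        handle.clear();
        handle.put("maxScore", SKIPVAL);   // scores may legitimately differ
        handle.put("timestamp", SKIPVAL);  // per-collection timestamps never match

        del("*:*");
        index(id, "1", "lang", "en", "text", "this is some text");
        index(id, "2", "lang", "en", "text", "this is some other text");
        commit();

        QueryResponse rsp = query("q", "*:*", "indent", "true");
        assertFieldValues(rsp.getResults(), id, "1", "2");
      }
    }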



-Hoss
http://www.lucidworks.com/


Re: Order of hosts in zkHost

2015-09-04 Thread Arcadius Ahouansou
Hello Shawn.
This question was raised because IMHO, apart from leader election, there
are other load-generating activities such as all 10 solrj
clients+solrCloudNodes listening to changes on clusterstate.json/state.json
and downloading the whole file in case there is a change... And this would
have  happened on zk1 only if we did not shuffle... That's the theory.
I could test this and see.
On Sep 4, 2015 6:27 AM, "Shawn Heisey"  wrote:

> On 9/3/2015 9:47 PM, Arcadius Ahouansou wrote:
> > Let's say we have 10 SolrJ clients all configured with
> > zkhost=zk1:port,zk2:port,zk3:port
> >
> > For each of the 10 SolrJ clients, would it make a difference in terms of
> > load on zk1 (the first server on the list) if we shuffle around the order
> > of the ZK servers in zkHost or is it all the same?
> >
> > I would have thought that shuffling would lower load on zk1.
>
> I don't think this is going to make much difference.  Here's why,
> assuming that my understanding of how it all works is correct:
>
> One of the things zookeeper does is manage elections.  It helps figure
> out which member of a cluster is the leader.  I think Zookeeper uses
> this concept internally, too.  One of the hosts in the ensemble will be
> elected to be the leader, which accepts all input and replicates it to
> the other members of the cluster.  All of the clients will be talking to
> the leader first, no matter what order the hosts are listed.
>
> If my understanding of how this works is flawed, then what I just said
> is probably wrong.
>
> Thanks,
> Shawn
>
>


Queries on De-Duplication

2015-09-04 Thread Zheng Lin Edwin Yeo
Hi,

I'm trying out de-duplication. I've tried to create a new signature
field in schema.xml:

<field name="signature" type="string" stored="true" indexed="true" multiValued="false" />

I've also added the following in solrconfig.xml:

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
However, I can't do a copyField of content into this signature field, as
some of my contents are more than 32766 characters in length. Previously, I
tried to point the signatureField directly at content, but that did not
work either.

Anything else I can do to group on a new signature field?


Regards,
Edwin


Re: Stemming words Using Solr

2015-09-04 Thread Upayavira
Yes, look at the one I mentioned further up in this thread, which is a
part of SolrJ: FieldAnalysisRequest

That uses the same HTTP call in the backend, but formats the result in a
Java-friendly manner.
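
For illustration, a hedged SolrJ sketch of FieldAnalysisRequest (the core,
field, and value are taken from the question below; the response-walking
calls are from memory, so verify them against your SolrJ version):

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
    import org.apache.solr.client.solrj.response.AnalysisResponseBase.AnalysisPhase;
    import org.apache.solr.client.solrj.response.AnalysisResponseBase.TokenInfo;
    import org.apache.solr.client.solrj.response.FieldAnalysisResponse;

    public class StemLookup {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client =
                new HttpSolrClient("http://localhost:8983/solr/testcore");
            FieldAnalysisRequest request = new FieldAnalysisRequest();
            request.addFieldName("inferlytics_output");
            request.setFieldValue("holidays");
            FieldAnalysisResponse response = request.process(client);
            FieldAnalysisResponse.Analysis analysis =
                response.getFieldNameAnalysis("inferlytics_output");
            // The last index-time phase holds the fully analyzed (stemmed) tokens.
            for (AnalysisPhase phase : analysis.getIndexPhases()) {
                for (TokenInfo token : phase.getTokens()) {
                    System.out.println(phase.getClassName() + " -> " + token.getText());
                }
            }
            client.close();
        }
    }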

Upayavira

On Fri, Sep 4, 2015, at 05:52 AM, Ritesh Sinha wrote:
> Yeah, I got it. Thanks.
> 
> It returns JSON which has the stemmed words. I just need to parse it
> and get the value.
> 
> But isn't there any Java API available for it?
> 
> On Thu, Sep 3, 2015 at 7:58 PM, Upayavira  wrote:
> 
> > yes, the URL should be something like:
> >
> >
> > http://localhost:8983/solr/images/analysis/field?wt=json&indent=true&analysis.fieldvalue=<text>&analysis.fieldname=<field>
> >
> > Upayavira
> >
> > On Thu, Sep 3, 2015, at 03:23 PM, Jack Krupansky wrote:
> > > The # in the URL says to send the request to the admin UI, which of
> > > course
> > > returns an HTML web page. Instead, send the analysis URL fragment
> > > directly
> > > to the analysis API (not UI) for the Solr core, without the #.
> > >
> > > -- Jack Krupansky
> > >
> > > On Thu, Sep 3, 2015 at 8:45 AM, Ritesh Sinha <
> > > kumarriteshranjansi...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I observed the inspect element and wrote code to fetch the content.
> > > > I have included the URL which was getting generated.
> > > >
> > > > public class URLConnectionReader {
> > > >     public static void main(String[] args) throws Exception {
> > > >         URL solr = new URL(
> > > >             "http://localhost:8983/solr/#/testcore/analysis?analysis.fieldvalue=holidays&analysis.fieldname=inferlytics_output&verbose_output=1");
> > > >         URLConnection sl = solr.openConnection();
> > > >         BufferedReader in = new BufferedReader(new InputStreamReader(
> > > >             sl.getInputStream()));
> > > >         String inputLine;
> > > >
> > > >         while ((inputLine = in.readLine()) != null)
> > > >             System.out.println(inputLine);
> > > >         in.close();
> > > >     }
> > > > }
> > > >
> > > >
> > > > But it shows this in the console:
> > > >
> > > > [snipped: the response was the HTML source of the Solr Admin UI page
> > > > (title "Solr Admin", with the Dashboard / Logging / Cloud / Core Admin /
> > > > Java Properties / Thread Dump menu, a "SolrCore Initialization Failures"
> > > > template, and a "No cores available" notice) rather than the JSON
> > > > analysis output]
> > > 

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-04 Thread Kevin Lee
Thanks, I downloaded the source and compiled it and replaced the jar file in 
the dist and solr-webapp’s WEB-INF/lib directory.  It does seem to be 
protecting the Collections API reload command now as long as I upload the 
security.json after startup of the Solr instances.  If I shut down and bring the
instances back up, the security is no longer in place and I have to upload the 
security.json again for it to take effect.

- Kevin
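
For reference, a hedged SolrJ sketch of the kind of authenticated reload
check described above (credentials and collection name are hypothetical, and
setBasicAuthCredentials is assumed to exist in this SolrJ version):

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class AuthenticatedReload {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr");
            CollectionAdminRequest.Reload reload = new CollectionAdminRequest.Reload();
            reload.setCollectionName("inventory");
            // Without valid credentials this call should be rejected once the
            // collection-admin-edit permission is enforced.
            reload.setBasicAuthCredentials("admin", "secret");
            System.out.println(reload.process(client));
            client.close();
        }
    }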

> On Sep 3, 2015, at 10:29 PM, Noble Paul  wrote:
> 
> Both of these are committed. If you could test with the latest 5.3 branch
> it would be helpful
> 
> On Wed, Sep 2, 2015 at 5:11 PM, Noble Paul  wrote:
>> I opened a ticket for the same
>> https://issues.apache.org/jira/browse/SOLR-8004
>> 
>> On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee  wrote:
>>> I’ve found that completely exiting Chrome or Firefox and opening it back up 
>>> re-prompts for credentials when they are required.  It was re-prompting 
>>> with the /browse path where authentication was working each time I 
>>> completely exited and started the browser again, however it won’t re-prompt 
>>> unless you exit completely and close all running instances so I closed all 
>>> instances each time to test.
>>> 
>>> However, to make sure I ran it via the command line via curl as suggested 
>>> and it still does not give any authentication error when trying to issue 
>>> the command via curl.  I get a success response from all the Solr instances 
>>> that the reload was successful.
>>> 
>>> Not sure why the pre-canned permissions aren’t working, but the one to the 
>>> request handler at the /browse path is.
>>> 
>>> 
 On Sep 1, 2015, at 11:03 PM, Noble Paul  wrote:
 
 " However, after uploading the new security.json and restarting the
 web browser,"
 
 The browser remembers your login, so it is unlikely to prompt for the
 credentials again.
 
 Why don't you try the RELOAD operation using the command line (curl)?
 
 On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee  
 wrote:
> The restart issues aside, I’m trying to lockdown usage of the Collections 
> API, but that also does not seem to be working either.
> 
> Here is my security.json.  I’m using the “collection-admin-edit” 
> permission and assigning it to the “adminRole”.  However, after uploading 
> the new security.json and restarting the web browser, it doesn’t seem to 
> be requiring credentials when calling the RELOAD action on the 
> Collections API.  The only thing that seems to work is the custom 
> permission “browse” which is requiring authentication before allowing me 
> to pull up the page.  Am I using the permissions correctly for the 
> RuleBasedAuthorizationPlugin?
> 
> {
>   "authentication":{
>  "class":"solr.BasicAuthPlugin",
>  "credentials": {
>   "admin”:” ",
>   "user": ” "
>   }
>   },
>   "authorization":{
>  "class":"solr.RuleBasedAuthorizationPlugin",
>  "permissions": [
>   {
>   "name":"security-edit",
>   "role":"adminRole"
>   },
>   {
>   "name":"collection-admin-edit”,
>   "role":"adminRole"
>   },
>   {
>   "name":"browse",
>   "collection": "inventory",
>   "path": "/browse",
>   "role":"browseRole"
>   }
>   ],
>  "user-role": {
>   "admin": [
>   "adminRole",
>   "browseRole"
>   ],
>   "user": [
>   "browseRole"
>   ]
>   }
>   }
> }
> 
> Also tried adding the permission using the Authorization API, but no 
> effect, still isn’t protecting the Collections API from being invoked 
> without a username password.  I do see in the Solr logs that it sees the 
> updates because it outputs the messages “Updating /security.json …”, 
> “Security node changed”, “Initializing authorization plugin: 
> solr.RuleBasedAuthorizationPlugin” and “Authentication plugin class 
> obtained from ZK: solr.BasicAuthPlugin”.
> 
> Thanks,
> Kevin
> 
>> On Sep 1, 2015, at 12:31 AM, Noble Paul  wrote:
>> 
>> I'm investigating why restarts or first time start does not read the
>> security.json
>>