Re: The most efficient way to get un-inverted view of the index?
in case this helps someone, here is a solution (probably very efficient already, but i didn't profile it); it can deal with DocValues and with FieldCache (the old 'stored' values)

    private void unInvertedTheDamnThing(
        SolrIndexSearcher searcher,
        List<String> fields,
        KVSetter setter) throws IOException {

      LeafReader reader = searcher.getLeafReader();
      IndexSchema schema = searcher.getCore().getLatestSchema();
      List<LeafReaderContext> leaves = reader.getContext().leaves();

      Bits liveDocs;
      LeafReader lr;
      Transformer transformer;
      for (LeafReaderContext leave: leaves) {
        int docBase = leave.docBase;
        liveDocs = leave.reader().getLiveDocs();
        lr = leave.reader();
        FieldInfos fInfo = lr.getFieldInfos();
        for (String field: fields) {

          FieldInfo fi = fInfo.fieldInfo(field);
          SchemaField fSchema = schema.getField(field);
          DocValuesType fType = fi.getDocValuesType();
          Map<String, Type> mapping = new HashMap<String, Type>();
          final LeafReader unReader;

          if (fType.equals(DocValuesType.NONE)) {
            Class c = fType.getClass();
            if (c.isAssignableFrom(TextField.class) || c.isAssignableFrom(StrField.class)) {
              if (fSchema.multiValued()) {
                mapping.put(field, Type.SORTED);
              } else {
                mapping.put(field, Type.BINARY);
              }
            }
            else if (c.isAssignableFrom(TrieIntField.class)) {
              if (fSchema.multiValued()) {
                mapping.put(field, Type.SORTED_SET_INTEGER);
              } else {
                mapping.put(field, Type.INTEGER_POINT);
              }
            }
            else {
              continue;
            }
            unReader = new UninvertingReader(lr, mapping);
          }
          else {
            unReader = lr;
          }

          switch(fType) {
            case NUMERIC:
              transformer = new Transformer() {
                NumericDocValues dv = unReader.getNumericDocValues(field);
                @Override
                public void process(int docBase, int docId) {
                  int v = (int) dv.get(docId);
                  setter.set(docBase, docId, v);
                }
              };
              break;
            case SORTED_NUMERIC:
              transformer = new Transformer() {
                SortedNumericDocValues dv = unReader.getSortedNumericDocValues(field);
                @Override
                public void process(int docBase, int docId) {
                  dv.setDocument(docId);
                  int max = dv.count();
                  int v;
                  for (int i=0; i
    [...]
                  5) return;
                  dv.setDocument(docId);
                  for (long ord = dv.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; ord = dv.nextOrd()) {
                    final BytesRef value = dv.lookupOrd(ord);
                    setter.set(docBase, docId, value.utf8ToString());
                  }
                }
              };
              break;
            case SORTED:
              transformer = new Transformer() {
                SortedDocValues dv = unReader.getSortedDocValues(field);
                TermsEnum te;
                @Override
                public void process(int docBase, int docId) {
                  BytesRef v = dv.get(docId);
                  if (v.length == 0)
                    return;
                  setter.set(docBase, docId, v.utf8ToString());
                }
              };
              break;
            default:
              throw new IllegalArgumentException("The field " + field + " is of type that cannot be un-inverted");
          }

          int i = 0;
          while(i < lr.maxDoc()) {
            if (liveDocs != null && !(i < liveDocs.length() && liveDocs.get(i))) {
              i++;
              continue;
            }
            transformer.process(docBase, i);
            i++;
          }
        }
      }
    }

On Wed, Aug 17, 2016 at 1:22 PM, Roman Chyla wrote:
> Joel, thanks, but which of them? I've counted at least 4, if not more,
> different ways of how to get DocValues. Are there many functionally
> equal approaches just because devs can't agree on using one api? Or is
> there a deeper reason?
>
> Btw, the FieldCache is still there - both in lucene (to be deprecated)
> and in solr; but became package accessible only
>
> This is what removed the FieldCache:
> https://issues.apache.org/jira/browse/LUCENE-5666
> This is what followed: https://issues.apache.org/jira/browse/SOLR-8096
>
> And there is still code which un-inverts data from an index if no
> doc-values are available.
>
> --roman
>
> On Tue, Aug 16, 2016 at 9:54 PM, Joel Bernstein
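
For readers new to the term, here is a toy sketch of what "un-inverting" means, in plain Java with no Lucene classes. It is not how UninvertingReader is implemented; the class name UninvertSketch and the in-memory postings map are assumptions made up for illustration. An inverted index maps each term to the documents containing it, and un-inverting rebuilds the per-document view of the values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a real Lucene index is far more involved than a Map.
class UninvertSketch {

    /**
     * Turns a term -> doc ids mapping (the "inverted" view) into a
     * doc id -> terms list (the "un-inverted", per-document view).
     */
    static List<List<String>> uninvert(Map<String, int[]> postings, int maxDoc) {
        List<List<String>> perDoc = new ArrayList<>();
        for (int i = 0; i < maxDoc; i++) {
            perDoc.add(new ArrayList<>());
        }
        // Walk every term once and append it to each document that contains it.
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            for (int docId : e.getValue()) {
                perDoc.get(docId).add(e.getKey());
            }
        }
        return perDoc;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("lucene", new int[] {0, 2});
        postings.put("solr", new int[] {1, 2});
        // prints [[lucene], [solr], [lucene, solr]]
        System.out.println(uninvert(postings, 3));
    }
}
```

The cost of this walk is proportional to the total number of postings, which is why it is usually done once and cached rather than repeated per request.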
Re: Using Solr invariants to set facet method?
Thanks for your reply. I was not seeing the param being added in the returned results, but after adding echoParams=true I can see that the facet method is being added.

--
View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-invariants-to-set-facet-method-tp4292142p4292149.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Error During Indexing - org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: early EOF
From my testing program, there's nothing standard here. As the blog points out, since I was indexing fairly simple documents you should _not_ be expecting to see those indexing rates. The point of the article was just to show the _relative_ changes when I sent batches.

Best,
Erick

On Wed, Aug 17, 2016 at 1:59 PM, Jaspal Sawhney wrote:
> Erick
> Going through the article which you shared. Where are you getting the
> Docs/second value?
> Thanks
Re: Using Solr invariants to set facet method?
Setting the facet method to enum will have consequences for the filterCache, especially if you allow faceting on high-cardinality fields, so for that specific example I'd be cautious.

Best,
Erick

On Wed, Aug 17, 2016 at 3:01 PM, Alexandre Rafalovitch wrote:
> That's what it is there for. Are you seeing any issues?
>
> You can confirm whether it works or not by adding echoParams=all to
> the query (or in the defaults/invariants).
>
> Regards,
>    Alex
>
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
Re: Using Solr invariants to set facet method?
That's what it is there for. Are you seeing any issues?

You can confirm whether it works or not by adding echoParams=all to the query (or in the defaults/invariants).

Regards,
   Alex

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/

On 18 August 2016 at 07:43, ruby wrote:
> Is it possible to use the invariants in Solr config to set facet.method to
> override what user is sending?
>
> enum
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Using-Solr-invariants-to-set-facet-method-tp4292142.html
> Sent from the Solr - User mailing list archive at Nabble.com.
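
As a rough illustration of why invariants override what the user sends, the layering of parameter sets can be modelled in plain Java. This is a toy sketch, not Solr code: the class InvariantsSketch and its method are made-up names, parameters are treated as single-valued, and the "appends" section is ignored. It only shows the precedence: request values win over defaults, and invariants win over both.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of how handler-level parameter sets are layered; not Solr code.
class InvariantsSketch {

    /** Request values win over defaults; invariants win over everything. */
    static Map<String, String> effectiveParams(Map<String, String> defaults,
                                               Map<String, String> request,
                                               Map<String, String> invariants) {
        Map<String, String> effective = new LinkedHashMap<>(defaults);
        effective.putAll(request);     // the client may override defaults
        effective.putAll(invariants);  // but can never override invariants
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("rows", "10");
        Map<String, String> request = new LinkedHashMap<>();
        request.put("facet.method", "fc");
        request.put("rows", "20");
        Map<String, String> invariants = new LinkedHashMap<>();
        invariants.put("facet.method", "enum");
        // prints {rows=20, facet.method=enum}
        System.out.println(effectiveParams(defaults, request, invariants));
    }
}
```

In this model the client's facet.method=fc is silently replaced by enum, which matches the behaviour that echoParams makes visible on a real handler.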
Using Solr invariants to set facet method?
Is it possible to use the invariants in Solr config to set facet.method to override what the user is sending?

enum

--
View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-invariants-to-set-facet-method-tp4292142.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Error During Indexing - org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: early EOF
Erick
Going through the article which you shared. Where are you getting the Docs/second value?
Thanks

On 8/17/16, 4:37 PM, "Jaspal Sawhney" wrote:
>Erick
>Thanks - My batch size was 30 and thread size also 30.
>Thanks
Re: Error During Indexing - org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: early EOF
Erick
Thanks - My batch size was 30 and thread size also 30.
Thanks

On 8/17/16, 3:48 PM, "Erick Erickson" wrote:
>What this probably indicates is that the size of the packets you send
>to Solr is large enough that it exceeds the transport protocol's
>limit. This is reinforced by your statement that reducing the batch
>size fixes the problem even though it increases indexing time.
>
>So the place I'd be looking is the jetty configurations for any limits
>there.
>
>That said, what is your batch size? In my testing I pretty quickly get
>into diminishing returns, here's a writeup from some time ago:
>https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/
>
>Best,
>Erick
Unit testing HttpPost With an Embedded Solr Server
Hello,

I have written a data service to send an HttpPost command to post JSON to Solr. The code is working, but now I want to switch to using an embedded Solr server for just the unit tests. The problem is that the embedded Solr server doesn't seem to be starting an embedded server with a port. So I'm at a loss on how to test this. I guess I have two questions.

(1) How do I unit test my post command with an embedded Solr server?
(2) If it isn't possible to use the embedded Solr server, I believe I read somewhere that Solr uses a Jetty server. Is it possible to convert an embedded jetty server (with a port I can access) to a Solr server?

Here is the class I am trying to test:

    public class SolrDataServiceClient {
        private String urlString;
        private HttpClient httpClient;
        private final Logger LOGGER = LoggerFactory.getLogger(SolrDataServiceClient.class);

        /**
         * Constructor for connecting to the Solr Server
         * @param solrCore
         * @param serverName
         * @param portNumber
         */
        public SolrDataServiceClient(String solrCore, String serverName, String portNumber){
            LOGGER.info("Initializing new Http Client to Connect To Solr");
            urlString = serverName + ":" + portNumber + "/solr/" + solrCore ;

            if(httpClient == null){
                httpClient = new HttpClient();
            }
        }

        /**
         * Post the provided JSON to Solr
         */
        public CloseableHttpResponse postJSON(String jsonToAdd) {
            CloseableHttpResponse response = null;
            try {
                CloseableHttpClient client = HttpClients.createDefault();
                HttpPost httpPost = new HttpPost(urlString + "/update/json/docs");
                HttpEntity entity = new ByteArrayEntity(jsonToAdd.getBytes("UTF-8"));
                httpPost.setEntity(entity);
                httpPost.setHeader("Content-type", "application/json");
                LOGGER.debug("httpPost = " + httpPost.toString());
                response = client.execute(httpPost);
                String result = EntityUtils.toString(response.getEntity());
                LOGGER.debug("result = " + result);
                client.close();
            } catch (IOException e) {
                LOGGER.error("IOException", e);
            }

            return response;
        }
    }

Here is my JUnit test:

    public class SolrDataServiceClientTest {
        private static EmbeddedSolrServer embeddedServer;
        private static SolrDataServiceClient solrDataServiceClient;

        @BeforeClass
        public static void setUpBeforeClass() throws Exception {
            System.setProperty("solr.solr.home", "solr/conf");
            System.setProperty("solr.data.dir", new File("target/solr-embedded-data").getAbsolutePath());
            CoreContainer coreContainer = new CoreContainer("solr/conf");
            coreContainer.load();

            CoreDescriptor cd = new CoreDescriptor(coreContainer, "myCoreName", new File("solr").getAbsolutePath());
            coreContainer.create(cd);
            embeddedServer = new EmbeddedSolrServer(coreContainer, "myCoreName");
            solrDataServiceClient = new SolrDataServiceClient("myCoreName", "http://localhost", "8983");  //I'm not sure what should go here
        }

        @Test
        public void testPostJson() {
            String testJson = " { " +
                "\"observationId\": \"12345c\"," +
                "\"observationType\": \"image\"," +
                "\"locationLat\": 38.9215," +
                "\"locationLon\": -77.235" +
                "}";

            CloseableHttpResponse response = solrDataServiceClient.postJSON(testJson);
            assertEquals(response.getStatusLine().getStatusCode(), 200);
        }
    }

Thank you!
Jennifer
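
One possible way to exercise an HTTP POST in a unit test without a running Solr is to point the client at a small stub server on a free local port. The sketch below is only that: it uses the JDK's built-in com.sun.net.httpserver rather than Solr or Jetty, it verifies that a POST reaches the expected path and nothing about Solr's handling of the document, and the names StubPostSketch, startStub and postJson are hypothetical. Whether EmbeddedSolrServer can be made to listen on a port is not addressed here.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Stand-in for Solr: a tiny HTTP server that only checks that a POST arrives.
class StubPostSketch {

    /** Starts a stub server on a free local port; POSTs get 200, anything else 405. */
    static HttpServer startStub(String path) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext(path, exchange -> {
            // Drain the request body before replying.
            InputStream in = exchange.getRequestBody();
            byte[] buf = new byte[1024];
            while (in.read(buf) != -1) {
                // discard
            }
            int status = "POST".equals(exchange.getRequestMethod()) ? 200 : 405;
            byte[] body = "{}".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Posts a JSON string to the given URL and returns the HTTP status code. */
    static int postJson(String url, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        String path = "/solr/myCoreName/update/json/docs";
        HttpServer server = startStub(path);
        try {
            String url = "http://localhost:" + server.getAddress().getPort() + path;
            System.out.println(postJson(url, "{\"observationId\": \"12345c\"}"));
        } finally {
            server.stop(0);
        }
    }
}
```

In a JUnit test the same idea would start the stub in a @BeforeClass method, pass "http://localhost" and the stub's port to the client under test, and stop the stub in @AfterClass.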
Re: Error During Indexing - org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: early EOF
What this probably indicates is that the size of the packets you send to Solr is large enough that it exceeds the transport protocol's limit. This is reinforced by your statement that reducing the batch size fixes the problem even though it increases indexing time.

So the place I'd be looking is the jetty configurations for any limits there.

That said, what is your batch size? In my testing I pretty quickly get into diminishing returns, here's a writeup from some time ago:
https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/

Best,
Erick

On Wed, Aug 17, 2016 at 12:03 PM, Jaspal Sawhney wrote:
> Bump !
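
The batch size discussed in this thread is a client-side setting: the indexing program decides how many documents go into each update request. A minimal sketch of that splitting step in plain Java follows. The class BatchSketch and the method toBatches are made-up names, and actually sending each batch to Solr is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Client-side batching sketch; sending each batch to Solr is left out.
class BatchSketch {

    /** Splits docs into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> toBatches(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            docs.add(i);
        }
        // prints [[0, 1, 2], [3, 4, 5], [6]]
        System.out.println(toBatches(docs, 3));
    }
}
```

Smaller batches mean smaller request bodies at the cost of more round trips, which is the trade-off between the EOF error and indexing time described above.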
Re: Error During Indexing - org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: early EOF
Bump !

On 8/16/16, 10:53 PM, "Jaspal Sawhney" wrote:

>Hello
>We are running solr 4.6 in master-slave configuration where in our master
>is used entirely for indexing. No search traffic comes to master ever.
>Off late we have started to get the early EOF error on the solr Master
>which results in a Broken Pipe error on the commerce application from
>where Indexing was kicked off from.
>
>Things to mention
>
>  1. We have a couple of sites each of which has the same document size but diff document count.
>  2. This error is being observed in the site which has the most number of document count I.e. 2204743
>  3. The way I have understood solr to work is that irrespective of number of document the throughput is controlled by the 'Number of Threads' and 'Batch size' - Am I correct?
>     * In our case we have not touched the batch size and Number of Threads when this error started coming
>     * However when I do touch these parameters (specifically reduce them) the error does not come however indexing time increases a lot.
>  4. We have to index overnight daily because we put product prices in the Index which get updated nightly
>  5. Solr master is running with a 20 GB Heap
>
>What we have tried
>
>  1. I disabled autoCommit (I.e. Hard commit) and put the autoSoftCommit as 5 mins
>     * I realized afterwards that this was a wrong test because my understanding of soft commit was incorrect, My understanding now is that hard commit just truncate the Tlog do hardCommit should be better indexing performance.
>     * This test failed for lack of space reason however because disable autoCommit did not make sense I did not retry this test yet.
>  2. Increased the RAMBufferSizeMB from 100MB to 1000MB
>     * This test did not yield anything favorable the master gave the early EOF exception
>  3. Increased the merge factor from 20 -> 100
>     * This test did not yield anything favorable the master gave the early EOF exception
>  4. Flipped the autoCommit to 15 secs and disabled auto commit
>     * This test did not yield anything favorable the master gave the early EOF exception
>     * I got the input for this from https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ - Heavy (Bulk) Indexing section
>  5. Tried to bypass transaction log all together This test is underway currently
>
>Questions
>
>  1. Since we are not using solrCloud I want to understand the impact of bypassing transaction log
>  2. How does solr take documents which are sent to it to storage as in what is the journey of a document from segment to tlog to storage.
>
>It would be great If there are any pointers which you can share.
>
>Thanks
>J./
>
>The actual Error Log
>ERROR - 2016-08-16 22:59:55.988; org.apache.solr.common.SolrException;
>org.apache.solr.common.SolrException: early EOF
>at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
>at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
>at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)
>at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)
>at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
>at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>at
Re: The most efficient way to get un-inverted view of the index?
Joel, thanks, but which of them? I've counted at least 4, if not more, different ways to get DocValues. Are there many functionally equal approaches just because devs can't agree on using one API? Or is there a deeper reason?

Btw, the FieldCache is still there, both in Lucene (to be deprecated) and in Solr, but it became package-accessible only.

This is what removed the FieldCache: https://issues.apache.org/jira/browse/LUCENE-5666
This is what followed: https://issues.apache.org/jira/browse/SOLR-8096

And there is still code which un-inverts data from an index if no doc-values are available.

--roman

On Tue, Aug 16, 2016 at 9:54 PM, Joel Bernstein wrote:
> You'll want to use org.apache.lucene.index.DocValues. The DocValues api has replaced the field cache.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Aug 16, 2016 at 8:18 PM, Roman Chyla wrote:
>
>> I need to read data from the index in order to build a special cache. Previously, in SOLR4, this was accomplished with FieldCache or DocTermOrds
>>
>> Now, I'm struggling to see what API to use, there are many of them:
>>
>> on lucene level:
>>
>> UninvertingReader.getNumericDocValues (and others)
>> .getNumericValues()
>> MultiDocValues.getNumericValues()
>> MultiFields.getTerms()
>>
>> on solr level:
>>
>> reader.getNumericValues()
>> UninvertingReader.getNumericDocValues()
>> and extensions to FilterLeafReader - eg. very interesting, but undocumented facet accumulators (ex: NumericAcc)
>>
>> I need this for solr, and ideally re-use the existing cache [ie. the special cache is using another fields so those get loaded only once and reused in the old solr; which is a win-win situation]
>>
>> If I use reader.getValues() or FilterLeafReader will I be reading data every time the object is created? What would be the best way to read data only once?
>>
>> Thanks,
>>
>> --roman
Re: [Ext] Influence ranking based on document committed date
Erick already gave you the solution. In addition to that, there's a wiki page that might contain a few more things about relevancy:
https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_change_the_score_of_a_document_based_on_the_.2Avalue.2A_of_a_field_.28say.2C_.22popularity.22.29

-Stefan

On August 17, 2016 at 5:35:10 PM, Erick Erickson (erickerick...@gmail.com) wrote:
> Try:
> recip(rord(creationDate),1,1000,1000)
>
> See:
> https://wiki.apache.org/solr/FunctionQuery
>
> You can play with the magic numbers to influence how this scales your docs.
>
> Best,
> Erick
>
> On Wed, Aug 17, 2016 at 7:11 AM, Jay Parashar wrote:
> > This is correct: " I index it and feed it the timestamp at index time".
> > You can sort desc on that field (can be a TrieDateField)
> >
> > -----Original Message-----
> > From: Steven White [mailto:swhite4...@gmail.com]
> > Sent: Wednesday, August 17, 2016 9:01 AM
> > To: solr-user@lucene.apache.org
> > Subject: [Ext] Influence ranking based on document committed date
> >
> > Hi everyone
> >
> > Let's say I search for the word "Olympic" and I get a hit on 10 documents that have similar content (let us assume the content is at least 80% identical) how can I have Solr rank them so that the ones with most recently updated doc gets ranked higher? Is this something I have to do at index time or search time?
> >
> > Is the trick to have a field that holds the committed timestamp and boost on that field during search? If so, is this field something I can configure in Solr's schema.xml or must I index it and feed it the timestamp at index time? If I'm on the right track, does this mean I have to always append this field base boost to each query a user issues?
> >
> > If there is a wiki or article written on this topic, that would be a good start.
> >
> > In case it matters, I'm using Solr 5.2 and my searches are utilizing edismax.
> >
> > Thanks in advance!
> >
> > Steve
Re: Increasing filterCache size and Java Heap size
Hi Toke,

Thanks for the explanation.

I would prefer the memory-based limit too. At first I got confused by that as well, thinking that the setting of 2000 means 2GB.

Regards,
Edwin

On 17 August 2016 at 17:40, Toke Eskildsen wrote:
> On Wed, 2016-08-17 at 11:02 +0800, Zheng Lin Edwin Yeo wrote:
> > Would like to check, do I need to increase my Java Heap size for Solr, if I plan to increase my filterCache size in solrconfig.xml?
> >
> > I'm using Solr 6.1.0
>
> It _seems_ that you can specify a limit in megabytes when using LRUCache in Solr 5.2+: https://issues.apache.org/jira/browse/SOLR-7372
>
> The documentation only mentions it for queryResultCache, but I do not know if that is intentional (i.e. it does not work for filterCache) or a shortcoming of the documentation:
> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
>
> If it does work for filterCache too (using LRUCache, I guess), then that would be a much better way of limiting cache size than the highly insufficient count-based limiter.
>
> I say "highly insufficient" because filter cache entries are not of equal size. With small sets they are stored as sparse, using a relatively small amount of memory. For larger sets they are stored as bitmaps, taking up ~1K + maxdoc/8 bytes as Erick describes.
>
> So a fixed upper limit measured in counts needs to be adjusted to worst case, meaning maxdoc/8, to ensure stability. In reality most of the filter cache entries are small, meaning that there is plenty of heap not being used. This leads people to over-allocate the max size for the filterCache (very understandable), resulting in setups that are only stable as long as there are not too many large filter sets stored. Leaving it to chance really.
>
> I would prefer the count-based limit to be deprecated for the filterCache, or at least warned against, in favour of memory-based.
>
> - Toke Eskildsen, State and University Library, Denmark
Re: [Ext] Influence ranking based on document committed date
Try:
recip(rord(creationDate),1,1000,1000)

See:
https://wiki.apache.org/solr/FunctionQuery

You can play with the magic numbers to influence how this scales your docs.

Best,
Erick

On Wed, Aug 17, 2016 at 7:11 AM, Jay Parashar wrote:
> This is correct: " I index it and feed it the timestamp at index time".
> You can sort desc on that field (can be a TrieDateField)
>
> -----Original Message-----
> From: Steven White [mailto:swhite4...@gmail.com]
> Sent: Wednesday, August 17, 2016 9:01 AM
> To: solr-user@lucene.apache.org
> Subject: [Ext] Influence ranking based on document committed date
>
> Hi everyone
>
> Let's say I search for the word "Olympic" and I get a hit on 10 documents that have similar content (let us assume the content is at least 80% identical) how can I have Solr rank them so that the ones with most recently updated doc gets ranked higher? Is this something I have to do at index time or search time?
>
> Is the trick to have a field that holds the committed timestamp and boost on that field during search? If so, is this field something I can configure in Solr's schema.xml or must I index it and feed it the timestamp at index time? If I'm on the right track, does this mean I have to always append this field base boost to each query a user issues?
>
> If there is a wiki or article written on this topic, that would be a good start.
>
> In case it matters, I'm using Solr 5.2 and my searches are utilizing edismax.
>
> Thanks in advance!
>
> Steve
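
To make the "magic numbers" a little more concrete: the FunctionQuery page Erick links defines recip(x,m,a,b) as a/(m*x+b), and rord(field) as the reverse ordinal of the field value, so the newest document should get the smallest x. The following is only a minimal sketch of that arithmetic in plain Java; the class and method names are made up for illustration and nothing here calls Solr.

```java
public class RecipSketch {

    // recip(x, m, a, b) = a / (m * x + b), the formula given for Solr's recip function.
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        // Hypothetical reverse ordinals: 1 would be the most recent creationDate.
        int[] reverseOrdinals = {1, 100, 1000, 10000};
        for (int rord : reverseOrdinals) {
            // Same constants as in the suggestion: m=1, a=1000, b=1000.
            System.out.printf("rord=%d -> boost=%.4f%n", rord, recip(rord, 1, 1000, 1000));
        }
    }
}
```

With m=1 and a=b=1000 the value starts just below 1 for the newest document and has halved by the time the reverse ordinal reaches 1000; raising a and b together flattens the curve, lowering them makes recency matter more.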
Re: Modified stat of index
Thanks, that works perfectly!

Scott

Original Message
Subject: Re: Modified stat of index
From: Alexandre Rafalovitch
To: solr-user
Date: 08/16/2016 04:17 PM

I believe you can get that via Luke REST API:
http://localhost:8983/solr//admin/luke

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/

On 17 August 2016 at 07:18, Scott Derrick wrote:

I need to retrieve the last modified timestamp of my search index. Is there a query I can use or is it stored in a particular file?

thanks,

Scott

--
One man's "magic" is another man's engineering. "Supernatural" is a null word.
Robert A. Heinlein

--
It is with our passions, as it is with fire and water, they are good servants but bad masters.
Aesop
Re: Use function in condition
Hi Nabil,

You can use frange queries, e.g. you can use fq={!frange l=100}sum(field1,field2) to filter docs with a sum greater than 100.

Regards,
Emir

On 17.08.2016 16:26, nabil Kouici wrote:

Hi,

Is it possible to use functions (function query https://cwiki.apache.org/confluence/display/solr/Function+Queries) in q or fq parameters to build a complex search expression?
For example, take only documents where sum(field1,field2) > 100.
Another example: if(test,value1,value2):value3

Regards,
Nabil.

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
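
When the frange filter is sent over HTTP the braces and parentheses need URL-encoding, which is easy to get wrong by hand. The sketch below just assembles and encodes the fq value with the Java standard library (Java 10+ for the Charset overload of URLEncoder.encode); the class and helper names are illustrative, not part of Solr or SolrJ. One caveat, stated from memory of the frange documentation rather than from this thread: the bounds are inclusive by default, so l=100 would also match a sum of exactly 100, and incl=false is the local param that should make the lower bound exclusive.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FrangeFilterSketch {

    // Builds the local-params filter from the reply: {!frange l=<lower>}sum(f1,f2)
    static String frangeSum(String field1, String field2, int lower) {
        return "{!frange l=" + lower + "}sum(" + field1 + "," + field2 + ")";
    }

    // Encodes the filter so it can be appended to a request URL as an fq parameter.
    static String asFqParam(String filter) {
        return "fq=" + URLEncoder.encode(filter, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String filter = frangeSum("field1", "field2", 100);
        System.out.println(filter);            // the raw filter string
        System.out.println(asFqParam(filter)); // the URL-encoded request parameter
    }
}
```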
Re: index size increses dramatically
Hi,

It is quite normal that index size can be close to double during background merge of segments. If you have a lot of deletions and/or reindexed docs then the same document may also exist in multiple segments, taking up space temporarily until a merge or optimize.

If this slows down your system then it sounds like your system is not sized properly with respect to memory. But you need to provide more details for anyone to be able to tell you exactly what is going on in your situation.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 17 Aug 2016 at 15:59, kshitij tyagi wrote:
>
> Hi,
>
> Suddenly my index size just doubles and indexing just slows down poorly.
>
> After sometime it reduces back to normal and indexing starts working.
>
> Can someone help me out in finding why index size doubles abnormally??
Re: Tagging and excluding Filters with BlockJoin Queries and BlockJoin Faceting
Hi Mikhail,

thanks for the info ... what is the advantage of using the JSON FACET API compared to the standard BlockJoinQuery features?

Is there already anybody working on the tagging/exclusion feature or is there any timeframe for it? There wasn't any discussion yet in SOLR-8998 about exclusions, was there?

Thank you very much,
best,
Stefan

On 17.08.16 at 15:26, Mikhail Khludnev wrote:

Stefan,

child.facet.field was never intended to support exclusions. My preference is to implement it under json.facet, which is discussed under https://issues.apache.org/jira/browse/SOLR-8998.

On Wed, Aug 17, 2016 at 3:52 PM, Stefan Moises wrote:

Hey girls and guys,

for a long time we have been using our own BlockJoin Implementation, because for our Shop Systems a lot of requirements that we had were not implemented in solr.

As we now had a deeper look into how far the standard has come, we saw that BlockJoin and faceting on children is now part of the standard, which is pretty cool.
When I tried to refactor our external code to use that now, I stumbled upon one non-working feature with BlockJoins that still keeps us from using it:

It seems that tagging and excluding Filters with BlockJoin Faceting simply does not work yet.

Simple query:

=products
={!parent which='isparent:true'}shirt AND isparent:false
=true
={!parent which='isparent:true'}{!tag=myTag}color:grey
={!ex=myTag}color

Gives us:
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: undefined field: "{!ex=myTag}color"
    at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1231)

Does somebody have an idea?
Best,
Stefan

--
Stefan Moises
Manager Research & Development
shoptimax GmbH
Ulmenstraße 52 H
90443 Nürnberg
Tel.: 0911/25566-0
Fax: 0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de

Geschäftsführung: Friedrich Schreieck
Ust.-IdNr.: DE 814340642
Amtsgericht Nürnberg HRB 21703
Re: What does refCount denotes in solr admin
Any update?

On Wed, Aug 17, 2016 at 12:47 PM, kshitij tyagi wrote:
> Hi,
>
> I need to understand what is refcount in stats section of solr admin.
>
> I am seeing refcount: 2 on my solr cores and on one of the core i am seeing refcount:171.
>
> The core with refcount with higher number is having very slow indexing speed?
Use function in condition
Hi,

Is it possible to use functions (function query https://cwiki.apache.org/confluence/display/solr/Function+Queries) in q or fq parameters to build a complex search expression?
For example, take only documents where sum(field1,field2) > 100.
Another example: if(test,value1,value2):value3

Regards,
Nabil.
RE: [Ext] Influence ranking based on document committed date
This is correct: " I index it and feed it the timestamp at index time".
You can sort desc on that field (can be a TrieDateField)

-----Original Message-----
From: Steven White [mailto:swhite4...@gmail.com]
Sent: Wednesday, August 17, 2016 9:01 AM
To: solr-user@lucene.apache.org
Subject: [Ext] Influence ranking based on document committed date

Hi everyone

Let's say I search for the word "Olympic" and I get a hit on 10 documents that have similar content (let us assume the content is at least 80% identical) how can I have Solr rank them so that the ones with most recently updated doc gets ranked higher? Is this something I have to do at index time or search time?

Is the trick to have a field that holds the committed timestamp and boost on that field during search? If so, is this field something I can configure in Solr's schema.xml or must I index it and feed it the timestamp at index time? If I'm on the right track, does this mean I have to always append this field base boost to each query a user issues?

If there is a wiki or article written on this topic, that would be a good start.

In case it matters, I'm using Solr 5.2 and my searches are utilizing edismax.

Thanks in advance!

Steve
Influence ranking based on document committed date
Hi everyone,

Let's say I search for the word "Olympic" and I get a hit on 10 documents that have similar content (let us assume the content is at least 80% identical). How can I have Solr rank them so that the most recently updated docs get ranked higher? Is this something I have to do at index time or search time?

Is the trick to have a field that holds the committed timestamp and boost on that field during search? If so, is this field something I can configure in Solr's schema.xml or must I index it and feed it the timestamp at index time? If I'm on the right track, does this mean I have to always append this field-based boost to each query a user issues?

If there is a wiki or article written on this topic, that would be a good start.

In case it matters, I'm using Solr 5.2 and my searches are utilizing edismax.

Thanks in advance!

Steve
index size increses dramatically
Hi,

Suddenly my index size doubles and indexing slows down badly.

After some time it shrinks back to normal and indexing starts working again.

Can someone help me find out why the index size doubles abnormally?
Re: Tagging and excluding Filters with BlockJoin Queries and BlockJoin Faceting
Stefan,

child.facet.field was never intended to support exclusions. My preference is to implement it under json.facet, which is discussed under https://issues.apache.org/jira/browse/SOLR-8998.

On Wed, Aug 17, 2016 at 3:52 PM, Stefan Moises wrote:
> Hey girls and guys,
>
> for a long time we have been using our own BlockJoin Implementation, because for our Shop Systems a lot of requirements that we had were not implemented in solr.
>
> As we now had a deeper look into how far the standard has come, we saw that BlockJoin and faceting on children is now part of the standard, which is pretty cool.
> When I tried to refactor our external code to use that now, I stumbled upon one non-working feature with BlockJoins that still keeps us from using it:
>
> It seems that tagging and excluding Filters with BlockJoin Faceting simply does not work yet.
>
> Simple query:
>
> =products
> ={!parent which='isparent:true'}shirt AND isparent:false
> =true
> ={!parent which='isparent:true'}{!tag=myTag}color:grey
> ={!ex=myTag}color
>
> Gives us:
> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: undefined field: "{!ex=myTag}color"
>     at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1231)
>
> Does somebody have an idea?
>
> Best,
> Stefan
>
> --
> Stefan Moises
> Manager Research & Development
> shoptimax GmbH
> Ulmenstraße 52 H
> 90443 Nürnberg
> Tel.: 0911/25566-0
> Fax: 0911/25566-29
> moi...@shoptimax.de
> http://www.shoptimax.de
>
> Geschäftsführung: Friedrich Schreieck
> Ust.-IdNr.: DE 814340642
> Amtsgericht Nürnberg HRB 21703

--
Sincerely yours
Mikhail Khludnev
Tagging and excluding Filters with BlockJoin Queries and BlockJoin Faceting
Hey girls and guys, for a long time we have been using our own BlockJoin Implementation, because for our Shop Systems a lot of requirements that we had were not implemented in solr. As we now had a deeper look into how far the standard has come, we saw that BlockJoin and faceting on children is now part of the standard, which is pretty cool. When I tried to refactor our external code to use that now, I stumbled upon one non-working feature with BlockJoins that still keeps us from using it: It seems that tagging and excluding Filters with BlockJoin Faceting simply does not work yet. Simple query: =products ={!parent which='isparent:true'}shirt AND isparent:false =true ={!parent which='isparent:true'}{!tag=myTag}color:grey ={!ex=myTag}color Gives us: o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: undefined field: "{!ex=myTag}color" at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1231) Does somebody have an idea? Best, Stefan -- -- Stefan Moises Manager Research & Development shoptimax GmbH Ulmenstraße 52 H 90443 Nürnberg Tel.: 0911/25566-0 Fax: 0911/25566-29 moi...@shoptimax.de http://www.shoptimax.de Geschäftsführung: Friedrich Schreieck Ust.-IdNr.: DE 814340642 Amtsgericht Nürnberg HRB 21703
Re: Creating a SolrJ Data Service to send JSON to Solr
Thank you Alex and Anshum! I will look into both of these.

Jennifer

From: Anshum Gupta
To: solr-user@lucene.apache.org
Date: 08/16/2016 08:17 PM
Subject: Re: Creating a SolrJ Data Service to send JSON to Solr

I would also suggest sending the JSON directly to the JSON end point, with the mapping:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-JSONUpdateConveniencePaths

On Tue, Aug 16, 2016 at 4:43 PM Alexandre Rafalovitch wrote:
> Why do you need a POJO? For Solr purposes, you could just get the field names from schema and use those to map directly from JSON to the 'addField' calls in SolrDocument.
>
> Do you need it for non-Solr purposes? Then you can search for generic Java dynamic POJO generation solution.
>
> Also, you could look at creating a superset rather than common-subset POJO and then ignore all unknown fields on Solr side by adding a dynamicField that matches '*' with everything (index, store, docValues) set to false.
>
> Regards,
>    Alex.
>
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
> On 17 August 2016 at 02:49, Jennifer Coston wrote:
> >
> > Hello,
> > I am trying to write a data service using SolrJ that will allow me to accept JSON through a REST API, create a Solr document, and write it to multiple different Solr cores (depending on the core name specified). The problem I am running into is that each core is going to have a different schema. My current code has the common fields between all the schemas in a data POJO which I then walk and set the values specified in the JSON to the Solr Document. However, I don't want to create a different class for each schema to process the JSON and convert it to a Solr Document. Is there a way to process the extra JSON fields that are not common between the schemas and add them to the Solr Document, without knowing what they are ahead of time? Is there a way to convert JSON to a Solr Document without having to use a POJO? An alternative I was looking into is to use the SolrClient to get the schema fields, create a POJO, walk that POJO to create a Solr Document and then add it to Solr but, it doesn't seem to be possible to obtain the fields this way.
> >
> > I know that the easiest way to add JSON to Solr would be to use a curl command and send the JSON directly to Solr but this doesn't match our requirements, so I need to figure out a way to perform the same operation using SolrJ. Any other ideas or suggestions would be greatly appreciated!
> >
> > Thank you,
> >
> > -Jennifer
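
Alex's first suggestion (take the field names from the schema and map JSON keys straight onto the document, no POJO) can be sketched without any Solr dependency. The snippet below is an illustration under stated assumptions: the JSON is taken to be already parsed into a Map by whatever JSON library the service uses, the set of schema field names is taken to be fetched elsewhere, and the class and method names are hypothetical. With SolrJ, each kept entry would presumably become one addField call on a SolrInputDocument.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class DynamicFieldMapper {

    // Keeps only the JSON entries whose key is a field name known to the target core's schema.
    // Unknown keys are dropped here; the alternative from the thread is to send everything
    // and let a catch-all dynamicField on the Solr side ignore them.
    static Map<String, Object> keepSchemaFields(Map<String, Object> json, Set<String> schemaFields) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : json.entrySet()) {
            if (schemaFields.contains(entry.getKey())) {
                doc.put(entry.getKey(), entry.getValue());
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        // Hypothetical parsed JSON payload.
        Map<String, Object> json = new LinkedHashMap<>();
        json.put("id", "doc-1");
        json.put("title", "Example");
        json.put("internalNote", "not in this core's schema");

        // Hypothetical field names for one core.
        Set<String> schemaFields = new HashSet<>(Arrays.asList("id", "title"));

        System.out.println(keepSchemaFields(json, schemaFields));
    }
}
```

Because the mapping is driven by the field-name set rather than by a class, the same code path can serve every core; only the set changes per core.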
Re: Increasing filterCache size and Java Heap size
On Wed, 2016-08-17 at 11:02 +0800, Zheng Lin Edwin Yeo wrote:
> Would like to check, do I need to increase my Java Heap size for Solr, if I plan to increase my filterCache size in solrconfig.xml?
>
> I'm using Solr 6.1.0

It _seems_ that you can specify a limit in megabytes when using LRUCache in Solr 5.2+: https://issues.apache.org/jira/browse/SOLR-7372

The documentation only mentions it for queryResultCache, but I do not know if that is intentional (i.e. it does not work for filterCache) or a shortcoming of the documentation:
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig

If it does work for filterCache too (using LRUCache, I guess), then that would be a much better way of limiting cache size than the highly insufficient count-based limiter.

I say "highly insufficient" because filter cache entries are not of equal size. With small sets they are stored as sparse, using a relatively small amount of memory. For larger sets they are stored as bitmaps, taking up ~1K + maxdoc/8 bytes as Erick describes.

So a fixed upper limit measured in counts needs to be adjusted to worst case, meaning maxdoc/8, to ensure stability. In reality most of the filter cache entries are small, meaning that there is plenty of heap not being used. This leads people to over-allocate the max size for the filterCache (very understandable), resulting in setups that are only stable as long as there are not too many large filter sets stored. Leaving it to chance really.

I would prefer the count-based limit to be deprecated for the filterCache, or at least warned against, in favour of memory-based.

- Toke Eskildsen, State and University Library, Denmark
What does refCount denotes in solr admin
Hi,

I need to understand what refCount means in the stats section of the Solr admin.

I am seeing refCount: 2 on my Solr cores, and on one of the cores I am seeing refCount: 171.

The core with the higher refCount has very slow indexing speed. Is that related?
RE: solr-6.1.0 - Using different client and server certificates for authentication doesn't work
This is what helped me: https://gist.github.com/jankronquist/6412839

-----Original Message-----
From: Kostas [mailto:k...@dataverse.gr]
Sent: Tuesday, July 26, 2016 3:22 PM
To: solr-user@lucene.apache.org
Subject: solr-6.1.0 - Using different client and server certificates for authentication doesn't work

Hello.

I have set up Solr 6.1.0 to use SSL (on Windows) and to do client authentication based on the client certificate.
When I use the same certificate for both the server and the client authentication, everything works OK:

== solr.in.cmd
set SOLR_SSL_KEY_STORE=%ROO%/server/etc/solr-ssl.keystore.jks
set SOLR_SSL_KEY_STORE_PASSWORD=password
set SOLR_SSL_TRUST_STORE=%ROO%/server/etc/solr-ssl.keystore.jks
set SOLR_SSL_TRUST_STORE_PASSWORD=password
set SOLR_SSL_NEED_CLIENT_AUTH=true
set SOLR_SSL_WANT_CLIENT_AUTH=false
REM (Client settings residing below are commented out.)

== server\etc\jetty-ssl.xml

== This works :
curl ^
  --cert "solr-ssl.keystore.pem" ^
  --cacert "solr-ssl.keystore.pem" ^
  "https://localhost:8898/solr/admin/collections?action=CLUSTERSTATUS=json; indent=on"

However, when I try to use different server and client certificates, it doesn't work (it seems that it still uses the server certificate for client authorizations):

== solr.in.cmd
set SOLR_SSL_KEY_STORE=%ROO%/server/etc/solr-ssl.keystore.jks
set SOLR_SSL_KEY_STORE_PASSWORD=password
set SOLR_SSL_TRUST_STORE=%ROO%/server/etc/solr-ssl.keystore.jks
set SOLR_SSL_TRUST_STORE_PASSWORD=password
set SOLR_SSL_NEED_CLIENT_AUTH=true
set SOLR_SSL_WANT_CLIENT_AUTH=false
set SOLR_SSL_CLIENT_KEY_STORE=%ROO%/server/etc/solr-ssl-client.keystore.jks
set SOLR_SSL_CLIENT_KEY_STORE_PASSWORD=password
set SOLR_SSL_CLIENT_TRUST_STORE=%ROO%/server/etc/solr-ssl-client.keystore.jks
set SOLR_SSL_CLIENT_TRUST_STORE_PASSWORD=password

== server\etc\jetty-ssl.xml

== This fails (!!!):
curl ^
  --cert "solr-ssl-client.keystore.pem" ^
  --cacert "solr-ssl.keystore.pem" ^
  "https://localhost:8898/solr/admin/collections?action=CLUSTERSTATUS=json; indent=on"

== This STILL works (!!!):
curl ^
  --cert "solr-ssl.keystore.pem" ^
  --cacert "solr-ssl.keystore.pem" ^
  "https://localhost:8898/solr/admin/collections?action=CLUSTERSTATUS=json; indent=on"

I run Solr like this:
"%ROO%\bin\solr" start -c -V -f -p 8898^
  -Dsolr.ssl.checkPeerName=false

From what I can tell, Solr uses the values from `server\etc\jetty-ssl.xml` and totally discards the ones from `solr.in.cmd`.
Naturally, I would try to set the client certificate inside there (jetty-ssl.xml), but I don't see any setting available for that.

Is what I am trying to do (use different certificates for server and client authentication) supported, or am I wasting my time?
Also, why don't the docs say that jetty-ssl.xml overrides the settings in `solr.in.cmd`?
Am I missing something?

Thanks,
Kostas
Re: Increasing filterCache size and Java Heap size
Hi Erick,

Thanks for your reply.

But do we have to set the Java Heap size based on all the collections available (if I were to increase the filterCache size for all my collections)?

I came across this on Stack Overflow:
http://stackoverflow.com/questions/2004/solr-filter-cache-fastlrucache-takes-too-much-memory-and-results-in-out-of-mem
It says that if we want to have a filterCache size of 2000, we will need 12GB of memory.

Let's say we have 3 collections, each with the filterCache size set to 2000. Do we need 36GB of Java Heap space, or will 12GB be sufficient?

Regards,
Edwin

On 17 August 2016 at 14:09, Erick Erickson wrote:
> Yes. Each entry is roughly 1K + maxdoc/8 bytes. The maxdoc/8 is the bitmap that holds the result set and the 1K is just overhead for the text of the query itself and cache overhead. Usually it's safe to ignore since the maxdoc/8 usually dominates by a wide margin.
>
> Best,
> Erick
>
> On Tue, Aug 16, 2016 at 8:02 PM, Zheng Lin Edwin Yeo wrote:
> > Hi,
> >
> > Would like to check, do I need to increase my Java Heap size for Solr, if I plan to increase my filterCache size in solrconfig.xml?
> >
> > I'm using Solr 6.1.0
> >
> > Regards,
> > Edwin
Re: Indexing (posting document) taking a lot of time
I am posting json using curl.

On Wed, Aug 17, 2016 at 4:41 AM, Alexandre Rafalovitch wrote:
> What format are those documents? Solr XML? Custom JSON?
>
> Or are you sending PDF/binary documents to Solr's extract handler and asking it to do the extraction of the useful stuff? If the latter, you could take that step out of Solr with a custom client using Tika (what Solr has under the hood) and only send to Solr the processed output.
>
> Regards,
>    Alex.
>
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
> On 16 August 2016 at 22:49, kshitij tyagi wrote:
> > 400kb is size of single document and i am sending 100 documents per request.
> > solr heap size is 16gb and running on multithread.
> >
> > On Tue, Aug 16, 2016 at 5:10 PM, Emir Arnautovic <emir.arnauto...@sematext.com> wrote:
> >
> >> Hi,
> >>
> >> 400KB/doc * 100doc = 40MB. If you are running it single threaded, Solr will be idle while accepting relatively large request. Or is 400KB 100 doc bulk that you are sending?
> >>
> >> What is Solr's heap size? I would try increasing number of threads and monitor Solr's heap/CPU/IO to see where is the bottleneck.
> >>
> >> How complex is fields' analysis?
> >>
> >> Regards,
> >> Emir
> >>
> >> On 16.08.2016 13:25, kshitij tyagi wrote:
> >>
> >>> hi,
> >>>
> >>> we are sending about 100 documents per request for indexing? we have autocommit set to false and commit only when 1 documents are present.solr and the machine sending request are in same pool.
> >>>
> >>> On Tue, Aug 16, 2016 at 4:51 PM, Emir Arnautovic <emir.arnauto...@sematext.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Do you send one doc per request? How frequently do you commit? Where is Solr running? What is network connection between your machine and Solr? What are JVM settings? Is 10-30s for entire indexing or single doc?
> >>>>
> >>>> Regards,
> >>>> Emir
> >>>>
> >>>> On 16.08.2016 11:34, kshitij tyagi wrote:
> >>>>
> >>>>> Hi alexandre,
> >>>>>
> >>>>> 1 document of 400kb size is taking approx 10-30 sec and this is varying. I am posting document using curl
> >>>>>
> >>>>> On Tue, Aug 16, 2016 at 2:11 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >>>>>
> >>>>>> How many records is that and what is 'slow'? Also is this standalone or cluster setup?
> >>>>>>
> >>>>>> On 16 Aug 2016 6:33 PM, "kshitij tyagi" <kshitij.shopcl...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I am indexing a lot of data about 8GB, but it is taking a lot of time. I have read about maxBufferedDocs, ramBufferSizeMB, merge policy, etc in solrconfig file.
> >>>>>>>
> >>>>>>> It would be helpful if someone could help me out tune the setting for faster indexing speeds.
> >>>>>>>
> >>>>>>> *I have read the docs but not able to get what exactly means changing these configs.*
> >>>>>>>
> >>>>>>> *Regards,*
> >>>>>>> *Kshitij*
> >>>>
> >>>> --
> >>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> >>>> Solr & Elasticsearch Support * http://sematext.com/
> >>
> >> --
> >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> >> Solr & Elasticsearch Support * http://sematext.com/
Re: Multiple rollups/facets in one streaming aggregation?
Thanks a lot, Joel, for your very fast and informative reply! We'll chew on this and add a Jira if we go this route.

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Tue, Aug 16, 2016 at 8:29 PM, Joel Bernstein wrote:
> For the initial implementation we could skip the merge piece if that helps
> get things done faster. In this scenario the metrics could be gathered
> after some parallel operation, then there would be no need for a merge.
> Sample syntax:
>
> metrics(parallel(join()))
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Aug 16, 2016 at 1:25 PM, Joel Bernstein wrote:
>
>> The concept of a MetricStream was in the early designs but hasn't yet been
>> implemented. Now might be a good time to work on the implementation.
>>
>> The MetricStream wraps a stream and gathers metrics in memory, continuing
>> to emit the tuples from the underlying stream. This allows multiple
>> MetricStreams to operate over the same stream without transforming the
>> stream. Pseudo code for a metric expression syntax is below:
>>
>> metrics(metrics(search()))
>>
>> The MetricStream delivers its metrics through the EOF Tuple. So the
>> MetricStream simply adds the finished aggregations to the EOF Tuple and
>> returns it. If we're going to support parallel metric gathering then we'll
>> also need to support the merging of the metrics. Something like this:
>>
>> metrics(parallel(metrics(join())))
>>
>> Where the metrics wrapping the parallel function would need to collect the
>> EOF tuples from each worker, then merge the metrics and emit the
>> merged metrics in an EOF Tuple.
>>
>> If you think this meets your needs, feel free to create a Jira and
>> begin a patch and I can help get it committed.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Tue, Aug 16, 2016 at 11:52 AM, Radu Gheorghe <radu.gheor...@sematext.com> wrote:
>>
>>> Hello Solr users :)
>>>
>>> Right now it seems that if I want to rollup on two different fields
>>> with streaming expressions, I would need to do two separate requests.
>>> This is too slow for our use-case, when we need to do joins before
>>> sorting and rolling up (because we'd have to re-do the joins).
>>>
>>> Since in our case we are actually looking for some not-necessarily
>>> accurate facets (top N), the best solution we could come up with was
>>> to implement a new stream decorator that implements an algorithm like
>>> Count-min sketch[1] which would run on the tuples provided by the
>>> stream function it wraps. This would have two big wins for us:
>>> 1) it would do the facet without needing to sort on the facet field,
>>> so we'll potentially save lots of memory
>>> 2) because sorting isn't needed, we could do multiple facets in one go
>>>
>>> That said, I have two (broad) questions:
>>> A) is there a better way of doing this? Let's reduce the problem to
>>> streaming aggregations, where the assumption is that we have multiple
>>> collections where data needs to be joined, and then facet on fields
>>> from all collections. But maybe there's a better algorithm, something
>>> out of the box or closer to what is offered out of the box?
>>> B) whatever the best way is, could we do it in a way that can be
>>> contributed back to Solr? Any hints on how to do that? Just another
>>> decorator?
>>>
>>> Thanks and best regards,
>>> Radu
>>>
>>> [1] https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch
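
For readers following the Count-min sketch idea from the original question, here is a minimal standalone sketch of approximate counting on several fields in a single pass over unsorted tuples. It is only an illustration under assumptions: the class name `CountMinSketch`, the `depth`/`width` parameters, the hash mixing and the sample tuples are all hypothetical, and it does not use Solr's streaming API or represent any existing Solr decorator.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical, minimal Count-min sketch; not part of Solr. */
public class CountMinSketch {
    private final int depth;      // number of hash rows
    private final int width;      // counters per row
    private final long[][] table; // depth x width counters
    private final int[] seeds;    // one seed per row, so rows hash differently

    public CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.table = new long[depth][width];
        this.seeds = new int[depth];
        Random rnd = new Random(42); // fixed seed keeps runs repeatable
        for (int i = 0; i < depth; i++) {
            seeds[i] = rnd.nextInt();
        }
    }

    /** Maps a value to a counter index in the given row. */
    private int bucket(String value, int row) {
        int h = value.hashCode() ^ seeds[row];
        // Simple integer mixing (an assumption; any decent hash would do).
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        h *= 0xc2b2ae35;
        h ^= (h >>> 16);
        return Math.floorMod(h, width);
    }

    /** Counts one occurrence of the value in every row. */
    public void add(String value) {
        for (int row = 0; row < depth; row++) {
            table[row][bucket(value, row)]++;
        }
    }

    /** Returns the minimum counter across rows; never below the true count. */
    public long estimate(String value) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < depth; row++) {
            min = Math.min(min, table[row][bucket(value, row)]);
        }
        return min;
    }

    public static void main(String[] args) {
        // Hypothetical joined tuples with two facet fields: {category, brand}.
        String[][] tuples = {
            {"shoes", "acme"}, {"shoes", "zeta"}, {"hats", "acme"}, {"shoes", "acme"}
        };
        String[] fields = {"category", "brand"};

        // One sketch per facet field, all fed from the same single pass.
        Map<String, CountMinSketch> sketches = new LinkedHashMap<>();
        for (String field : fields) {
            sketches.put(field, new CountMinSketch(4, 1024));
        }
        for (String[] tuple : tuples) {
            for (int i = 0; i < fields.length; i++) {
                sketches.get(fields[i]).add(tuple[i]);
            }
        }
        System.out.println("category=shoes ~ " + sketches.get("category").estimate("shoes"));
        System.out.println("brand=acme ~ " + sketches.get("brand").estimate("acme"));
    }
}
```

The estimate can overcount when values collide in every row but never undercounts, which fits the "not-necessarily accurate" top-N use case. Picking the actual top N would still need a small candidate set kept alongside the sketch, which is left out here.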
Re: Increasing filterCache size and Java Heap size
Yes. Each entry is roughly 1K + maxdoc/8 bytes. The maxdoc/8 is the bitmap that holds the result set, and the 1K is overhead for the text of the query itself plus cache overhead. The 1K is usually safe to ignore, since maxdoc/8 dominates by a wide margin.

Best,
Erick

On Tue, Aug 16, 2016 at 8:02 PM, Zheng Lin Edwin Yeo wrote:
> Hi,
>
> Would like to check, do I need to increase my Java Heap size for Solr, if I
> plan to increase my filterCache size in solrconfig.xml?
>
> I'm using Solr 6.1.0
>
> Regards,
> Edwin
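
To put numbers on the rule of thumb above, here is a small back-of-the-envelope calculation. It is only a sketch: the class and method names are made up for illustration, and the `maxDoc` and cache size values are hypothetical, so substitute the figures from your own index and solrconfig.xml.

```java
/** Hypothetical helper applying the "1K + maxdoc/8 bytes per entry" rule of thumb. */
public class FilterCacheEstimate {

    /** Rough bytes for one filterCache entry: ~1 KB overhead plus a maxDoc-bit bitmap. */
    public static long bytesPerEntry(long maxDoc) {
        return 1024L + maxDoc / 8;
    }

    /** Rough heap needed if the cache fills up with `cacheSize` entries. */
    public static long worstCaseBytes(long maxDoc, int cacheSize) {
        return bytesPerEntry(maxDoc) * cacheSize;
    }

    public static void main(String[] args) {
        long maxDoc = 8_000_000L; // assumption: documents in the core
        int cacheSize = 512;      // assumption: filterCache "size" setting

        long total = worstCaseBytes(maxDoc, cacheSize);
        System.out.printf("per entry: %d bytes, full cache: %.1f MiB%n",
                bytesPerEntry(maxDoc), total / (1024.0 * 1024.0));
    }
}
```

With these assumed values a full cache comes to roughly half a gigabyte, so raising the filterCache size on a large index can call for a noticeably larger heap.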