[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated SOLR-7734:

    Fix Version/s: (was: 5.4)
                   5.5

> MapReduce Indexer can error when using collection
>
> Key: SOLR-7734
> URL: https://issues.apache.org/jira/browse/SOLR-7734
> Project: Solr
> Issue Type: Bug
> Components: contrib - MapReduce
> Affects Versions: 5.2.1
> Reporter: Mike Drob
> Assignee: Gregory Chanan
> Fix For: 5.5, Trunk
>
> Attachments: SOLR-7734.branch5x.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch
>
> When running the MapReduceIndexerTool, it will usually pull a {{solrconfig.xml}} from ZK for the collection that it is running against. This can be problematic for several reasons:
> * Performance: The configuration in ZK will likely have several query handlers, and lots of other components that don't make sense in an indexing-only use of EmbeddedSolrServer (ESS).
> * Classpath Resources: If the Solr services are using some kind of additional service (such as Sentry for auth) then the indexer will not have access to the necessary configurations without the user jumping through several hoops.
> * Distinct Configuration Needs: Enabling Soft Commits on the ESS doesn't make sense. There's other configurations that
> * Update Chain Behaviours: I'm under the impression that UpdateChains may behave differently in ESS than a SolrCloud cluster. Is it safe to depend on consistent behaviour here?

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob updated SOLR-7734:

    Attachment: SOLR-7734.patch

Attaching a patch based on latest trunk, since there were some conflicts that came up since my last submission.

> MapReduce Indexer can error when using collection
>
> Key: SOLR-7734
> URL: https://issues.apache.org/jira/browse/SOLR-7734
> Project: Solr
> Issue Type: Bug
> Components: contrib - MapReduce
> Affects Versions: 5.2.1
> Reporter: Mike Drob
> Assignee: Gregory Chanan
> Fix For: Trunk, 5.4
> Attachments: SOLR-7734.branch5x.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch
[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob updated SOLR-7734:

    Attachment: SOLR-7734.branch5x.patch

Attached is an addendum patch for branch_5x to be applied on top of the original patch.

> MapReduce Indexer can error when using collection
>
> Key: SOLR-7734
> URL: https://issues.apache.org/jira/browse/SOLR-7734
> Project: Solr
> Issue Type: Bug
> Components: contrib - MapReduce
> Affects Versions: 5.2.1
> Reporter: Mike Drob
> Assignee: Gregory Chanan
> Fix For: Trunk, 5.4
> Attachments: SOLR-7734.branch5x.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch
[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob updated SOLR-7734:

    Attachment: SOLR-7734.patch

Attaching a new patch that addresses the issues. I was able to reduce the number of conf files a bit too.

> MapReduce Indexer can error when using collection
>
> Key: SOLR-7734
> URL: https://issues.apache.org/jira/browse/SOLR-7734
> Project: Solr
> Issue Type: Bug
> Components: contrib - MapReduce
> Affects Versions: 5.2.1
> Reporter: Mike Drob
> Assignee: Gregory Chanan
> Fix For: Trunk, 5.4
> Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch
[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob updated SOLR-7734:

    Fix Version/s: (was: 5.3)
                   5.4

> MapReduce Indexer can error when using collection
>
> Key: SOLR-7734
> URL: https://issues.apache.org/jira/browse/SOLR-7734
> Project: Solr
> Issue Type: Bug
> Components: contrib - MapReduce
> Affects Versions: 5.2.1
> Reporter: Mike Drob
> Assignee: Gregory Chanan
> Fix For: Trunk, 5.4
> Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch
[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob updated SOLR-7734:

    Attachment: SOLR-7734.patch

Attached is a patch that adds additional tests for the new solrconfig.xml behaviour in map-reduce. As part of this, I refactored one of the tests to reduce duplication.

> MapReduce Indexer can error when using collection
>
> Key: SOLR-7734
> URL: https://issues.apache.org/jira/browse/SOLR-7734
> Project: Solr
> Issue Type: Bug
> Components: contrib - MapReduce
> Affects Versions: 5.2.1
> Reporter: Mike Drob
> Assignee: Gregory Chanan
> Fix For: 5.3, Trunk
> Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch
[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob updated SOLR-7734:

    Attachment: SOLR-7734.patch

Updated patch:
* Fixed imports.
* Used try-with-resources for streams.
* Fixed file location.
* Updated Lucene match version.
* Removed request dispatcher section

I don't have a good answer for what to do with the docs. Trying to enumerate all the possibilities - use embedded solrconfig, use solrconfig from zk, use solrconfig from solr home dir, got really messy, so I tried to hedge. I'm now leaning toward it being worth the additional complexity in documentation to spell everything out explicitly but don't have a good handle on it.

I'm not sure about working with managed schemas. I can remove the comment to prevent confusion - it was originally copied from another example file.

JMX on MapReduce tasks can be enabled through {{mapreduce.map.java.opts}} - I've seen it used for attempting to monitor memory usage. If somebody wants to try to do this, then I don't intend to give them more hoops to jump through. If jmx is disabled, then I think we end up ignoring that directive, so it's fine.

I will look at how we can add additional tests to verify everything.

> MapReduce Indexer can error when using collection
>
> Key: SOLR-7734
> URL: https://issues.apache.org/jira/browse/SOLR-7734
> Project: Solr
> Issue Type: Bug
> Components: contrib - MapReduce
> Affects Versions: 5.2.1
> Reporter: Mike Drob
> Assignee: Gregory Chanan
> Fix For: 5.3, Trunk
> Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch
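The "try-with-resources for streams" item in the patch notes above refers to the standard Java idiom sketched below. This is a minimal, self-contained illustration and not code from the patch: the class name `StreamCopyExample`, the method `copyAndClose`, and the sample payload are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamCopyExample {

    /**
     * Copies every byte from source to target and returns the byte count.
     * Both streams are closed when the try block exits, whether the copy
     * succeeds or throws.
     */
    public static long copyAndClose(InputStream source, OutputStream target) {
        long total = 0;
        try (InputStream in = source; OutputStream out = target) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        } catch (IOException e) {
            // Wrapped so callers in this sketch need no checked-exception handling.
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical payload standing in for a config file's contents.
        byte[] payload = "<config/>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyAndClose(new ByteArrayInputStream(payload), sink);
        System.out.println(copied + " bytes copied");
    }
}
```

Resources declared in the `try (...)` header are closed in reverse order of declaration, so no explicit `finally` block is needed.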
[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob updated SOLR-7734:

    Attachment: SOLR-7734.patch

Updated patch to have a (help-suppressed) flag that allows old behaviour. I don't think this is a big issue since the contrib is documented as experimental, but I've added it regardless.

Can any committers take a look at this?

> MapReduce Indexer can error when using collection
>
> Key: SOLR-7734
> URL: https://issues.apache.org/jira/browse/SOLR-7734
> Project: Solr
> Issue Type: Bug
> Components: contrib - MapReduce
> Affects Versions: 5.2.1
> Reporter: Mike Drob
> Fix For: 5.3, Trunk
> Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch
[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob updated SOLR-7734:

    Attachment: SOLR-7734.patch

New patch that moves the {{System.setProperty}} calls out of {{SolrRecordWriter}}, since we explicitly control the configuration now. Also, disabled nrt cache and block cache, since there is a single write and no reads in the MR job.

> MapReduce Indexer can error when using collection
>
> Key: SOLR-7734
> URL: https://issues.apache.org/jira/browse/SOLR-7734
> Project: Solr
> Issue Type: Bug
> Components: contrib - MapReduce
> Affects Versions: 5.2.1
> Reporter: Mike Drob
> Fix For: 5.3, Trunk
> Attachments: SOLR-7734.patch, SOLR-7734.patch
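As a rough illustration of why explicit configuration can replace {{System.setProperty}} calls, the sketch below keeps the two cache switches in an explicit settings map and never consults JVM-wide properties. It is not taken from the patch. The class `IndexerSettingsSketch` and its methods are hypothetical, and the two property names are assumed to match the HDFS directory factory's parameters; verify them before relying on them.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class IndexerSettingsSketch {

    // Assumed property names, used here for illustration only.
    public static final String BLOCK_CACHE = "solr.hdfs.blockcache.enabled";
    public static final String NRT_CACHE = "solr.hdfs.nrtcachingdirectory.enable";

    /** Settings the indexer carries in its own configuration: both caches off. */
    public static Map<String, String> explicitSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(BLOCK_CACHE, "false");
        settings.put(NRT_CACHE, "false");
        return Collections.unmodifiableMap(settings);
    }

    /** Reads a switch from the explicit settings only; system properties are ignored. */
    public static boolean isEnabled(Map<String, String> settings, String key) {
        return Boolean.parseBoolean(settings.getOrDefault(key, "false"));
    }

    public static void main(String[] args) {
        // A stray JVM-wide property, as a System.setProperty call would leave behind.
        System.setProperty(BLOCK_CACHE, "true");

        Map<String, String> settings = explicitSettings();
        System.out.println("block cache enabled: " + isEnabled(settings, BLOCK_CACHE));
        System.out.println("nrt cache enabled: " + isEnabled(settings, NRT_CACHE));
    }
}
```

Because the values live in the configuration the job ships with, nothing set elsewhere in the JVM can silently change them.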
[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob updated SOLR-7734:

    Attachment: SOLR-7734.patch

Attached a patch that adds a new (embedded) solrconfig.xml file to the map-reduce contrib module. This config will be loaded in lieu of the one found in ZK for the collection. The embedded config features a minimal operational footprint, disabling most request handlers, update chains, and soft commits. It can be overridden by explicitly specifying a --solr-home-dir argument on the command line when launching the job.

> MapReduce Indexer can error when using collection
>
> Key: SOLR-7734
> URL: https://issues.apache.org/jira/browse/SOLR-7734
> Project: Solr
> Issue Type: Bug
> Components: contrib - MapReduce
> Affects Versions: 5.2.1
> Reporter: Mike Drob
> Fix For: 5.3, Trunk
> Attachments: SOLR-7734.patch
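The selection among the three possible solrconfig.xml sources discussed in this thread (embedded, solr home dir, ZK) could be sketched roughly as follows. This is an assumption-laden illustration, not the patch's logic: the class `ConfigSourceSketch`, the enum, and the `useZkConfig` parameter (standing in for the help-suppressed flag, whose real name is not given here) are hypothetical, and the relative precedence of `--solr-home-dir` over that flag is a guess.

```java
import java.util.Optional;

public class ConfigSourceSketch {

    /** Where the indexer's solrconfig.xml could come from. */
    public enum ConfigSource { SOLR_HOME_DIR, ZOOKEEPER, EMBEDDED }

    /**
     * Picks a config source.
     * Assumed order: an explicit --solr-home-dir wins, then the opt-in
     * legacy ZK behaviour, and otherwise the embedded minimal config.
     */
    public static ConfigSource choose(Optional<String> solrHomeDir, boolean useZkConfig) {
        if (solrHomeDir.isPresent()) {
            return ConfigSource.SOLR_HOME_DIR;
        }
        if (useZkConfig) {
            return ConfigSource.ZOOKEEPER;
        }
        return ConfigSource.EMBEDDED;
    }

    public static void main(String[] args) {
        // No arguments given: fall back to the embedded config.
        System.out.println(choose(Optional.empty(), false));
        // Hypothetical path passed via --solr-home-dir.
        System.out.println(choose(Optional.of("/path/to/solr-home"), false));
        // Legacy behaviour requested through the hidden flag.
        System.out.println(choose(Optional.empty(), true));
    }
}
```

Writing the precedence out as one function is one way to keep the documentation question raised in the thread tractable, since each branch maps to one sentence of user-facing text.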