[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection

2015-12-14 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-7734:

Fix Version/s: 5.5 (was: 5.4)

> MapReduce Indexer can error when using collection
> -
>
> Key: SOLR-7734
> URL: https://issues.apache.org/jira/browse/SOLR-7734
> Project: Solr
> Issue Type: Bug
> Components: contrib - MapReduce
> Affects Versions: 5.2.1
> Reporter: Mike Drob
> Assignee: Gregory Chanan
> Fix For: 5.5, Trunk
>
> Attachments: SOLR-7734.branch5x.patch, SOLR-7734.patch, 
> SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, 
> SOLR-7734.patch, SOLR-7734.patch
>
>
> When running the MapReduceIndexerTool, it will usually pull a 
> {{solrconfig.xml}} from ZK for the collection that it is running against. 
> This can be problematic for several reasons:
> * Performance: The configuration in ZK will likely have several query 
> handlers, and lots of other components that don't make sense in an 
> indexing-only use of EmbeddedSolrServer (ESS).
> * Classpath Resources: If the Solr services are using some kind of additional 
> service (such as Sentry for auth) then the indexer will not have access to 
> the necessary configurations without the user jumping through several hoops.
> * Distinct Configuration Needs: Enabling Soft Commits on the ESS doesn't make 
> sense. There's other configurations that 
> * Update Chain Behaviours: I'm under the impression that UpdateChains may 
> behave differently in ESS than in a SolrCloud cluster. Is it safe to depend on 
> consistent behaviour here?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection

2015-08-28 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-7734:

Attachment: SOLR-7734.patch

Attaching a patch rebased on the latest trunk, since some conflicts came up 
after my last submission.

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
Assignee: Gregory Chanan
 Fix For: Trunk, 5.4

 Attachments: SOLR-7734.branch5x.patch, SOLR-7734.patch, 
 SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, 
 SOLR-7734.patch, SOLR-7734.patch





[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection

2015-08-21 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-7734:

Attachment: SOLR-7734.branch5x.patch

Attached is an addendum patch for branch_5x to be applied on top of the 
original patch.

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
Assignee: Gregory Chanan
 Fix For: Trunk, 5.4

 Attachments: SOLR-7734.branch5x.patch, SOLR-7734.patch, 
 SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, 
 SOLR-7734.patch





[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection

2015-08-18 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-7734:

Attachment: SOLR-7734.patch

Attaching a new patch that addresses the issues. I was able to reduce the 
number of conf files a bit too.

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
Assignee: Gregory Chanan
 Fix For: Trunk, 5.4

 Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, 
 SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch





[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection

2015-08-14 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-7734:

Fix Version/s: 5.4 (was: 5.3)

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
Assignee: Gregory Chanan
 Fix For: Trunk, 5.4

 Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, 
 SOLR-7734.patch, SOLR-7734.patch





[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection

2015-08-14 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-7734:

Attachment: SOLR-7734.patch

Attached is a patch that adds additional tests for the new solrconfig.xml 
behaviour in map-reduce. As part of this, I refactored one of the tests to 
reduce duplication.

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
Assignee: Gregory Chanan
 Fix For: 5.3, Trunk

 Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, 
 SOLR-7734.patch, SOLR-7734.patch





[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection

2015-07-15 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-7734:

Attachment: SOLR-7734.patch

Updated patch:
* Fixed imports.
* Used try-with-resources for streams.
* Fixed file location.
* Updated Lucene match version.
* Removed request dispatcher section.
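
For readers unfamiliar with the idiom, the try-with-resources item in the list 
above can be sketched as follows. This is a minimal illustration, not code 
from the patch: the class name StreamCopySketch and the method copyAndClose 
are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not taken from the SOLR-7734 patch.
final class StreamCopySketch {

  // Copies all bytes from in to out. Both streams are closed when the try
  // block exits, whether it completes normally or read()/write() throws.
  static void copyAndClose(InputStream in, OutputStream out) throws IOException {
    try (InputStream source = in; OutputStream sink = out) {
      byte[] buffer = new byte[4096];
      int read;
      while ((read = source.read(buffer)) != -1) {
        sink.write(buffer, 0, read);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] config = "<config/>".getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream copy = new ByteArrayOutputStream();
    copyAndClose(new ByteArrayInputStream(config), copy);
    System.out.println(copy.toString("UTF-8"));
  }
}
```

Resources declared in the try header are closed in reverse order of 
declaration, so no explicit finally block is needed.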

I don't have a good answer for what to do with the docs. Trying to enumerate 
all the possibilities (use the embedded solrconfig, use the solrconfig from ZK, 
or use the solrconfig from the Solr home dir) got really messy, so I tried to 
hedge. I'm now leaning toward it being worth the additional complexity in the 
documentation to spell everything out explicitly, but I don't have a good 
handle on it.

I'm not sure about working with managed schemas. I can remove the comment to 
prevent confusion; it was originally copied from another example file.

JMX on MapReduce tasks can be enabled through {{mapreduce.map.java.opts}}; 
I've seen it used to monitor memory usage. If somebody wants to do this, I 
don't intend to give them more hoops to jump through. If JMX is disabled, then 
I think we end up ignoring that directive, so it's fine.
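
As a rough sketch of what enabling JMX through {{mapreduce.map.java.opts}} 
might look like, the snippet below assembles the JVM flags and formats them as 
a Hadoop generic option. The class and method names are hypothetical and the 
port is an arbitrary example. The com.sun.management.jmxremote.* flags are the 
standard HotSpot ones, and the sketch assumes the job is launched with Hadoop's 
generic -D option handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class JmxMapOptsSketch {

  // Builds the JVM flags that turn on remote JMX on the given port.
  // Authentication and SSL are switched off here, which is only
  // reasonable on a trusted test cluster.
  static String jmxJavaOpts(int port) {
    List<String> opts = new ArrayList<>();
    opts.add("-Dcom.sun.management.jmxremote");
    opts.add("-Dcom.sun.management.jmxremote.port=" + port);
    opts.add("-Dcom.sun.management.jmxremote.authenticate=false");
    opts.add("-Dcom.sun.management.jmxremote.ssl=false");
    return String.join(" ", opts);
  }

  // Formats the flags as a generic option that sets mapreduce.map.java.opts.
  static String asGenericOption(String javaOpts) {
    return "-D mapreduce.map.java.opts=\"" + javaOpts + "\"";
  }

  public static void main(String[] args) {
    // 9010 is an arbitrary example port.
    System.out.println(asGenericOption(jmxJavaOpts(9010)));
  }
}
```

A fixed port is likely to collide when several map tasks land on the same 
node, so this is only suitable for small experiments.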

I will look at how we can add additional tests to verify everything.

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
Assignee: Gregory Chanan
 Fix For: 5.3, Trunk

 Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, 
 SOLR-7734.patch





[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection

2015-07-08 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-7734:

Attachment: SOLR-7734.patch

Updated the patch to add a (help-suppressed) flag that allows the old 
behaviour. I don't think this is a big issue since the contrib is documented 
as experimental, but I've added it regardless.

Can any committers take a look at this?

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
 Fix For: 5.3, Trunk

 Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch





[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection

2015-06-30 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-7734:

Attachment: SOLR-7734.patch

New patch that moves the {{System.setProperty}} calls out of 
{{SolrRecordWriter}}, since we explicitly control the configuration now. Also 
disabled the NRT cache and block cache, since the MR job performs a single 
write and no reads.
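
To illustrate the general idea of replacing JVM-wide {{System.setProperty}} 
calls with explicitly controlled settings, here is a minimal sketch. It is not 
the patch itself: the class name and the two keys are hypothetical 
placeholders and are not Solr's real property names.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the keys below are placeholders, not real Solr properties.
final class IndexerSettingsSketch {

  static final String NRT_CACHE = "example.nrtCache.enabled";
  static final String BLOCK_CACHE = "example.blockCache.enabled";

  // Settings for a write-only indexing job: both caches are turned off
  // because nothing reads from the index while the job runs.
  static Map<String, String> writeOnlyDefaults() {
    Map<String, String> settings = new LinkedHashMap<>();
    settings.put(NRT_CACHE, "false");
    settings.put(BLOCK_CACHE, "false");
    return Collections.unmodifiableMap(settings);
  }

  public static void main(String[] args) {
    // The settings live in an ordinary immutable map that is passed around
    // explicitly; the JVM-wide system property table is never modified.
    System.out.println(writeOnlyDefaults());
  }
}
```

Keeping the values in an object that is passed around explicitly avoids the 
side effects of global system properties, which every component in the same 
JVM can see and overwrite.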

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
 Fix For: 5.3, Trunk

 Attachments: SOLR-7734.patch, SOLR-7734.patch





[jira] [Updated] (SOLR-7734) MapReduce Indexer can error when using collection

2015-06-29 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-7734:

Attachment: SOLR-7734.patch

Attached a patch that adds a new (embedded) solrconfig.xml file to the 
map-reduce contrib module. This config will be loaded in lieu of the one found 
in ZK for the collection. The embedded config features a minimal operational 
footprint, disabling most request handlers, update chains, and soft commits. It 
can be overridden by explicitly specifying a --solr-home-dir argument on the 
command line when launching the job.
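
The selection rule described above (embedded config by default, overridden 
when --solr-home-dir is given) could be sketched roughly as below. The class, 
enum, and method names are hypothetical and do not come from the patch. The 
sketch models only the two cases named in this comment.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical sketch of the config selection rule; not code from the patch.
final class SolrConfigSourceSketch {

  enum Source { EMBEDDED, SOLR_HOME_DIR }

  // An explicitly supplied Solr home dir wins; otherwise fall back to the
  // minimal solrconfig.xml bundled with the map-reduce contrib.
  static Source choose(Optional<Path> solrHomeDir) {
    return solrHomeDir.isPresent() ? Source.SOLR_HOME_DIR : Source.EMBEDDED;
  }

  public static void main(String[] args) {
    System.out.println(choose(Optional.empty()));
    // The path below is an arbitrary example value.
    System.out.println(choose(Optional.of(Paths.get("/tmp/solr-home"))));
  }
}
```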

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
 Fix For: 5.3, Trunk

 Attachments: SOLR-7734.patch

