[ https://issues.apache.org/jira/browse/NUTCH-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135803#comment-17135803 ]
Patrick Mézard commented on NUTCH-2792:
---------------------------------------

In [3], I would prefix with the indexwriter *identifier*. To override the default csv indexwriter outpath, I would write:
{code:java}
-param indexer_csv_1.outpath=/some/path
{code}
But what is the difference between dynamic parameters and other ones? You suggested reusing the properties syntax here, so why not reuse the properties entirely?
{code:java}
-Dindexwriters.indexer_csv_1.outpath=/some/path
{code}
The behaviour would be roughly:
* You can override any indexwriters property that way from the command line.
* You can define them in nutch-site.xml if you wish (but there is no strong reason to advertise this imho).
* Properties in index-writers.xml are implicitly mapped to "-Dindexwriters.$writer_id.$property".

The only thing I am not completely happy with is the overriding order. My gut feeling would have been:
- "Command-line property" > index-writers.xml > nutch-site.xml

But I suspect the properties are not handled by the command itself but by hadoop via Tool, or something else. So we cannot tell "Command-line property" from "nutch-site.xml", and the behaviour would be:
* "Command-line property" > nutch-site.xml > index-writers.xml

Probably not a big deal in practice, just a little weird since nutch-site.xml defines the location of index-writers.xml, and hence feels more *global*.

The implementation does not look too crazy either. At the end of "IndexWriters.loadWritersConfiguration", just iterate on "indexwriters.*" keys from the global configuration. For each key:
* Extract the writer id prefix. If it is not in the IndexWriterConfig, fail (or at least log an error).
* Add all Configuration keys for this writer in IndexWriterConfig, overwriting existing ones.

Once this is done, deprecate "nutch index -params".

What do you think?
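To make the proposed merge step concrete, here is a minimal sketch of it. It is not the actual Nutch code: plain maps stand in for the Hadoop Configuration and for the per-writer IndexWriterConfig parameters, the class and method names (IndexWriterOverrides, applyOverrides) are hypothetical, and it assumes writer ids contain no dot so the id can be split off at the first one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating the proposal; not part of Nutch.
class IndexWriterOverrides {
  // Prefix suggested in the comment for per-writer properties.
  static final String PREFIX = "indexwriters.";

  /**
   * Copies every "indexwriters.<writerId>.<property>" entry of globalConf
   * into the parameters of the matching writer, overwriting existing values.
   * Fails on an unknown writer id (logging an error would be the softer option).
   */
  static void applyOverrides(Map<String, String> globalConf,
                             Map<String, Map<String, String>> writerParams) {
    for (Map.Entry<String, String> entry : globalConf.entrySet()) {
      String key = entry.getKey();
      if (!key.startsWith(PREFIX)) {
        continue; // not an index writer override
      }
      String rest = key.substring(PREFIX.length());
      // Assumption: the writer id has no dot, so it ends at the first one.
      int dot = rest.indexOf('.');
      if (dot <= 0 || dot == rest.length() - 1) {
        throw new IllegalArgumentException("Malformed override key: " + key);
      }
      String writerId = rest.substring(0, dot);
      String property = rest.substring(dot + 1);
      Map<String, String> params = writerParams.get(writerId);
      if (params == null) {
        throw new IllegalArgumentException(
            "Unknown index writer id '" + writerId + "' in key: " + key);
      }
      params.put(property, entry.getValue());
    }
  }

  public static void main(String[] args) {
    // Values as they might come from index-writers.xml (made up for the demo).
    Map<String, String> csv = new HashMap<>();
    csv.put("outpath", "csvindexwriter");
    Map<String, Map<String, String>> writers = new HashMap<>();
    writers.put("indexer_csv_1", csv);

    // Global configuration, e.g. after -Dindexwriters.indexer_csv_1.outpath=/some/path
    Map<String, String> conf = new HashMap<>();
    conf.put("indexwriters.indexer_csv_1.outpath", "/some/path");

    applyOverrides(conf, writers);
    System.out.println(writers.get("indexer_csv_1").get("outpath")); // prints "/some/path"
  }
}
```

Because the overrides are read from the merged global configuration, a value set in nutch-site.xml and one passed with -D are indistinguishable at this point, which is the ordering caveat raised above.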
> nutch index -params is only used in Solr indexer
> ------------------------------------------------
>
> Key: NUTCH-2792
> URL: https://issues.apache.org/jira/browse/NUTCH-2792
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 1.17
> Reporter: Patrick Mézard
> Priority: Minor
> Fix For: 1.18
>
>
> `nutch index` help displays:
> {code:java}
> General options:
> ...
> -params k1=v1&k2=v2... parameters passed to indexer plugins
> (via property indexer.additional.params){code}
> The option does nothing when used with CSV or dummy indexers. Looking at the
> code, the property is defined in:
> [https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/indexer/IndexerMapReduce.java#L78]
> which is only used in:
> [https://github.com/apache/nutch/blob/master/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrIndexWriter.java#L141]
> Several possibilities:
> * Drop the parameter from the help. Does not break backward compatibility.
> * Move the -params handling in IndexWriters.java and add them to
> IndexWriterParams of every indexer. Not too impactful but not super clean
> either: the parameters are not "namespaced" per indexer, if someone uses
> multiple indexers there may be parameter collisions.
> * Refactor the way these parameters are passed, to prefix them with target
> indexer. Would break backward compatibility. In that case, it would be good
> to change the format completely: turn -params into -param, allow multiple
> values to be passed and forget the '=/&' syntax (which does not handle
> escaping anyway).
> Not sure how much this parameter is used. I would have used it to configure
> the output path for indexer-csv or indexer-dummy.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)