[ https://issues.apache.org/jira/browse/SOLR-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller resolved SOLR-5786.
-------------------------------
    Resolution: Duplicate

> MapReduceIndexerTool --help output is missing large parts of the help text
> --------------------------------------------------------------------------
>
>                 Key: SOLR-5786
>                 URL: https://issues.apache.org/jira/browse/SOLR-5786
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - MapReduce
>    Affects Versions: 4.7
>            Reporter: wolfgang hoschek
>            Assignee: Mark Miller
>             Fix For: 4.8
>
>
> As already mentioned repeatedly and at length, this is a regression
> introduced by the fix in https://issues.apache.org/jira/browse/SOLR-5605
> Here is the diff of --help output before SOLR-5605 vs after SOLR-5605:
> {code}
> 130,235c130
> < lucene segments left in this index. Merging
> < segments involves reading and rewriting all data
> < in all these segment files, potentially multiple
> < times, which is very I/O intensive and time
> < consuming. However, an index with fewer segments
> < can later be merged faster, and it can later be
> < queried faster once deployed to a live Solr
> < serving shard. Set maxSegments to 1 to optimize
> < the index for low query latency. In a nutshell, a
> < small maxSegments value trades indexing latency
> < for subsequently improved query latency. This can
> < be a reasonable trade-off for batch indexing
> < systems. (default: 1)
> < --fair-scheduler-pool STRING
> < Optional tuning knob that indicates the name of
> < the fair scheduler pool to submit jobs to. The
> < Fair Scheduler is a pluggable MapReduce scheduler
> < that provides a way to share large clusters. Fair
> < scheduling is a method of assigning resources to
> < jobs such that all jobs get, on average, an equal
> < share of resources over time. When there is a
> < single job running, that job uses the entire
> < cluster. When other jobs are submitted, tasks
> < slots that free up are assigned to the new jobs,
> < so that each job gets roughly the same amount of
> < CPU time. Unlike the default Hadoop scheduler,
> < which forms a queue of jobs, this lets short jobs
> < finish in reasonable time while not starving long
> < jobs. It is also an easy way to share a cluster
> < between multiple of users. Fair sharing can also
> < work with job priorities - the priorities are
> < used as weights to determine the fraction of
> < total compute time that each job gets.
> < --dry-run Run in local mode and print documents to stdout
> < instead of loading them into Solr. This executes
> < the morphline in the client process (without
> < submitting a job to MR) for quicker turnaround
> < during early trial & debug sessions. (default:
> < false)
> < --log4j FILE Relative or absolute path to a log4j.properties
> < config file on the local file system. This file
> < will be uploaded to each MR task. Example:
> < /path/to/log4j.properties
> < --verbose, -v Turn on verbose output. (default: false)
> < --show-non-solr-cloud Also show options for Non-SolrCloud mode as part
> < of --help. (default: false)
> <
> < Required arguments:
> < --output-dir HDFS_URI HDFS directory to write Solr indexes to. Inside
> < there one output directory per shard will be
> < generated. Example: hdfs://c2202.mycompany.
> < com/user/$USER/test
> < --morphline-file FILE Relative or absolute path to a local config file
> < that contains one or more morphlines. The file
> < must be UTF-8 encoded. Example:
> < /path/to/morphline.conf
> <
> < Cluster arguments:
> < Arguments that provide information about your Solr cluster.
> <
> < --zk-host STRING The address of a ZooKeeper ensemble being used by
> < a SolrCloud cluster. This ZooKeeper ensemble will
> < be examined to determine the number of output
> < shards to create as well as the Solr URLs to
> < merge the output shards into when using the --go-
> < live option. Requires that you also pass the --
> < collection to merge the shards into.
> <
> < The --zk-host option implements the same
> < partitioning semantics as the standard SolrCloud
> < Near-Real-Time (NRT) API. This enables to mix
> < batch updates from MapReduce ingestion with
> < updates from standard Solr NRT ingestion on the
> < same SolrCloud cluster, using identical unique
> < document keys.
> <
> < Format is: a list of comma separated host:port
> < pairs, each corresponding to a zk server.
> < Example: '127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:
> < 2183' If the optional chroot suffix is used the
> < example would look like: '127.0.0.1:2181/solr,
> < 127.0.0.1:2182/solr,127.0.0.1:2183/solr' where
> < the client would be rooted at '/solr' and all
> < paths would be relative to this root - i.e.
> < getting/setting/etc... '/foo/bar' would result in
> < operations being run on '/solr/foo/bar' (from the
> < server perspective).
> <
> <
> < Go live arguments:
> < Arguments for merging the shards that are built into a live Solr
> < cluster. Also see the Cluster arguments.
> <
> < --go-live Allows you to optionally merge the final index
> < shards into a live Solr cluster after they are
> < built. You can pass the ZooKeeper address with --
> < zk-host and the relevant cluster information will
> < be auto detected. (default: false)
> < --collection STRING The SolrCloud collection to merge shards into
> < when using --go-live and --zk-host. Example:
> < collection1
> < --go-live-threads INTEGER
> < Tuning knob that indicates the maximum number of
> < live merges to run in parallel at one time.
> < (default: 1000)
> <
> ---
> >
> {code}
> As already mentioned repeatedly and at length, this bug is caused by a
> change related to buffer flushing in argparse4j >= 0.4.2.
> The fix is to apply CDH-16434 to MapReduceIndexerTool.java as follows:
> {code}
> - parser.printHelp(new PrintWriter(System.out));
> + parser.printHelp();
> {code}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
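The truncation mechanism can be illustrated outside of argparse4j. This is a minimal sketch, not code from Solr; the class name FlushDemo and the sample string are made up for the demonstration. The point it shows: the PrintWriter(OutputStream) constructor wraps the stream in an internal BufferedWriter, so printed text sits in that buffer and never reaches the underlying stream until the writer is flushed or closed.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

// Sketch of the buffering pitfall behind this bug (illustrative, not Solr code).
// PrintWriter(OutputStream) buffers internally and does not auto-flush on
// print(), so help text handed to such a writer can be silently held back.
public class FlushDemo {
    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PrintWriter pw = new PrintWriter(out); // internally buffered, no auto-flush

        pw.print("...many lines of --help text...");
        // Nothing has reached the stream yet: the text is still in the buffer.
        System.out.println("bytes at the stream before flush: " + out.size()); // 0

        pw.flush();
        System.out.println("bytes at the stream after flush:  " + out.size()); // 31
    }
}
```

Passing an unflushed `new PrintWriter(System.out)` into a library call reproduces exactly this: whatever fits in the buffer at process exit is lost. The fix above avoids the problem by letting the zero-argument `printHelp()` manage its own output; flushing the explicitly passed writer after the call would have the same effect.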