wolfgang hoschek created SOLR-5786:
--------------------------------------

             Summary: MapReduceIndexerTool --help text is missing large parts of the help text
                 Key: SOLR-5786
                 URL: https://issues.apache.org/jira/browse/SOLR-5786
             Project: Solr
          Issue Type: Bug
          Components: contrib - MapReduce
    Affects Versions: 4.7
            Reporter: wolfgang hoschek
            Assignee: Mark Miller
             Fix For: 4.8


As already mentioned repeatedly and at length, this is a regression introduced 
by the fix in https://issues.apache.org/jira/browse/SOLR-5605

Here is the diff of --help output before SOLR-5605 vs after SOLR-5605:

{code}
130,235c130
<                          lucene  segments  left  in   this  index.  Merging
<                          segments involves reading  and  rewriting all data
<                          in all these  segment  files, potentially multiple
<                          times,  which  is  very  I/O  intensive  and  time
<                          consuming. However, an  index  with fewer segments
<                          can later be merged  faster,  and  it can later be
<                          queried  faster  once  deployed  to  a  live  Solr
<                          serving shard. Set  maxSegments  to  1 to optimize
<                          the index for low query  latency. In a nutshell, a
<                          small maxSegments  value  trades  indexing latency
<                          for subsequently improved query  latency. This can
<                          be  a  reasonable  trade-off  for  batch  indexing
<                          systems. (default: 1)
<   --fair-scheduler-pool STRING
<                          Optional tuning knob  that  indicates  the name of
<                          the fair scheduler  pool  to  submit  jobs to. The
<                          Fair Scheduler is a  pluggable MapReduce scheduler
<                          that provides a way to  share large clusters. Fair
<                          scheduling is a method  of  assigning resources to
<                          jobs such that all jobs  get, on average, an equal
<                          share of resources  over  time.  When  there  is a
<                          single job  running,  that  job  uses  the  entire
<                          cluster. When  other  jobs  are  submitted,  tasks
<                          slots that free up are  assigned  to the new jobs,
<                          so that each job gets  roughly  the same amount of
<                          CPU time.  Unlike  the  default  Hadoop scheduler,
<                          which forms a queue of  jobs, this lets short jobs
<                          finish in reasonable time  while not starving long
<                          jobs. It is also an  easy  way  to share a cluster
<                          between multiple of users.  Fair  sharing can also
<                          work with  job  priorities  -  the  priorities are
<                          used as  weights  to  determine  the  fraction  of
<                          total compute time that each job gets.
<   --dry-run              Run in local mode  and  print  documents to stdout
<                          instead of loading them  into  Solr. This executes
<                          the  morphline  in  the  client  process  (without
<                          submitting a job  to  MR)  for  quicker turnaround
<                          during early  trial  &  debug  sessions. (default:
<                          false)
<   --log4j FILE           Relative or absolute  path  to  a log4j.properties
<                          config file on the  local  file  system. This file
<                          will  be  uploaded  to   each  MR  task.  Example:
<                          /path/to/log4j.properties
<   --verbose, -v          Turn on verbose output. (default: false)
<   --show-non-solr-cloud  Also show options for  Non-SolrCloud  mode as part
<                          of --help. (default: false)
< 
< Required arguments:
<   --output-dir HDFS_URI  HDFS directory to  write  Solr  indexes to. Inside
<                          there one  output  directory  per  shard  will  be
<                          generated.    Example:     hdfs://c2202.mycompany.
<                          com/user/$USER/test
<   --morphline-file FILE  Relative or absolute path  to  a local config file
<                          that contains one  or  more  morphlines.  The file
<                          must     be      UTF-8      encoded.      Example:
<                          /path/to/morphline.conf
< 
< Cluster arguments:
<   Arguments that provide information about your Solr cluster. 
< 
<   --zk-host STRING       The address of a ZooKeeper  ensemble being used by
<                          a SolrCloud cluster. This  ZooKeeper ensemble will
<                          be examined  to  determine  the  number  of output
<                          shards to create  as  well  as  the  Solr  URLs to
<                          merge the output shards into  when using the --go-
<                          live option. Requires that  you  also  pass the --
<                          collection to merge the shards into.
<                          
<                          The   --zk-host   option   implements   the   same
<                          partitioning semantics as  the  standard SolrCloud
<                          Near-Real-Time (NRT)  API.  This  enables  to  mix
<                          batch  updates  from   MapReduce   ingestion  with
<                          updates from standard  Solr  NRT  ingestion on the
<                          same SolrCloud  cluster,  using  identical  unique
<                          document keys.
<                          
<                          Format is: a  list  of  comma  separated host:port
<                          pairs,  each  corresponding   to   a   zk  server.
<                          Example: '127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:
<                          2183' If the optional  chroot  suffix  is used the
<                          example  would  look  like:  '127.0.0.1:2181/solr,
<                          127.0.0.1:2182/solr,127.0.0.1:2183/solr'     where
<                          the client would  be  rooted  at  '/solr'  and all
<                          paths would  be  relative  to  this  root  -  i.e.
<                          getting/setting/etc... '/foo/bar' would  result in
<                          operations being run on  '/solr/foo/bar' (from the
<                          server perspective).
<                          
< 
< Go live arguments:
<   Arguments for  merging  the  shards  that  are  built  into  a  live Solr
<   cluster. Also see the Cluster arguments.
< 
<   --go-live              Allows you to  optionally  merge  the  final index
<                          shards into a  live  Solr  cluster  after they are
<                          built. You can pass the  ZooKeeper address with --
<                          zk-host and the relevant  cluster information will
<                          be auto detected.  (default: false)
<   --collection STRING    The SolrCloud  collection  to  merge  shards  into
<                          when  using  --go-live   and  --zk-host.  Example:
<                          collection1
<   --go-live-threads INTEGER
<                          Tuning knob that indicates  the  maximum number of
<                          live merges  to  run  in  parallel  at  one  time.
<                          (default: 1000)
< 
---
>       
{code}

As already mentioned repeatedly and at length, the fix is to apply CDH-16434 
to MapReduceIndexerTool.java because there's a change related to buffer 
flushing in argparse4j >= 0.4.2:

{code}
-            parser.printHelp(new PrintWriter(System.out));  
+            parser.printHelp();
{code}
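
To illustrate why the removed line can lose output, here is a minimal sketch of the general Java behavior. It is not the argparse4j internals and not the actual MapReduceIndexerTool code. A PrintWriter constructed from an OutputStream buffers its output and does not auto-flush, so text stays in the buffer until flush() or close() is called. The class name FlushDemo, the method bytesReachingStream and the sample help string are hypothetical and exist only for this demonstration:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

public class FlushDemo {

    // Writes text through a PrintWriter wrapped around an in-memory stream
    // and returns how many bytes actually reached that stream.
    static int bytesReachingStream(String text, boolean flush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Same constructor shape as in the removed line: buffered, no auto-flush.
        PrintWriter writer = new PrintWriter(sink);
        writer.print(text);
        if (flush) {
            writer.flush(); // pushes the buffered characters down to the sink
        }
        return sink.size();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real --help text.
        String help = "usage: tool [--help]\n";
        System.out.println("without flush: " + bytesReachingStream(help, false));
        System.out.println("with flush:    " + bytesReachingStream(help, true));
    }
}
```

For a short string, nothing should reach the underlying stream until flush() is called. With longer text, such as the full --help output, the part that overflowed the buffer gets through and the tail is lost, which would be consistent with the truncation shown in the diff above.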



