FW: Is it valid to use userLimit in calculating maxApplicationsPerUser ?

2015-07-16 Thread Naganarasimha G R (Naga)
Hi Folks ,
Came across a scenario where maxApplications at the cluster level (2 nodes) was
set to a low value like 10, and based on the capacity configuration it came to
2 for a particular queue. But the formula then used to calculate
maxApplicationsPerUser is:

maxApplicationsPerUser = (int)(maxApplications * (userLimit / 100.0f) * 
userLimitFactor);

but the definition of userLimit in the documentation is:
Each queue enforces a limit on the percentage of resources allocated to a user
at any given time, if there is demand for resources. The user limit can vary
between a minimum and maximum value. The former (the minimum value) is set
to this property value and the latter (the maximum value) depends on the number
of users who have submitted applications. For e.g., suppose the value of this
property is 25. If two users have submitted applications to a queue, no single
user can use more than 50% of the queue resources. If a third user submits an
application, no single user can use more than 33% of the queue resources. With
4 or more users, no user can use more than 25% of the queue's resources. A value
of 100 implies no user limits are imposed. The default is 100. Value is
specified as an integer.

So I was wondering how a minimum limit can be used in a formula to calculate
the max applications for a user. Suppose I set
yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent to 20,
expecting that at the minimum 20% of the queue is available to each user; but
based on the formula, maxApplicationsPerUser gets set to zero.
According to the definition of the property, the maximum is based on the
current number of active users, so I feel this formula is wrong.
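
To make the truncation concrete, here is a minimal standalone sketch of the
arithmetic (hypothetical demo code, not the scheduler source), plugging in the
values above:

public class MaxAppsPerUserDemo {
    public static void main(String[] args) {
        int maxApplications = 2;     // queue-level value derived above
        int userLimit = 20;          // minimum-user-limit-percent
        float userLimitFactor = 1f;  // default

        // Same expression as the formula quoted above; the cast to int
        // truncates 2 * 0.20 * 1 = 0.4 down to 0.
        int maxApplicationsPerUser =
                (int) (maxApplications * (userLimit / 100.0f) * userLimitFactor);

        System.out.println(maxApplicationsPerUser); // prints 0
    }
}
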
P.S. userLimitFactor was configured at its default of 1, but what I am
wondering is whether it is valid to use it in combination with userLimit to
find the max apps per user.

Please correct me if my understanding is wrong. If it is a bug, I would like
to raise it and solve it.

+ Naga


Fwd: Properties file not loaded with hadoop jar command

2015-07-16 Thread Nishanth S
Hello,
I have built a jar file with Maven as the build tool, which reads properties
from a file. I am doing this like InputStream is =
ClassLoader.getSystemResourceAsStream("hadoop.properties") on startup.
Now once the jar is built, I can see that the properties get loaded
when I do java -jar, but when I try to execute the MapReduce job with
hadoop jar, ClassLoader.getSystemResourceAsStream returns a null input
stream. I want to keep this property file within the jar so that it is
available on the classpath. Please advise if anyone has met this scenario
before.
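
A likely cause: hadoop jar runs your jar through RunJar's own URLClassLoader,
so resources bundled in the jar are not visible to the system classloader that
ClassLoader.getSystemResourceAsStream queries. A minimal sketch of a
workaround (hypothetical class name, assuming hadoop.properties sits at the
root of the jar) is to ask the classloader that loaded your own class instead:

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ConfigLoader {
    public static Properties load() throws IOException {
        Properties props = new Properties();
        // Use the classloader that loaded this class, not the system
        // classloader; under "hadoop jar" only the former can see
        // resources packed inside the job jar.
        try (InputStream in = ConfigLoader.class.getClassLoader()
                .getResourceAsStream("hadoop.properties")) {
            if (in == null) {
                throw new IOException("hadoop.properties not found on classpath");
            }
            props.load(in);
        }
        return props;
    }
}
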


-Chinchu


Re: sqoop export query error out..

2015-07-16 Thread James Bond
What is the format of the data you are trying to export? The Hive table -
is it in text/parquet or something else?

On Fri, Jul 17, 2015 at 12:25 AM, Kumar Jayapal kjayapa...@gmail.com
wrote:

 Hi James,

 I tried with uppercase and it worked; now I am getting a different error. I
 don't see the .metadata dir in the path.

 15/07/16 18:49:12 ERROR sqoop.Sqoop: Got exception running Sqoop:
 org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
 exist:
 hdfs://name/user/hive/warehouse/testing_bi.db/testing_dimension/.metadata
 org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
 exist:
 hdfs://name/user/hive/warehouse/testing_bi.db/testing_dimension/.metadata
   at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.checkExists(FileSystemMetadataProvider.java:527)
   at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.find(FileSystemMetadataProvider.java:570)
   at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:112)
   at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:228)
   at org.kitesdk.data.Datasets.load(Datasets.java:69)
   at org.kitesdk.data.Datasets.load(Datasets.java:113)
   at org.kitesdk.data.mapreduce.DatasetKeyInputFormat.load(DatasetKeyInputFormat.java:274)
   at org.kitesdk.data.mapreduce.DatasetKeyInputFormat.setConf(DatasetKeyInputFormat.java:214)
   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
   at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:586)
   at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:606)
   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:490)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
   at org.apache.sqoop.mapreduce.ExportJobBase.doSubmitJob(ExportJobBase.java:302)
   at org.apache.sqoop.mapreduce.ExportJobBase.runJob(ExportJobBase.java:279)
   at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:420)
   at org.apache.sqoop.manager.OracleManager.exportTable(OracleManager.java:455)
   at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
   at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
   at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
   at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

 Thanks
 Jay

 On Wed, Jul 15, 2015 at 9:12 PM, James Bond bond.b...@gmail.com wrote:

 This clearly says it's an Oracle error. Try checking the following (a quick
 JDBC sanity check for points 1 and 2 is sketched after the list):
 1. Whether the user TESTUSER has read/write privileges.
 2. Whether the table TEST_DIMENSION is in the same schema as TESTUSER; if
 not, try prefixing the schema name to the table - schema/owner.TEST_DIMENSION.
 3. Make sure you are connecting to the right server and service name.
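
 A minimal JDBC sketch (hypothetical connection URL and class name; needs the
 Oracle JDBC driver on the classpath) that lists which schemas expose the
 table to TESTUSER:

 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.PreparedStatement;
 import java.sql.ResultSet;

 public class CheckTableVisibility {
     public static void main(String[] args) throws Exception {
         String url = "jdbc:oracle:thin:@dbhost:1521/service"; // hypothetical
         try (Connection c = DriverManager.getConnection(url, "TESTUSER", args[0]);
              PreparedStatement ps = c.prepareStatement(
                      // ALL_TABLES lists every table the connected user can access
                      "SELECT owner, table_name FROM all_tables WHERE table_name = ?")) {
             ps.setString(1, "TEST_DIMENSION");
             try (ResultSet rs = ps.executeQuery()) {
                 while (rs.next()) {
                     // If OWNER is not TESTUSER, qualify the table name in
                     // the sqoop command: --table OWNER.TEST_DIMENSION
                     System.out.println(rs.getString(1) + "." + rs.getString(2));
                 }
             }
         }
     }
 }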

 Thanks,
 Ashwin


 On Thu, Jul 16, 2015 at 5:05 AM, Kumar Jayapal kjayapa...@gmail.com
 wrote:

 Hi,

 I see the table in my database. When I execute an export command I get
 "table or view does not exist". When I do list-tables I can see the table.

 Can someone tell me what's wrong? I have DBA privileges.

 sqoop export --connect jdbc:oracle:thin:@lorsasa.ss.com:1521/pocd01us
  --username  TESTUSER --P --table TEST_DIMENSION --export-dir
 /user/hive/warehouse/test.db/test_dimension/
 Warning:
 /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p654.326/bin/../lib/sqoop/../accumulo
 does not exist! Accumulo imports will fail.
 Please set $ACCUMULO_HOME to the root of your Accumulo installation.
 15/07/15 23:20:03 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.2
 Enter password:
 15/07/15 23:20:20 INFO oracle.OraOopManagerFactory: Data Connector for
 Oracle and Hadoop is disabled.
 15/07/15 23:20:20 INFO manager.SqlManager: Using default fetchSize of
 1000
 15/07/15 23:20:20 INFO tool.CodeGenTool: Beginning code generation
 15/07/15 23:20:21 INFO manager.OracleManager: Time zone has been set to
 GMT
 15/07/15 23:20:21 INFO manager.SqlManager: Executing SQL statement:
 SELECT t.* FROM TEST_DIMENSION t WHERE 1=0
 15/07/15 23:20:21 ERROR 

Re: How to convert files in HIVE

2015-07-16 Thread Ted Yu
Looks like you may get more help from the Hive mailing list:

http://hive.apache.org/mailing_lists.html

FYI

On Thu, Jul 16, 2015 at 4:15 PM, Kumar Jayapal kjayapa...@gmail.com wrote:

 Hi,

 How can we convert files stored in snappy-compressed Parquet format in
 Hive to Avro format?

 Is it possible to do it? Can you please guide me or give me a link which
 describes the procedure?


 Thanks
 Jay



Re: FW: Is it valid to use userLimit in calculating maxApplicationsPerUser ?

2015-07-16 Thread Wangda Tan
I think it's not valid. Multiplying it by ULF doesn't seem reasonable; I think
it should be:

max(1, maxApplications * max(userlimit/100, 1/#activeUsers))

Assuming an admin sets a very large ULF (e.g. 100), maxApplicationsPerUser can
end up much larger than the queue's maxApplications.

Also, multiplying by ULF to compute userAMResourceLimit seems invalid too; it
can let a single user run many more applications than expected, which we
should avoid.
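
A small standalone sketch (hypothetical demo values, not the scheduler source)
contrasting the current formula with the proposal above:

public class UserLimitFormulas {
    public static void main(String[] args) {
        int maxApplications = 2;     // small queue, as in Naga's scenario
        int userLimit = 20;          // minimum-user-limit-percent
        float userLimitFactor = 1f;  // default ULF
        int activeUsers = 3;

        // Current formula: truncates to 0 for small queues.
        int current =
                (int) (maxApplications * (userLimit / 100.0f) * userLimitFactor);

        // Proposed: floor of 1, and tracks the effective per-user share.
        int proposed = (int) Math.max(1f,
                maxApplications * Math.max(userLimit / 100.0f, 1.0f / activeUsers));

        System.out.println(current);  // 0
        System.out.println(proposed); // 1
    }
}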

Thoughts?

Thanks,
Wangda
