[jira] [Created] (HADOOP-19236) Integration of Volcano Engine TOS in Hadoop.
Jinglun created HADOOP-19236:

Summary: Integration of Volcano Engine TOS in Hadoop.
Key: HADOOP-19236
URL: https://issues.apache.org/jira/browse/HADOOP-19236
Project: Hadoop Common
Issue Type: New Feature
Components: fs, tools
Reporter: Jinglun

Volcano Engine is a fast-growing cloud vendor launched by ByteDance, and TOS is its object storage service. A common pattern is to store data in TOS and run Hadoop/Spark/Flink applications against it. But Hadoop has no native support for TOS, so it is not easy for users to build a big-data system on top of it.

This work aims to integrate TOS with Hadoop so that users can run their applications on TOS. Users only need some simple configuration, and then their applications can read and write TOS without any code change. The work is similar to the existing support for AWS S3, Azure Blob, Aliyun OSS, Tencent COS and Huawei Cloud Object Storage in Hadoop.

--
This message was sent by Atlassian Jira (v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
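As a sketch of the "simple configuration" mentioned above, a Hadoop filesystem connector is typically wired up in core-site.xml. The property names and class name below are hypothetical, shown only to illustrate the usual binding pattern (compare the fs.s3a.* or fs.oss.* properties of the existing connectors):

{code:xml}
<!-- Hypothetical core-site.xml fragment; key names are illustrative only. -->
<configuration>
  <property>
    <name>fs.tos.impl</name>
    <value>org.apache.hadoop.fs.tosfs.TosFileSystem</value>
  </property>
  <property>
    <name>fs.tos.access-key-id</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.tos.secret-access-key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
{code}

With such a binding in place, paths like tos://bucket/path would resolve to the new filesystem implementation without application code changes.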
[jira] [Created] (HADOOP-17280) Service-user in DecayRPCScheduler shouldn't be accumulated to totalDecayedCallCost and totalRawCallCost.
Jinglun created HADOOP-17280:

Summary: Service-user in DecayRPCScheduler shouldn't be accumulated to totalDecayedCallCost and totalRawCallCost.
Key: HADOOP-17280
URL: https://issues.apache.org/jira/browse/HADOOP-17280
Project: Hadoop Common
Issue Type: Improvement
Reporter: Jinglun

HADOOP-17165 introduced a very useful feature: service-user. With this feature in place, I think we should no longer add a service-user's cost to totalDecayedCallCost and totalRawCallCost. Supposing we have a big service-user, its cost dominates the totals, so every other identity's share of the total becomes tiny and they all end up with priority 0.
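To see why a big service-user flattens everyone to priority 0, consider how a DecayRpcScheduler-style scheduler maps each identity's share of the total decayed cost to a priority level. This is an illustrative sketch with default-style thresholds, not the actual DecayRpcScheduler code:

```java
// Sketch: priority level derived from an identity's share of the total
// decayed call cost. Thresholds mimic a 4-level scheduler configuration.
public class PrioritySketch {
    static final double[] THRESHOLDS = {0.125, 0.25, 0.5};

    static int priorityLevel(long userCost, long totalCost) {
        double share = totalCost == 0 ? 0.0 : (double) userCost / totalCost;
        for (int i = 0; i < THRESHOLDS.length; i++) {
            if (share < THRESHOLDS[i]) {
                return i; // lower level = better service
            }
        }
        return THRESHOLDS.length;
    }

    public static void main(String[] args) {
        long heavyUser = 900, lightUser = 100;
        // Without the service-user in the totals, the heavy user is
        // correctly demoted to the lowest level (3).
        System.out.println(priorityLevel(heavyUser, heavyUser + lightUser));
        // A huge service-user inflates the denominator: every other
        // identity's share collapses below the first threshold, so all of
        // them land in level 0 and prioritization is defeated.
        long serviceUser = 1_000_000;
        System.out.println(
            priorityLevel(heavyUser, heavyUser + lightUser + serviceUser));
    }
}
```

Excluding the service-user's cost from the totals keeps the remaining identities' shares meaningful.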
[jira] [Created] (HADOOP-17268) Add RPC Quota to NameNode.
Jinglun created HADOOP-17268:

Summary: Add RPC Quota to NameNode.
Key: HADOOP-17268
URL: https://issues.apache.org/jira/browse/HADOOP-17268
Project: Hadoop Common
Issue Type: Improvement
Reporter: Jinglun

Add the ability to enforce an RPC request quota in the NameNode. All requests exceeding the quota would fail with a 'Server too busy' exception. This prevents individual users from overusing the NameNode's RPC capacity.
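A minimal sketch of what such a quota could look like, assuming a simple fixed-window count per user; the class names, the windowing scheme, and the exception type are all assumptions for illustration, not the actual proposal:

```java
// Hypothetical per-user RPC quota: requests over the per-window limit fail
// with a "Server too busy"-style exception.
import java.util.HashMap;
import java.util.Map;

public class RpcQuotaSketch {
    public static class ServerTooBusyException extends RuntimeException {
        ServerTooBusyException(String user) {
            super("Server too busy: quota exceeded for " + user);
        }
    }

    private final int limitPerWindow;
    private final Map<String, Integer> counts = new HashMap<>();

    public RpcQuotaSketch(int limitPerWindow) {
        this.limitPerWindow = limitPerWindow;
    }

    /** Called on every incoming request for the given user. */
    public synchronized void admit(String user) {
        int n = counts.merge(user, 1, Integer::sum);
        if (n > limitPerWindow) {
            throw new ServerTooBusyException(user);
        }
    }

    /** In a real server this would run on a timer to start a new window. */
    public synchronized void resetWindow() {
        counts.clear();
    }
}
```

The real implementation would hook into the RPC server's call intake path so that rejected requests never reach the handlers.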
[jira] [Created] (HADOOP-17021) Add concat fs command
Jinglun created HADOOP-17021:

Summary: Add concat fs command
Key: HADOOP-17021
URL: https://issues.apache.org/jira/browse/HADOOP-17021
Project: Hadoop Common
Issue Type: Improvement
Reporter: Jinglun

We should add a concat fs command for ease of use. It concatenates existing source files into the target file using FileSystem.concat().
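The underlying API already exists, so the shell command would be a thin wrapper around it. A rough sketch of such a wrapper follows; the class and argument handling are illustrative, while FileSystem.concat(Path, Path[]) is the real method (note that only some filesystems, such as HDFS, actually implement it):

```java
// Illustrative wrapper around FileSystem.concat(); requires Hadoop on the
// classpath and a filesystem that supports concat (e.g. HDFS).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcatExample {
    public static void main(String[] args) throws Exception {
        // Usage: ConcatExample <target> <src1> <src2> ...
        Path target = new Path(args[0]);
        Path[] srcs = new Path[args.length - 1];
        for (int i = 1; i < args.length; i++) {
            srcs[i - 1] = new Path(args[i]);
        }
        FileSystem fs = target.getFileSystem(new Configuration());
        // Moves the sources' blocks onto the target without copying data;
        // the source files cease to exist afterwards.
        fs.concat(target, srcs);
    }
}
```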
[jira] [Created] (HADOOP-16506) Create proper documentation for MetricLinkedBlockingQueue
Jinglun created HADOOP-16506:

Summary: Create proper documentation for MetricLinkedBlockingQueue
Key: HADOOP-16506
URL: https://issues.apache.org/jira/browse/HADOOP-16506
Project: Hadoop Common
Issue Type: Improvement
Reporter: Jinglun
Assignee: Jinglun

Add documentation for the MetricLinkedBlockingQueue.
[jira] [Created] (HADOOP-16403) Start a new statistical rpc queue and make the Reader's pendingConnection queue runtime-replaceable
Jinglun created HADOOP-16403:

Summary: Start a new statistical rpc queue and make the Reader's pendingConnection queue runtime-replaceable
Key: HADOOP-16403
URL: https://issues.apache.org/jira/browse/HADOOP-16403
Project: Hadoop Common
Issue Type: Improvement
Reporter: Jinglun

I have an HA cluster with 2 NameNodes. The NameNode's metadata is quite big, so after the active NameNode dies it takes the standby more than 40s to become active. Many requests (TCP connect requests and RPC requests) from DataNodes, clients and zkfc time out and start retrying. The sudden request flood lasts for the next 2 minutes, until all requests are either handled or run out of retries. Adjusting the RPC-related settings might strengthen the NameNode and solve this problem, but the key point is finding the bottleneck. The RPC server can be described as below:

{noformat}
Listener -> Readers' queues -> Readers -> callQueue -> Handlers{noformat}

By sampling some failed clients, I find many of them got a ConnectException, caused by a TCP connect request that went unanswered for 20s. I think the reader queue may be full, blocking the Listener from handling new connections. Both slow handlers and slow readers can block the whole processing pipeline, and I need to know which one it is.

I think *a queue that computes the qps, writes a log when the queue is full, and can be replaced easily* will help. The nice work in HADOOP-10302 implements a runtime-swapped queue; using it for the Reader's queue makes that queue runtime-swappable automatically. The qps computation could be done by a subclass of LinkedBlockingQueue that counts operations while put/take/... happen. The qps data will show up in jmx.
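The counting subclass described above could be sketched like this; the class name and the metrics it exposes are assumptions (the real patch would publish them via JMX and derive a qps figure from the counters over time):

```java
// Hypothetical "MetricLinkedBlockingQueue": a LinkedBlockingQueue subclass
// that counts put/take operations and logs when the queue fills up, so a
// full reader queue blocking the Listener becomes visible.
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class MetricLinkedBlockingQueue<E> extends LinkedBlockingQueue<E> {
    private final AtomicLong puts = new AtomicLong();
    private final AtomicLong takes = new AtomicLong();

    public MetricLinkedBlockingQueue(int capacity) {
        super(capacity);
    }

    @Override
    public void put(E e) throws InterruptedException {
        super.put(e);
        puts.incrementAndGet();
        if (remainingCapacity() == 0) {
            // A full reader queue can block the Listener from accepting
            // new connections; log it so the bottleneck can be identified.
            System.err.println("queue is full, total puts=" + puts.get());
        }
    }

    @Override
    public E take() throws InterruptedException {
        E e = super.take();
        takes.incrementAndGet();
        return e;
    }

    // Counters from which a qps figure can be computed and exposed on jmx.
    public long getPutCount() { return puts.get(); }
    public long getTakeCount() { return takes.get(); }
}
```

A real implementation would also have to count offer()/poll() and the other queue methods that callers may use, which is why doing this in a dedicated subclass rather than at each call site is attractive.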
[jira] [Resolved] (HADOOP-16348) Remove redundant code when verify quota.
[ https://issues.apache.org/jira/browse/HADOOP-16348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinglun resolved HADOOP-16348.
Resolution: Abandoned
Release Note: Should be under HDFS.

> Remove redundant code when verify quota.
>
> Key: HADOOP-16348
> URL: https://issues.apache.org/jira/browse/HADOOP-16348
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 3.1.1
> Reporter: Jinglun
> Priority: Minor
>
> DirectoryWithQuotaFeature.verifyQuotaByStorageType() does the job of verifying quota. The initial call to isQuotaByStorageTypeSet() is redundant, because the per-type check inside the for-each loop on the next line does the same job.
> {code:java}
> if (!isQuotaByStorageTypeSet()) { // REDUNDANT.
>   return;
> }
> for (StorageType t : StorageType.getTypesSupportingQuota()) {
>   if (!isQuotaByStorageTypeSet(t)) { // CHECK FOR EACH STORAGETYPE.
>     continue;
>   }
>   if (Quota.isViolated(quota.getTypeSpace(t), usage.getTypeSpace(t),
>       typeDelta.get(t))) {
>     throw new QuotaByStorageTypeExceededException(
>         quota.getTypeSpace(t), usage.getTypeSpace(t) + typeDelta.get(t), t);
>   }
> }
> {code}
[jira] [Created] (HADOOP-16348) Remove redundant code when verify quota.
Jinglun created HADOOP-16348:

Summary: Remove redundant code when verify quota.
Key: HADOOP-16348
URL: https://issues.apache.org/jira/browse/HADOOP-16348
Project: Hadoop Common
Issue Type: Improvement
Affects Versions: 3.1.1
Reporter: Jinglun

DirectoryWithQuotaFeature.verifyQuotaByStorageType() does the job of verifying quota. The initial call to isQuotaByStorageTypeSet() is redundant, because the per-type check inside the for-each loop on the next line does the same job.

{code:java}
if (!isQuotaByStorageTypeSet()) { // REDUNDANT.
  return;
}
for (StorageType t : StorageType.getTypesSupportingQuota()) {
  if (!isQuotaByStorageTypeSet(t)) { // CHECK FOR EACH STORAGETYPE.
    continue;
  }
  if (Quota.isViolated(quota.getTypeSpace(t), usage.getTypeSpace(t),
      typeDelta.get(t))) {
    throw new QuotaByStorageTypeExceededException(
        quota.getTypeSpace(t), usage.getTypeSpace(t) + typeDelta.get(t), t);
  }
}
{code}
[jira] [Created] (HADOOP-15946) the Connection thread should notify all calls in finally clause before quit.
Jinglun created HADOOP-15946:

Summary: the Connection thread should notify all calls in finally clause before quit.
Key: HADOOP-15946
URL: https://issues.apache.org/jira/browse/HADOOP-15946
Project: Hadoop Common
Issue Type: Improvement
Reporter: Jinglun
Attachments: issue-replay.patch

Threads that call Client.call() will wait forever unless the connection thread notifies them, so the connection thread should try its best to notify them before it quits. In Connection.close(), if any Throwable occurs before cleanupCalls(), the connection thread quits directly and leaves all the waiting threads waiting forever. So I think doing cleanupCalls() in a finally clause would be a good idea.

I met this problem when I started a Hadoop 2.6 DataNode with 8 block pools. The DataNode successfully reported to 7 namespaces and failed at the last one, because the connection thread of the heartbeat RPC got an 'OOME: Direct buffer memory' and quit without calling cleanupCalls(). I think we can move cleanupCalls() into a finally clause as a protection. I notice that in HADOOP-10940 the close of the stream was changed to IOUtils.closeStream(ipcStreams), which catches all Throwables, so the particular problem I met is already fixed. issue-replay.patch simulates the case described above.
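The proposed protection is the standard try/finally pattern. A minimal sketch with illustrative names (not the actual Client.Connection code): even if closing the streams throws, the waiting callers still get notified.

```java
// Sketch of the fix: cleanupCalls() moves into a finally clause so that
// threads blocked in Client.call() are always notified, even when an
// error (e.g. "OOME: Direct buffer memory") escapes while closing streams.
public class ConnectionSketch {
    boolean callsCleanedUp = false;

    void closeStreams() {
        // Simulate a failure during close.
        throw new RuntimeException("failure while closing streams");
    }

    void cleanupCalls() {
        // In the real code this completes every pending Call and wakes up
        // the threads waiting on them.
        callsCleanedUp = true;
    }

    void close() {
        try {
            closeStreams();
        } finally {
            // Runs whether or not closeStreams() threw.
            cleanupCalls();
        }
    }
}
```

Without the finally clause, the exception from closeStreams() would skip cleanupCalls() entirely and the callers would hang forever.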
[jira] [Created] (HADOOP-15565) ViewFileSystem.close doesn't close child filesystems and causes FileSystem objects leak.
Jinglun created HADOOP-15565:

Summary: ViewFileSystem.close doesn't close child filesystems and causes FileSystem objects leak.
Key: HADOOP-15565
URL: https://issues.apache.org/jira/browse/HADOOP-15565
Project: Hadoop Common
Issue Type: Bug
Reporter: Jinglun

When we create a ViewFileSystem, all its child filesystems are cached by FileSystem.CACHE. Unless we close these child filesystems, they stay in FileSystem.CACHE forever. I think we should let FileSystem.CACHE cache only the ViewFileSystem, and let the ViewFileSystem cache all its child filesystems. Then we can close a ViewFileSystem without leaking, and without affecting other ViewFileSystems.

I found this problem because I need to re-login to Kerberos and renew my ViewFileSystem periodically. Because FileSystem.CACHE's Key is based on UserGroupInformation, which changes every time I re-login, I can't reuse the cached child filesystems when I create a new ViewFileSystem. And because ViewFileSystem.close() does nothing but remove itself from the cache, I leak all its child filesystems in the cache.
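The proposed ownership model can be sketched with plain java.io.Closeable; the class and method names are hypothetical, not the actual ViewFileSystem code. The view keeps its own list of children and closes them all in close(), so nothing is left behind in a global cache:

```java
// Sketch: a composite filesystem owns its children and closes them itself,
// instead of relying on a shared global cache to keep them alive.
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ViewFsSketch implements Closeable {
    private final List<Closeable> children = new ArrayList<>();

    // Called once per mount point when the view is initialized.
    void addChild(Closeable child) {
        children.add(child);
    }

    @Override
    public void close() throws IOException {
        // Close every child filesystem so none of them is leaked when the
        // view itself is closed.
        for (Closeable c : children) {
            c.close();
        }
        children.clear();
    }
}
```

With this model, only the view lives in FileSystem.CACHE; closing it tears down its private children without touching the child filesystems of any other view.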