[jira] [Created] (HADOOP-19236) Integration of Volcano Engine TOS in Hadoop.

2024-07-24 Thread Jinglun (Jira)
Jinglun created HADOOP-19236:


 Summary: Integration of Volcano Engine TOS in Hadoop.
 Key: HADOOP-19236
 URL: https://issues.apache.org/jira/browse/HADOOP-19236
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, tools
Reporter: Jinglun


Volcano Engine is a fast-growing cloud vendor launched by ByteDance, and TOS is 
the object storage service of Volcano Engine. A common pattern is to store data 
in TOS and run Hadoop/Spark/Flink applications against it. But there is no 
native support for TOS in Hadoop, so it is not easy for users to build their 
big data systems on top of TOS.
 
This work aims to integrate TOS with Hadoop to help users run their 
applications on TOS. Users only need to add some simple configuration, and 
their applications can read/write TOS without any code change. This work is 
similar to the existing AWS S3, Azure Blob Storage, Aliyun OSS, Tencent COS and 
Huawei Cloud Object Storage support in Hadoop.
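
As a rough illustration, here is a minimal sketch of what that configuration 
could look like from a client application. The tos:// scheme, the fs.tos.* 
property names and the filesystem class are assumptions made for the sketch; 
the actual keys would be defined by the connector.
{code:java}
// Minimal sketch, assuming a hypothetical "tos" scheme and fs.tos.* keys;
// the real connector would define the actual property names and classes.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TosExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.tos.impl", "org.apache.hadoop.fs.tosfs.TosFileSystem"); // hypothetical
    conf.set("fs.tos.endpoint", "tos-cn-beijing.volces.com");            // hypothetical
    conf.set("fs.tos.access-key-id", "...");
    conf.set("fs.tos.secret-access-key", "...");

    // Existing applications keep using the FileSystem API unchanged.
    FileSystem fs = FileSystem.get(URI.create("tos://my-bucket/"), conf);
    fs.create(new Path("/tmp/hello.txt")).close();
  }
}
{code}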






[jira] [Created] (HADOOP-17280) Service-user in DecayRPCScheduler shouldn't be accumulated to totalDecayedCallCost and totalRawCallCost.

2020-09-22 Thread Jinglun (Jira)
Jinglun created HADOOP-17280:


 Summary: Service-user in DecayRPCScheduler shouldn't be 
accumulated to totalDecayedCallCost and totalRawCallCost.
 Key: HADOOP-17280
 URL: https://issues.apache.org/jira/browse/HADOOP-17280
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Jinglun


HADOOP-17165 introduced a very useful feature: service-user. With that feature 
in place, I think we shouldn't add a service-user's cost to 
totalDecayedCallCost and totalRawCallCost anymore, because a big service-user 
would dominate the totals, shrink every other identity's share and give all 
identities priority 0.
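
To make the problem concrete, here is a simplified illustration of the 
proportion-based priority mapping that DecayRpcScheduler uses (the names and 
thresholds below are illustrative, not the exact Hadoop internals):
{code:java}
// Simplified illustration, not the actual DecayRpcScheduler code: the priority
// level is chosen from an identity's share of the total decayed cost.
static int computePriorityLevel(long userCost, long totalCost,
    double[] thresholds /* e.g. {0.125, 0.25, 0.5} */) {
  double proportion = totalCost == 0 ? 0 : (double) userCost / totalCost;
  for (int i = 0; i < thresholds.length; i++) {
    if (proportion <= thresholds[i]) {
      return i;  // a small share of the total cost => high priority (0 is best)
    }
  }
  return thresholds.length;  // heavy users land in the lowest-priority level
}
{code}
If a service-user contributes, say, 95% of totalCost, every other identity's 
proportion becomes tiny, so they all map to priority 0 and the scheduler can no 
longer differentiate them.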






[jira] [Created] (HADOOP-17268) Add RPC Quota to NameNode.

2020-09-17 Thread Jinglun (Jira)
Jinglun created HADOOP-17268:


 Summary: Add RPC Quota to NameNode.
 Key: HADOOP-17268
 URL: https://issues.apache.org/jira/browse/HADOOP-17268
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Jinglun


Add an RPC request quota to the NameNode. All requests exceeding the quota 
would fail with a 'Server too busy' exception. This can prevent users from 
overusing the NameNode.
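
For illustration, a quota check like this could be a simple per-user token 
bucket consulted before a request is queued. The sketch below is a guess at 
the mechanism, not the actual proposal:
{code:java}
// Illustrative per-user token bucket for an RPC quota (hypothetical): refill
// 'ratePerSecond' permits each second and reject callers when empty.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class RpcQuota {
  private final long ratePerSecond;
  private final ConcurrentMap<String, Bucket> buckets = new ConcurrentHashMap<>();

  RpcQuota(long ratePerSecond) { this.ratePerSecond = ratePerSecond; }

  /** Returns false when the caller is over quota; the server would then
   *  answer with a 'Server too busy'-style exception. */
  boolean tryAcquire(String user) {
    return buckets.computeIfAbsent(user, u -> new Bucket()).tryAcquire();
  }

  private class Bucket {
    private long permits = ratePerSecond;
    private long lastRefillMs = System.currentTimeMillis();

    synchronized boolean tryAcquire() {
      long now = System.currentTimeMillis();
      long refill = (now - lastRefillMs) * ratePerSecond / 1000;
      if (refill > 0) {
        permits = Math.min(ratePerSecond, permits + refill);
        lastRefillMs = now;
      }
      if (permits == 0) {
        return false;  // over quota
      }
      permits--;
      return true;
    }
  }
}
{code}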






[jira] [Created] (HADOOP-17021) Add concat fs command

2020-04-30 Thread Jinglun (Jira)
Jinglun created HADOOP-17021:


 Summary: Add concat fs command
 Key: HADOOP-17021
 URL: https://issues.apache.org/jira/browse/HADOOP-17021
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Jinglun


We should add a concat fs command for ease of use. It would concatenate 
existing source files into the target file using FileSystem.concat().
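
Internally the command would be a thin wrapper around the existing 
FileSystem.concat(Path, Path[]) API, roughly like this (the shell syntax at 
the end is a guess at the eventual form):
{code:java}
// What the command would do internally: delegate to FileSystem.concat().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcatDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path target = new Path("/data/part-00000");
    Path[] sources = { new Path("/data/part-00001"), new Path("/data/part-00002") };
    fs.concat(target, sources);  // moves the sources' data onto the end of the target
  }
}
{code}
A possible shell form (hypothetical) would be: hadoop fs -concat <target> 
<src> [<src> ...].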






[jira] [Created] (HADOOP-16506) Create proper documentation for MetricLinkedBlockingQueue

2019-08-12 Thread Jinglun (JIRA)
Jinglun created HADOOP-16506:


 Summary: Create proper documentation for MetricLinkedBlockingQueue
 Key: HADOOP-16506
 URL: https://issues.apache.org/jira/browse/HADOOP-16506
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Jinglun
Assignee: Jinglun


Add documentation for the MetricLinkedBlockingQueue. 






[jira] [Created] (HADOOP-16403) Start a new statistical rpc queue and make the Reader's pendingConnection queue runtime-replaceable

2019-07-01 Thread Jinglun (JIRA)
Jinglun created HADOOP-16403:


 Summary: Start a new statistical rpc queue and make the Reader's 
pendingConnection queue runtime-replaceable
 Key: HADOOP-16403
 URL: https://issues.apache.org/jira/browse/HADOOP-16403
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Jinglun


I have an HA cluster with 2 NameNodes. The NameNode's meta is quite big, so 
after the active NameNode dies it takes the standby more than 40s to become 
active. Many requests (TCP connect requests and RPC requests) from DataNodes, 
clients and ZKFC time out and start retrying. The sudden request flood lasts 
for the next 2 minutes, until all requests are either handled or have run out 
of retries.
Adjusting the RPC-related settings might strengthen the NameNode and solve 
this problem; the key point is finding the bottleneck. The RPC server can be 
described as below:
{noformat}
Listener -> Readers' queues -> Readers -> callQueue -> Handlers{noformat}
By sampling some failed clients, I found that many of them got 
ConnectException, caused by a TCP connect request that went unanswered for 
20s. I suspect the reader queue is full and blocks the listener from handling 
new connections. Both slow handlers and slow readers can stall the whole 
processing pipeline, and I need to know which one it is. I think *a queue that 
computes the qps, logs when the queue is full and can be replaced easily* 
would help.
I found the nice work in HADOOP-10302 implementing a runtime-swapped queue. 
Using it for the Reader's queue makes the reader queue runtime-swappable 
automatically. The qps computation could be done by a subclass of 
LinkedBlockingQueue that updates counters while put/take/... happens. The qps 
data would be exposed via JMX.
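
A minimal sketch of such a statistical queue, assuming plain operation 
counters that a metrics/JMX layer samples to derive qps (the class and method 
names are illustrative):
{code:java}
// Minimal sketch: a LinkedBlockingQueue subclass that counts operations so a
// metrics/JMX layer can derive qps. Other methods (offer, poll, ...) would be
// overridden the same way.
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.LongAdder;

public class MetricLinkedBlockingQueue<E> extends LinkedBlockingQueue<E> {
  private final LongAdder puts = new LongAdder();
  private final LongAdder takes = new LongAdder();

  public MetricLinkedBlockingQueue(int capacity) {
    super(capacity);
  }

  @Override
  public void put(E e) throws InterruptedException {
    super.put(e);
    puts.increment();
  }

  @Override
  public E take() throws InterruptedException {
    E e = super.take();
    takes.increment();
    return e;
  }

  // A JMX bean would sample these counters periodically to compute qps.
  public long getPutCount() { return puts.sum(); }
  public long getTakeCount() { return takes.sum(); }
}
{code}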

 

 






[jira] [Resolved] (HADOOP-16348) Remove redundant code when verifying quota.

2019-06-04 Thread Jinglun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun resolved HADOOP-16348.
--
  Resolution: Abandoned
Release Note: Should be filed under HDFS.

> Remove redundant code when verifying quota.
> 
>
> Key: HADOOP-16348
> URL: https://issues.apache.org/jira/browse/HADOOP-16348
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.1.1
>Reporter: Jinglun
>Priority: Minor
>
> DirectoryWithQuotaFeature.verifyQuotaByStorageType() does the job of 
> verifying quota. The initial call to isQuotaByStorageTypeSet() is redundant, 
> because the per-type check inside the for-each loop below does the same job.
> {code:java}
> if (!isQuotaByStorageTypeSet()) { // REDUNDANT.
>   return;
> }
> for (StorageType t : StorageType.getTypesSupportingQuota()) {
>   if (!isQuotaByStorageTypeSet(t)) { // CHECK FOR EACH STORAGETYPE.
>     continue;
>   }
>   if (Quota.isViolated(quota.getTypeSpace(t), usage.getTypeSpace(t),
>       typeDelta.get(t))) {
>     throw new QuotaByStorageTypeExceededException(
>         quota.getTypeSpace(t), usage.getTypeSpace(t) + typeDelta.get(t), t);
>   }
> }
> {code}






[jira] [Created] (HADOOP-16348) Remove redundant code when verifying quota.

2019-06-04 Thread Jinglun (JIRA)
Jinglun created HADOOP-16348:


 Summary: Remove redundant code when verifying quota.
 Key: HADOOP-16348
 URL: https://issues.apache.org/jira/browse/HADOOP-16348
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.1.1
Reporter: Jinglun


DirectoryWithQuotaFeature.verifyQuotaByStorageType() does the job of verifying 
quota. The initial call to isQuotaByStorageTypeSet() is redundant, because the 
per-type check inside the for-each loop below does the same job.
{code:java}
if (!isQuotaByStorageTypeSet()) { // REDUNDANT.
  return;
}
for (StorageType t : StorageType.getTypesSupportingQuota()) {
  if (!isQuotaByStorageTypeSet(t)) { // CHECK FOR EACH STORAGETYPE.
    continue;
  }
  if (Quota.isViolated(quota.getTypeSpace(t), usage.getTypeSpace(t),
      typeDelta.get(t))) {
    throw new QuotaByStorageTypeExceededException(
        quota.getTypeSpace(t), usage.getTypeSpace(t) + typeDelta.get(t), t);
  }
}
{code}






[jira] [Created] (HADOOP-15946) The Connection thread should notify all calls in a finally clause before quitting.

2018-11-22 Thread Jinglun (JIRA)
Jinglun created HADOOP-15946:


 Summary: The Connection thread should notify all calls in a finally 
clause before quitting.
 Key: HADOOP-15946
 URL: https://issues.apache.org/jira/browse/HADOOP-15946
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Jinglun
 Attachments: issue-replay.patch

Threads that call Client.call() will wait forever unless the connection thread 
notifies them, so the connection thread should try its best to notify them 
before it quits.

In Connection.close(), if any Throwable occurs before cleanupCalls(), the 
connection thread quits directly and leaves all the waiting threads waiting 
forever. So I think doing cleanupCalls() in a finally clause might be a good 
idea.

I met this problem when I started a Hadoop 2.6 DataNode with 8 block pools. 
The DN successfully reported to 7 namespaces and failed at the last namespace 
because the connection thread of the heartbeat RPC got an "OOME: Direct buffer 
memory" and quit without calling cleanupCalls().

I think we can move cleanupCalls() into a finally clause as a protection. I 
notice that in HADOOP-10940 the stream close was changed to 
IOUtils.closeStream(ipcStreams), which catches all Throwables, so the specific 
failure I met is already fixed.

issue-replay.patch simulates the case I described above.
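
A simplified fragment of the proposed protection (the real Connection.close() 
does more work than this; ipcStreams and cleanupCalls() are the field and 
method named above):
{code:java}
// Simplified fragment: whatever fails while closing the connection, the
// threads blocked in Client.call() must still be notified.
private void close() {
  try {
    // Releasing resources here used to be able to throw (e.g. an OOME while
    // closing streams), which skipped cleanupCalls() and stranded the callers.
    IOUtils.closeStream(ipcStreams);
  } finally {
    cleanupCalls();  // always wake up every waiting caller
  }
}
{code}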






[jira] [Created] (HADOOP-15565) ViewFileSystem.close doesn't close child filesystems and causes FileSystem objects to leak.

2018-06-26 Thread Jinglun (JIRA)
Jinglun created HADOOP-15565:


 Summary: ViewFileSystem.close doesn't close child filesystems and 
causes FileSystem objects to leak.
 Key: HADOOP-15565
 URL: https://issues.apache.org/jira/browse/HADOOP-15565
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Jinglun


When we create a ViewFileSystem, all of its child filesystems are cached in 
FileSystem.CACHE. Unless we close those child filesystems explicitly, they 
stay in FileSystem.CACHE forever.
I think FileSystem.CACHE should cache only the ViewFileSystem, and the 
ViewFileSystem should cache all of its child filesystems. Then we could close 
a ViewFileSystem without leaking and without affecting other ViewFileSystems.
I found this problem because I need to re-login to Kerberos and renew the 
ViewFileSystem periodically. Because FileSystem.CACHE.Key is based on the 
UserGroupInformation, which changes every time I re-login, I can't reuse the 
cached child filesystems when I create a new ViewFileSystem. And because 
ViewFileSystem.close() does nothing but remove itself from the cache, all of 
its child filesystems leak in the cache.
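
A small illustration of the leak (assuming a viewfs:// mount table is already 
configured; names are illustrative):
{code:java}
// Illustration of the leak, assuming a configured viewfs mount table.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ViewFsLeakDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem viewFs = FileSystem.get(URI.create("viewfs://cluster/"), conf);
    viewFs.close();  // removes only the ViewFileSystem itself from
                     // FileSystem.CACHE; the child filesystems it opened
                     // (e.g. the hdfs:// mounts) stay cached forever.
  }
}
{code}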


