luffyd opened a new issue #1913: URL: https://github.com/apache/hudi/issues/1913
**_Tips before filing an issue_** - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? yes - Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org. - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly. **Describe the problem you faced** I have an MOR table with continuous upserts in a loop. I have the maxFileLimit to 2G and smallFileLimit as 1G I am seeing `java.io.IOException: Too many open files` in the Hudi logs and spark job terminates with FATAL. Every upsert contains 3M records new records This happens after about 200M in the table. **To Reproduce** Steps to reproduce the behavior: ``` val parallelism = options.getOrElse("parallelism", Math.max(2, upsertCount/100000).toString).toInt println("parallelism", parallelism) var fileSizeOptions = Map[String,String] () fileSizeOptions += (HoodieStorageConfig.PARQUET_FILE_MAX_BYTES -> String.valueOf(2013265920)) // 2GB fileSizeOptions += (HoodieStorageConfig.PARQUET_BLOCK_SIZE_BYTES -> String.valueOf(2013265920)) // 2GB fileSizeOptions += (HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT_BYTES -> String.valueOf(1006632960)) // 1GB (inputDF .write .format("org.apache.hudi") .option(HoodieWriteConfig.TABLE_NAME, options.getOrElse("tableName", "facestest")) .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL) .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partitionKey") .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "nodeId") .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts") .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL) .option("hoodie.upsert.shuffle.parallelism", parallelism)//not the ideal way but works .option("hoodie.bulkinsert.shuffle.parallelism", parallelism)//not the ideal way but works .option("hoodie.cleaner.commits.retained", "1") .options(fileSizeOptions) .mode(SaveMode.Append) .save(getHudiPath(spark))) ``` **Expected behavior** A clear and concise description of what you expected to happen. **Environment Description** * Hudi version : 0.5.3 * Spark version : 2.4.5 * Storage (HDFS/S3/GCS..) : S3 * Running on Docker? (yes/no) : no **Additional context** I notice lots of FileSystemViewHandler activity happening like this ``` 20/08/03 03:26:49 INFO FileSystemViewHandler: TimeTakenMillis[Total=101, Refresh=98, handle=3, Check=0], Success=true, Query=basepath=s3%3A%2F%2Fchelan-dev-mock-faces%2FTestFacesUpserForLoop%2F1000P2G&lastinstantts=20200803032307&timelinehash=9ef96ca1f93d48b24891b749ead94dfeb95ac7123fe538eb09733a1494befe77, Host=ip-10-0-1-42.us-west-2.compute.internal:39767, synced=false ``` Looking at the host in the above logs, it is the driver host where this logs are being executed. Can you explain why driver has to access the FileSystem so much. ``` [hadoop@ip-10-0-1-42 s-1DFOYHZTX6MAG]$ zgrep FileSystemViewHandler stderr.gz |wc -l 506548 [hadoop@ip-10-0-1-42 s-1DFOYHZTX6MAG]$ zgrep 'Too many open files' stderr.gz |wc -l 17054 ``` **Stacktrace** ``` 20/08/03 10:00:44 INFO FileSystemViewHandler: TimeTakenMillis[Total=1, Refresh=0, handle=0, Check=0], Success=false, Query=partition=670&basepath=s3%3A%2F%2Fchelan-dev-mock-faces%2FTestFacesUpserForLoop%2F1000P2G&lastinstantts=20200803094501&timelinehash=96d2a8ca3e0bb58b3 82377345ed35eaf99f09f97540b13bf6295682fc42eade0, Host=ip-10-0-1-42.us-west-2.compute.internal:44483, synced=false 20/08/03 10:00:44 WARN ExceptionMapper: Uncaught exception com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: chelan-dev-mock-faces.s3.us-west-2.amazonaws.com at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1189) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1135) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:784) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:752) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5052) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4998) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1335) at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:22) at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:8) at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:114) at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:189) at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:184) at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.getObjectMetadata(AmazonS3LiteClient.java:96) at com.amazon.ws.emr.hadoop.fs.s3.lite.AbstractAmazonS3Lite.getObjectMetadata(AbstractAmazonS3Lite.java:43) at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.getFileMetadataFromCacheOrS3(Jets3tNativeFileSystemStore.java:497) at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:223) at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:590) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1440) at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:357) at org.apache.hudi.common.fs.HoodieWrapperFileSystem.exists(HoodieWrapperFileSystem.java:460) at org.apache.hudi.common.fs.FSUtils.createPathIfNotExists(FSUtils.java:517) at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$5(AbstractTableFileSystemView.java:223) at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660) at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:214) at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFileGroups(AbstractTableFileSystemView.java:545) at org.apache.hudi.timeline.service.handlers.FileSliceHandler.getAllFileGroups(FileSliceHandler.java:88) at org.apache.hudi.timeline.service.FileSystemViewHandler.lambda$registerFileSlicesAPI$17(FileSystemViewHandler.java:281) at org.apache.hudi.timeline.service.FileSystemViewHandler$ViewHandler.handle(FileSystemViewHandler.java:329) at io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22) at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606) at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46) at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17) at io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143) at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41) at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107) at io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61) at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.Server.handle(Server.java:502) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683) at java.lang.Thread.run(Thread.java:748) Caused by: java.net.UnknownHostException: chelan-dev-mock-faces.s3.us-west-2.amazonaws.com at java.net.InetAddress.getAllByName0(InetAddress.java:1281) at java.net.InetAddress.getAllByName(InetAddress.java:1193) at java.net.InetAddress.getAllByName(InetAddress.java:1127) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38) at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112) at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:374) at sun.reflect.GeneratedMethodAccessor294.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.conn.$Proxy27.connect(Unknown Source) at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393) at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1311) at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1127) ... 62 more 20/08/03 10:00:44 WARN AbstractConnector: java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:419) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:247) at org.spark_project.jetty.server.ServerConnector.accept(ServerConnector.java:397) at org.spark_project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:601) at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) at java.lang.Thread.run(Thread.java:748) ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org