[jira] [Created] (SPARK-48241) CSV parsing failure with char/varchar type columns
Jiayi Liu created SPARK-48241: - Summary: CSV parsing failure with char/varchar type columns Key: SPARK-48241 URL: https://issues.apache.org/jira/browse/SPARK-48241 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.1 Reporter: Jiayi Liu Fix For: 4.0.0 Selecting from a CSV table that contains char or varchar columns fails with the following error: {code:java} java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct) should be the subset of dataSchema (struct). at scala.Predef$.require(Predef.scala:281) at org.apache.spark.sql.catalyst.csv.UnivocityParser.<init>(UnivocityParser.scala:56) at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127) at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155) at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code} The error occurs because the StringType columns in UnivocityParser's dataSchema and requiredSchema are inconsistent: the StringType StructField in the dataSchema carries char/varchar metadata that is missing from the requiredSchema. We need to retain this metadata when resolving the schema. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
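A minimal sketch (plain Python, not Spark code) of why the subset check fails: the field triples below are hypothetical stand-ins for StructFields, and the metadata key mirrors the constant Spark uses to tag char/varchar columns, but the structures and the membership test are illustrative simplifications, not the real API.

```python
# Hypothetical stand-ins for Spark StructFields: (name, dataType, metadata).
# Spark stores char/varchar columns as StringType plus a metadata entry;
# the key below mirrors Spark's CharVarcharUtils constant.
data_schema = [("c", "string", {"__CHAR_VARCHAR_TYPE_STRING": "char(5)"})]

# During schema resolution the metadata is dropped, so the required field
# no longer compares equal to the corresponding field in data_schema.
required_schema = [("c", "string", {})]

# UnivocityParser requires requiredSchema to be a subset of dataSchema;
# with mismatched metadata the membership test fails.
is_subset = all(field in data_schema for field in required_schema)
assert not is_subset  # analogous to the require(...) failure in the stack trace

# Retaining the metadata makes the same check pass.
required_with_metadata = [("c", "string", {"__CHAR_VARCHAR_TYPE_STRING": "char(5)"})]
assert all(field in data_schema for field in required_with_metadata)
```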
[jira] [Updated] (SPARK-45834) Fix Pearson correlation calculation more stable
[ https://issues.apache.org/jira/browse/SPARK-45834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Liu updated SPARK-45834: -- Description: Spark uses the formula {{ck / sqrt(xMk * yMk)}} to calculate the Pearson Correlation Coefficient. If {{xMk}} and {{yMk}} are very small, the double multiplication can underflow, resulting in a denominator of 0. This leads to an Infinity result in the calculation. For example, when calculating the correlation for the same columns a and b in a table, the result will be Infinity, but the correlation for identical columns should be 1.0 instead. ||a||b|| |1e-200|1e-200| |1e-200|1e-200| |1e-100|1e-100| Modifying the formula to {{ck / sqrt(xMk) / sqrt(yMk)}} solves this issue and improves the stability of the calculation. The benefit of this modification is that it splits the square root of the denominator into two parts: {{sqrt(xMk)}} and {{sqrt(yMk)}}. This helps avoid underflow, where the product of extremely small values becomes zero. was: Spark uses the formula {{ck / sqrt(xMk * yMk)}} to calculate the Pearson Correlation Coefficient. If {{xMk}} and {{yMk}} are very small, it can lead to double multiplication overflow, resulting in a denominator of 0. This leads to a NaN result in the calculation. For example, when calculating the correlation for the same columns a and b in a table, the result will be Infinity, but the correlation for identical columns should be 1.0 instead. ||a||b|| |1e-200|1e-200| |1e-200|1e-200| |1e-100|1e-100| Modifying the formula to {{ck / sqrt(xMk) / sqrt(yMk)}} can indeed solve this issue and improve the stability of the calculation. The benefit of this modification is that it splits the square root of the denominator into two parts: {{sqrt(xMk)}} and {{{}sqrt(yMk){}}}. This helps avoid multiplication overflow or cases where the product of extremely small values becomes zero. 
> Fix Pearson correlation calculation more stable > --- > > Key: SPARK-45834 > URL: https://issues.apache.org/jira/browse/SPARK-45834 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Jiayi Liu >Priority: Major > > Spark uses the formula {{ck / sqrt(xMk * yMk)}} to calculate the Pearson > Correlation Coefficient. If {{xMk}} and {{yMk}} are very small, the double > multiplication can underflow, resulting in a denominator of 0. This > leads to an Infinity result in the calculation. > For example, when calculating the correlation for the same columns a and b in > a table, the result will be Infinity, but the correlation for identical > columns should be 1.0 instead. > ||a||b|| > |1e-200|1e-200| > |1e-200|1e-200| > |1e-100|1e-100| > Modifying the formula to {{ck / sqrt(xMk) / sqrt(yMk)}} solves this > issue and improves the stability of the calculation. The benefit of this > modification is that it splits the square root of the denominator into two > parts: {{sqrt(xMk)}} and {{sqrt(yMk)}}. This helps avoid underflow, where > the product of extremely small values becomes zero.
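The underflow the report describes can be reproduced with plain IEEE doubles; a sketch (the variable names follow the formulas above, and the aggregate values are illustrative):

```python
import math

# ck, xMk, yMk as they would roughly be after aggregating two identical
# columns of 1e-200 values; the exact magnitudes are illustrative.
ck = xMk = yMk = 1e-200

# Naive formula: xMk * yMk = 1e-400 underflows to 0.0, so the denominator
# is exactly 0 and ck / sqrt(xMk * yMk) yields Infinity in Java/Scala
# (Python would instead raise ZeroDivisionError on float division by zero).
naive_denominator = math.sqrt(xMk * yMk)
assert naive_denominator == 0.0

# Split formula: each square root stays well within double range,
# and the correlation of identical columns comes out as ~1.0.
stable = ck / math.sqrt(xMk) / math.sqrt(yMk)
assert math.isclose(stable, 1.0, rel_tol=1e-9)
```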
[jira] [Commented] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705990#comment-17705990 ] Jiayi Liu commented on SPARK-42947: --- issue fixed by https://github.com/apache/spark/pull/40577 > Spark Thriftserver LDAP should not use DN pattern if user contains domain > - > > Key: SPARK-42947 > URL: https://issues.apache.org/jira/browse/SPARK-42947 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jiayi Liu >Priority: Major > > When the LDAP provider has domain configuration, such as Active Directory, > the principal should not be constructed according to the DN pattern, but the > username containing the domain should be directly passed to the LDAP provider > as the principal. We can refer to the implementation of Hive LdapUtils. > When the username contains a domain or domain passes from > hive.server2.authentication.ldap.Domain configuration, if we construct the > principal according to the DN pattern (For example, > uid=user@domain,dc=test,dc=com), we will get the following error: > {code:java} > 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure > javax.security.sasl.SaslException: Error validating the login > at > org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108) > ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] > at > org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537) > ~[libthrift-0.12.0.jar:0.12.0] > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) > ~[libthrift-0.12.0.jar:0.12.0] > at > org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43) > ~[libthrift-0.12.0.jar:0.12.0] > at > org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223) > ~[libthrift-0.12.0.jar:0.12.0] > at > 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293) > ~[libthrift-0.12.0.jar:0.12.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > ~[?:1.8.0_352] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > ~[?:1.8.0_352] > at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352] > Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP > user > at > org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76) > ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] > at > org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) > ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] > at > org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) > ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] > ... 8 more > Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - > 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data > 52e, v2580] > at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) > ~[?:1.8.0_352] > at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) > ~[?:1.8.0_352] > at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) > ~[?:1.8.0_352] > at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352] > at com.sun.jndi.ldap.LdapCtx.(LdapCtx.java:347) ~[?:1.8.0_352] > at > com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) > ~[?:1.8.0_352] > at > com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) > ~[?:1.8.0_352] > at > com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) > ~[?:1.8.0_352] > at > com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) > ~[?:1.8.0_352] > at > com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) > 
~[?:1.8.0_352] > at > javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) > ~[?:1.8.0_352] > at > javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) > ~[?:1.8.0_352] > at javax.naming.InitialContext.init(InitialContext.java:244) > ~[?:1.8.0_352] > at javax.naming.InitialContext.(InitialContext.java:216) > ~[?:1.8.0_352] > at > javax.naming.directory.InitialDirContext.(InitialDirContext.java:101) > ~[?:1.8.0_352] > at > org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73) > ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] > at >
[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Liu updated SPARK-42947: -- Description: When the LDAP provider has domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the username containing the domain should be directly passed to the LDAP provider as the principal. We can refer to the implementation of Hive LdapUtils. When the username contains a domain or domain passes from hive.server2.authentication.ldap.Domain configuration, if we construct the principal according to the DN pattern (For example, uid=user@domain,dc=test,dc=com), we will get the following error: {code:java} 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: Error validating the login at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293) ~[libthrift-0.12.0.jar:0.12.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352] Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP user at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 52e, v2580] at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.(LdapCtx.java:347) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) ~[?:1.8.0_352] at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) ~[?:1.8.0_352] at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) ~[?:1.8.0_352] at javax.naming.InitialContext.init(InitialContext.java:244) ~[?:1.8.0_352] at javax.naming.InitialContext.(InitialContext.java:216) ~[?:1.8.0_352] at javax.naming.directory.InitialDirContext.(InitialDirContext.java:101) ~[?:1.8.0_352] at org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73) 
~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more {code} we should pass user@domain directly to the LDAP provider, just like HiveServer did. was: When the LDAP provider has domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP
[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Liu updated SPARK-42947: -- Description: When the LDAP provider has domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP provider as the principal. We can refer to the implementation of Hive LdapUtils. When the username contains a domain or domain passes from hive.server2.authentication.ldap.Domain configuration, if we construct the principal according to the DN pattern (For example, uid=user@domain,dc=test,dc=com), we will get the following error: {code:java} 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: Error validating the login at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293) ~[libthrift-0.12.0.jar:0.12.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352] Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP user at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 52e, v2580] at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.(LdapCtx.java:347) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) ~[?:1.8.0_352] at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) ~[?:1.8.0_352] at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) ~[?:1.8.0_352] at javax.naming.InitialContext.init(InitialContext.java:244) ~[?:1.8.0_352] at javax.naming.InitialContext.(InitialContext.java:216) ~[?:1.8.0_352] at javax.naming.directory.InitialDirContext.(InitialDirContext.java:101) ~[?:1.8.0_352] at org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73) 
~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more {code} we should pass user@domain directly to the LDAP provider, just like HiveServer did. was: When the LDAP provider includes domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP
[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Liu updated SPARK-42947: -- Description: When the LDAP provider includes domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP provider as the principal. We can refer to the implementation of Hive LdapUtils. When the username contains a domain or domain passes from hive.server2.authentication.ldap.Domain configuration, if we construct the principal according to the DN pattern (For example, uid=user@domain,dc=test,dc=com), we will get the following error: {code:java} 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: Error validating the login at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293) ~[libthrift-0.12.0.jar:0.12.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352] Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP user at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 52e, v2580] at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.(LdapCtx.java:347) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) ~[?:1.8.0_352] at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) ~[?:1.8.0_352] at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) ~[?:1.8.0_352] at javax.naming.InitialContext.init(InitialContext.java:244) ~[?:1.8.0_352] at javax.naming.InitialContext.(InitialContext.java:216) ~[?:1.8.0_352] at javax.naming.directory.InitialDirContext.(InitialDirContext.java:101) ~[?:1.8.0_352] at org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73) 
~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more {code} we should pass user@domain directly to the LDAP provider, just like HiveServer did. was: When the LDAP provider includes domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP
[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Liu updated SPARK-42947: -- Description: When the LDAP provider includes domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP provider as the principal. We can refer to the implementation of Hive LdapUtils. When the username contains a domain or domain passes from hive.server2.authentication.ldap.Domain configuration, if we construct the principal according to the DN pattern (For example, uid=user@domain,dc=test,dc=com), we will get the following error: ``` 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: Error validating the login at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293) ~[libthrift-0.12.0.jar:0.12.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352] Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP user at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 52e, v2580] at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.(LdapCtx.java:347) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) ~[?:1.8.0_352] at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) ~[?:1.8.0_352] at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) ~[?:1.8.0_352] at javax.naming.InitialContext.init(InitialContext.java:244) ~[?:1.8.0_352] at javax.naming.InitialContext.(InitialContext.java:216) ~[?:1.8.0_352] at javax.naming.directory.InitialDirContext.(InitialDirContext.java:101) ~[?:1.8.0_352] at org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73) 
~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more ``` we should pass user@domain directly to the LDAP provider, just like HiveServer did. was:When the LDAP provider includes domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP provider as the
[jira] [Commented] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705938#comment-17705938 ] Jiayi Liu commented on SPARK-42947: --- I will try to fix this. > Spark Thriftserver LDAP should not use DN pattern if user contains domain > - > > Key: SPARK-42947 > URL: https://issues.apache.org/jira/browse/SPARK-42947 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jiayi Liu >Priority: Major > > When the LDAP provider includes domain configuration, such as Active > Directory, the principal should not be constructed according to the DN > pattern, but the user containing the domain should be directly passed to the > LDAP provider as the principal. We can refer to the implementation of Hive > LdapUtils.
[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Liu updated SPARK-42947: -- Summary: Spark Thriftserver LDAP should not use DN pattern if user contains domain (was: Spark Thriftserver should not use dn pattern if user contains domain) > Spark Thriftserver LDAP should not use DN pattern if user contains domain > - > > Key: SPARK-42947 > URL: https://issues.apache.org/jira/browse/SPARK-42947 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jiayi Liu >Priority: Major > > When the LDAP provider includes domain configuration, such as Active > Directory, the principal should not be constructed according to the DN > pattern, but the user containing the domain should be directly passed to the > LDAP provider as the principal. We can refer to the implementation of Hive > LdapUtils.
[jira] [Created] (SPARK-42947) Spark Thriftserver should not use dn pattern if user contains domain
Jiayi Liu created SPARK-42947: - Summary: Spark Thriftserver should not use dn pattern if user contains domain Key: SPARK-42947 URL: https://issues.apache.org/jira/browse/SPARK-42947 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Jiayi Liu When the LDAP provider includes domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP provider as the principal. We can refer to the implementation of Hive LdapUtils.
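A sketch of the behavior proposed above, as a hypothetical helper: the function name, the default DN pattern, and the simple `@` check are illustrative assumptions, not Spark's or Hive's actual configuration values or code.

```python
def resolve_principal(user: str, dn_pattern: str = "uid=%s,dc=test,dc=com") -> str:
    # If the login name already carries a domain (user@domain), hand it to
    # the LDAP provider unchanged, as Hive's LdapUtils does; only plain
    # usernames are expanded via the configured DN pattern.
    if "@" in user:
        return user
    return dn_pattern % user

# A bare username is expanded via the DN pattern...
assert resolve_principal("alice") == "uid=alice,dc=test,dc=com"
# ...but a domain-qualified login is passed through as the principal,
# avoiding malformed DNs like uid=user@domain,dc=test,dc=com.
assert resolve_principal("alice@corp.example") == "alice@corp.example"
```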
[jira] [Commented] (SPARK-38217) insert overwrite failed for external table with dynamic partition table
[ https://issues.apache.org/jira/browse/SPARK-38217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646464#comment-17646464 ] Jiayi Liu commented on SPARK-38217: --- This is because Spark deletes the overwritten partition directories, but Hive is not aware of this; it throws an exception when it lists or deletes a file that no longer exists, causing loadPartition to terminate.
> insert overwrite failed for external table with dynamic partition table
> Key: SPARK-38217
> URL: https://issues.apache.org/jira/browse/SPARK-38217
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.1
> Reporter: YuanGuanhu
> Priority: Major
>
> Can't INSERT OVERWRITE a dynamic partition table. Steps to reproduce with Spark 3.2.1 and Hadoop 3.2:
> sql("CREATE EXTERNAL TABLE exttb01(id int) PARTITIONED BY (p1 string, p2 string) STORED AS PARQUET LOCATION '/tmp/exttb01'")
> sql("set spark.sql.hive.convertMetastoreParquet=false")
> sql("set hive.exec.dynamic.partition.mode=nonstrict")
> val insertsql = "INSERT OVERWRITE TABLE exttb01 PARTITION(p1='n1', p2) SELECT * FROM VALUES (1, 'n2'), (2, 'n3'), (3, 'n4') AS t(id, p2)"
> sql(insertsql)
> sql(insertsql)
> When the INSERT OVERWRITE is executed a second time, it fails:
>
> WARN Hive: Directory file:/tmp/exttb01/p1=n1/p2=n4 cannot be cleaned: java.io.FileNotFoundException: File file:/tmp/exttb01/p1=n1/p2=n4 does not exist
> java.io.FileNotFoundException: File file:/tmp/exttb01/p1=n1/p2=n4 does not exist
> at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:597)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1972)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2014)
> at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:761)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1972)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2014)
> at
org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3440)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1657)
> at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1929)
> at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1920)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 22/02/15 17:59:19 WARN Hive: Directory file:/tmp/exttb01/p1=n1/p2=n3 cannot be cleaned: java.io.FileNotFoundException: File file:/tmp/exttb01/p1=n1/p2=n3 does not exist (same stack trace)
> 22/02/15 17:59:19 WARN Hive: Directory file:/tmp/exttb01/p1=n1/p2=n2 cannot be cleaned: java.io.FileNotFoundException: File file:/tmp/exttb01/p1=n1/p2=n2 does not exist (same stack trace)
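The failure mode above suggests a defensive guard: since Spark has already removed the partition directories, Hive's cleanup must tolerate paths that no longer exist instead of propagating FileNotFoundException. The sketch below is hypothetical and uses java.nio.file purely to stay self-contained; Hive's actual replaceFiles/loadPartition code works against Hadoop's FileSystem API, where the equivalent check would be fs.exists(path) before listStatus.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Illustrative sketch (not Hive's real code) of an existence check that
// keeps partition cleanup from failing when the directory is already gone.
public class PartitionCleaner {

    // Deletes the files directly under dir. Returns false (instead of
    // throwing) when dir has already been removed by the overwrite, so the
    // surrounding loadPartition-style loop can simply move on.
    static boolean cleanIfPresent(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false; // already deleted by Spark's overwrite; nothing to clean
        }
        try (Stream<Path> entries = Files.list(dir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                Files.deleteIfExists(p); // tolerate concurrent deletion too
            }
        }
        return true;
    }
}
```

The design point is that a missing directory during overwrite cleanup is an expected state, not an error, so it is reported via the return value rather than an exception.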