[jira] [Updated] (SPARK-25523) Multi thread execute sparkSession.read().jdbc(url, table, properties) problem

huanghuai (JIRA) Tue, 25 Sep 2018 20:20:11 -0700


     [ https://issues.apache.org/jira/browse/SPARK-25523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


huanghuai updated SPARK-25523:
------------------------------
    Description: 
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.jdbc.JdbcDialect;
import org.apache.spark.sql.jdbc.JdbcDialects;

public static void test2() throws Exception {
    String ckUrlPrefix = "jdbc:clickhouse://";
    String quote = "`";
    // Register a dialect so that ClickHouse identifiers are quoted with backticks.
    JdbcDialects.registerDialect(new JdbcDialect() {
        @Override
        public boolean canHandle(String url) {
            return url.startsWith(ckUrlPrefix);
        }
        @Override
        public String quoteIdentifier(String colName) {
            return quote + colName + quote;
        }
    });

    SparkSession spark = initSpark();
    String ckUrl = "jdbc:clickhouse://192.168.2.148:8123/default";
    Properties ckProp = new Properties();
    ckProp.put("user", "default");
    ckProp.put("password", "");

    String prestoUrl = "jdbc:presto://192.168.2.148:9002/mysql-xxx/xxx";
    Properties prestoUrlProp = new Properties();
    prestoUrlProp.put("user", "root");
    prestoUrlProp.put("password", "");

    // ClickHouse read, currently disabled.
    // new Thread(()->{
    //     spark.read()
    //             .jdbc(ckUrl, "ontime", ckProp).show();
    // }).start();

    System.out.println("------------------------------------------------------");

    // Thread 1: read a table through the Presto JDBC driver.
    new Thread(() -> {
        spark.read()
                .jdbc(prestoUrl, "tx_user", prestoUrlProp).show();
    }).start();

    System.out.println("------------------------------------------------------");

    // Thread 2: read a table through the Vertica Spark connector.
    new Thread(() -> {
        Dataset<Row> load = spark.read()
                .format("com.vertica.spark.datasource.DefaultSource")
                .option("host", "192.168.1.102")
                .option("port", 5433)
                .option("user", "dbadmin")
                .option("password", "manager")
                .option("db", "test")
                .option("dbschema", "public")
                .option("table", "customers")
                .load();
        load.printSchema();
        load.show();
    }).start();
    System.out.println("------------------------------------------------------");
}




public static SparkSession initSpark() throws Exception {
    return SparkSession.builder()
            .master("spark://dsjkfb1:7077")
            .appName("Test")
            .config("spark.executor.instances", 3)
            .config("spark.executor.cores", 2)
            .config("spark.cores.max", 6)
            //.config("spark.default.parallelism",1)
            .config("spark.submit.deployMode", "client")
            .config("spark.driver.memory", "2G")
            .config("spark.executor.memory", "3G")
            .config("spark.driver.maxResultSize", "2G")
            .config("spark.local.dir", "d:\\tmp")
            .config("spark.driver.host", "192.168.2.148")
            .config("spark.scheduler.mode", "FAIR")
            .config("spark.jars", "F:\\project\\xxx\\vertica-jdbc-7.0.1-0.jar," +
                    "F:\\project\\xxx\\clickhouse-jdbc-0.1.40.jar," +
                    "F:\\project\\xxx\\vertica-spark-connector-9.1-2.1.jar," +
                    "F:\\project\\xxx\\presto-jdbc-0.189-mining.jar")
            .getOrCreate();
}

 

 

{color:#FF0000}*----------------------------  The code is above  ------------------------------*{color}

{color:#FF0000}*Question: if I enable the Vertica JDBC code, the thread hangs forever.*{color}

{color:#FF0000}*The driver log looks like this:*{color}

 

2018-09-26 10:32:51 INFO SharedState:54 - Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/C:/Users/admin/Desktop/test-project/sparktest/spark-warehouse/').
2018-09-26 10:32:51 INFO SharedState:54 - Warehouse path is 'file:/C:/Users/admin/Desktop/test-project/sparktest/spark-warehouse/'.
2018-09-26 10:32:51 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2f70d6e2{/SQL,null,AVAILABLE,@Spark}
2018-09-26 10:32:51 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1d66833d{/SQL/json,null,AVAILABLE,@Spark}
2018-09-26 10:32:51 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@65af6f3a{/SQL/execution,null,AVAILABLE,@Spark}
2018-09-26 10:32:51 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@55012968{/SQL/execution/json,null,AVAILABLE,@Spark}
2018-09-26 10:32:51 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@59e3f5aa{/static/sql,null,AVAILABLE,@Spark}
2018-09-26 10:32:52 INFO StateStoreCoordinatorRef:54 - Registered StateStoreCoordinator endpoint
2018-09-26 10:32:52 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.4.232:49434) with ID 0
2018-09-26 10:32:52 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.4.233:44834) with ID 2
2018-09-26 10:32:52 INFO BlockManagerMasterEndpoint:54 - Registering block manager 192.168.4.232:35380 with 1458.6 MB RAM, BlockManagerId(0, 192.168.4.232, 35380, None)
2018-09-26 10:32:52 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.4.231:42504) with ID 1
2018-09-26 10:32:52 INFO BlockManagerMasterEndpoint:54 - Registering block manager 192.168.4.233:40882 with 1458.6 MB RAM, BlockManagerId(2, 192.168.4.233, 40882, None)
2018-09-26 10:32:52 INFO BlockManagerMasterEndpoint:54 - Registering block manager 192.168.4.231:44682 with 1458.6 MB RAM, BlockManagerId(1, 192.168.4.231, 44682, None)
2018-09-26 10:33:29 INFO ApplicationHandler:353 - Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...

 

*{color:#FF0000}Nothing more is logged after this.{color}*

*{color:#FF0000}If I comment out the Vertica JDBC code, everything works.{color}*

*{color:#FF0000}The Vertica JDBC URL is wrong, so the connection cannot succeed.{color}*

*{color:#FF0000}If I run the Vertica JDBC code alone, it throws a connection exception, which is expected.{color}*

 

*{color:#FF0000}But when it runs together with the other read, it hangs, and the Spark UI shows no jobs at all.{color}*
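
Because the log simply stops and the Spark UI shows no job, a stack trace of the stuck thread would show where it is blocked. The following is a minimal sketch in plain Java, not part of the original report: the class name HangProbe and its two methods are hypothetical helpers. It runs a task on a named daemon thread, waits for a bounded time, and prints that thread's stack if it is still alive.

```java
class HangProbe {

    /** Formats the current state and stack of one thread, roughly like one entry of a thread dump. */
    static String describe(Thread t) {
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
        for (StackTraceElement frame : t.getStackTrace()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    /**
     * Runs the task on a daemon thread and waits at most timeoutMillis for it.
     * Returns true if the task finished in time. Otherwise prints the worker's
     * stack to stderr and returns false; the worker is left running.
     */
    static boolean finishesWithin(Runnable task, long timeoutMillis) {
        Thread worker = new Thread(task, "hang-probe-worker");
        worker.setDaemon(true); // a stuck worker must not keep the JVM alive
        worker.start();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (worker.isAlive()) {
            System.err.println(describe(worker));
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for a blocking read; a real run would call spark.read()... here.
        Runnable slow = () -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // nothing to clean up in this demo
            }
        };
        System.out.println("finished in time: " + finishesWithin(slow, 200));
    }
}
```

In the reproduction above, each lambda passed to new Thread(...) could be handed to finishesWithin instead, with a timeout of a minute or so. Running jstack against the driver process gives the same information without any code change.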

 

 

 

  was:
When I execute sparkSession.read().jdbc(url, table, properties) in 2 or 3 
threads, it waits indefinitely.

 

Stepping through with the debugger, I found that it finally waits here: 

JdbcRelationProvider.createRelation() method

at line 34:

val jdbcOptions = new JDBCOptions(parameters)

 

When I press F8 (step over) on this line, the call never returns.
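
One possible explanation, which this report does not confirm, is that several threads trigger JDBC driver class loading and initialization at the same time while JDBCOptions resolves the driver for the URL. If that is the case, loading the driver classes one after another on the main thread, before any reader thread starts, might avoid the wait. The sketch below is an assumption-laden illustration: DriverPreloader is a hypothetical helper, and the three driver class names are the ones usually shipped in the Presto, Vertica, and ClickHouse JDBC jars, so they should be checked against the jars actually in use.

```java
import java.util.ArrayList;
import java.util.List;

class DriverPreloader {

    /**
     * Loads and initializes each named class in order on the calling thread.
     * Returns the names that were not found on the classpath.
     */
    static List<String> preload(String... classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class.forName(name); // also runs the class's static initializer
            } catch (ClassNotFoundException e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed driver class names; verify them against the jars on the classpath.
        List<String> missing = preload(
                "com.facebook.presto.jdbc.PrestoDriver",
                "com.vertica.jdbc.Driver",
                "ru.yandex.clickhouse.ClickHouseDriver");
        System.out.println("Not on classpath: " + missing);
    }
}
```

In test2() the call to preload(...) would go before the first new Thread(...). If the hang persists with the drivers preloaded, concurrent driver initialization is probably not the cause.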

 


> Multi thread execute sparkSession.read().jdbc(url, table, properties) problem
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-25523
>                 URL: https://issues.apache.org/jira/browse/SPARK-25523
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>         Environment: h3. [IntelliJ 
> _IDEA_|http://www.baidu.com/link?url=7ZLtsOfyqR1YxLqcTU0Q-hqXWV_PsY6IzIzZoKhiXZZ4AcLrpQ4DoTG30yIN-Gs8]
>  
> local mode
>  
>            Reporter: huanghuai
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
