[jira] [Updated] (SPARK-22457) Tables are supposed to be MANAGED only taking into account whether a path is provided

2019-05-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-22457:
---------------------------------
Labels: bulk-closed  (was: )

> Tables are supposed to be MANAGED only taking into account whether a path is 
> provided
> -----------------------------------------------------------------------------
>
> Key: SPARK-22457
> URL: https://issues.apache.org/jira/browse/SPARK-22457
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
> >Reporter: David Arroyo Cazorla
>Priority: Major
>  Labels: bulk-closed
>
> As far as I know, since Spark 2.2 a table is marked MANAGED based only on 
> whether a path is provided:
> {code:java}
> val tableType = if (storage.locationUri.isDefined) {
>   CatalogTableType.EXTERNAL
> } else {
>   CatalogTableType.MANAGED
> }
> {code}
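> For illustration, a minimal sketch of how that rule plays out at the SQL 
> level (table names are made up):
> {code:java}
> // A LOCATION is provided, so the table becomes EXTERNAL:
> spark.sql("CREATE TABLE t1 (id INT) USING parquet LOCATION '/tmp/t1'")
> // No location is provided, so the table becomes MANAGED and the catalog
> // will later assign it a default locationUri under the warehouse dir:
> spark.sql("CREATE TABLE t2 (id INT) USING parquet")
> {code}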
> This behaviour seems right for filesystem-based data sources. However, when 
> working with other data sources such as elasticsearch, it leads to the odd 
> behaviour described below: 
> 1) InMemoryCatalog's doCreateTable() adds a locationUri if the table is 
> CatalogTableType.MANAGED and tableDefinition.storage.locationUri.isEmpty.
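> Paraphrasing the relevant logic (a sketch from memory of Spark 2.2's 
> InMemoryCatalog.doCreateTable(), not an exact copy of the source):
> {code:java}
> // If the table is MANAGED and has no location yet, synthesize a default
> // location for it under the database directory:
> val needsDefaultLocation =
>   tableDefinition.tableType == CatalogTableType.MANAGED &&
>     tableDefinition.storage.locationUri.isEmpty
> val tableWithLocation = if (needsDefaultLocation) {
>   val defaultLocation = new Path(new Path(catalog(db).db.locationUri), table)
>   tableDefinition.withNewStorage(locationUri = Some(defaultLocation.toUri))
> } else {
>   tableDefinition
> }
> {code}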
> 2) Before loading the data source table, FindDataSourceTable's 
> readDataSourceTable() adds a 'path' option if the locationUri exists:
> {code:java}
> val pathOption =
>   table.storage.locationUri.map("path" -> CatalogUtils.URIToString(_))
> {code}
> 3) That causes an error when reading from elasticsearch, because 'path' is 
> an option that elasticsearch already interprets (locationUri is set to 
> file:/home/user/spark-rv/elasticsearch/shop/clients):
> org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot find 
> mapping for file:/home/user/spark-rv/elasticsearch/shop/clients - one is 
> required before using Spark SQL
> Would it be possible to mark tables as MANAGED only for a subset of data 
> sources (TEXT, CSV, JSON, JDBC, PARQUET, ORC, HIVE), or to think about any 
> other solution? A sketch of the first idea follows.
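> One possible shape for that idea (illustrative only; the provider set and 
> the in-scope 'provider' value are assumptions, not existing Spark code):
> {code:java}
> // Classify as MANAGED only when the provider is one of the listed
> // sources; anything else (e.g. elasticsearch) stays EXTERNAL:
> val managedCapable = Set("text", "csv", "json", "jdbc", "parquet", "orc", "hive")
> val tableType = if (storage.locationUri.isDefined) {
>   CatalogTableType.EXTERNAL
> } else if (provider.exists(p => managedCapable.contains(p.toLowerCase))) {
>   CatalogTableType.MANAGED
> } else {
>   CatalogTableType.EXTERNAL
> }
> {code}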
> P.S. InMemoryCatalog's doDropTable() deletes the table's directory, which 
> from my point of view should only be required for filesystem-based data 
> sources: 
> {code:java}
> if (tableMeta.tableType == CatalogTableType.MANAGED) {
>   ...
>   // Delete the data/directory of the table
>   val dir = new Path(tableMeta.location)
>   try {
>     val fs = dir.getFileSystem(hadoopConfig)
>     fs.delete(dir, true)
>   } catch {
>     case e: IOException =>
>       throw new SparkException(s"Unable to drop table $table as failed " +
>         s"to delete its directory $dir", e)
>   }
> }
> {code}
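> Along the same lines, the delete could be guarded by the provider 
> (illustrative sketch; it reuses the hypothetical provider set from above 
> and relies on CatalogTable.provider being an Option[String]):
> {code:java}
> // Skip the filesystem delete for sources that do not own a directory:
> val fileBased = tableMeta.provider.exists(p =>
>   Set("text", "csv", "json", "parquet", "orc", "hive").contains(p.toLowerCase))
> if (tableMeta.tableType == CatalogTableType.MANAGED && fileBased) {
>   // ... delete the table directory as before ...
> }
> {code}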



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


