Re: Spark Scala API still not updated for 2.13, or is it a mistake?

2022-08-02 Thread pengyh



I can use Scala 2.13 for spark-shell, but not spark-submit.

Regards.

Spark 3.3.0 supports 2.13, though you need to build it for 2.13. The 
default binary distro uses 2.12.
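A quick way to confirm which Scala version a given Spark runtime was actually
built against (useful when spark-shell and spark-submit behave differently) is
to ask the JVM. A minimal PySpark sketch, assuming the py4j _jvm gateway is
reachable; it is an internal handle, so treat this as a diagnostic only:

from pyspark.sql import SparkSession

# Report the Scala version the running Spark build was compiled against.
spark = SparkSession.builder.master("local[*]").getOrCreate()
print(spark.sparkContext._jvm.scala.util.Properties.versionString())
# e.g. "version 2.12.15" for the default Spark 3.3.0 binary distro

Run the same check under both launchers; a mismatch means they are picking up
different Spark installations.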





Re: Spark Scala API still not updated for 2.13, or is it a mistake?

2022-08-02 Thread Sean Owen
Oh, ha, we do provide a pre-built binary for 2.13, oops.
I can just remove that line from the docs; I think it's just out of date.

On Tue, Aug 2, 2022 at 11:01 AM Roman I  wrote:

> OK, though the doc is misleading. The official site provides both 2.12 and
> 2.13 pre-built binaries.
> Thanks
>
> On 2 Aug 2022, at 18:52, Sean Owen  wrote:
>
> Spark 3.3.0 supports 2.13, though you need to build it for 2.13. The
> default binary distro uses 2.12.
>
> On Tue, Aug 2, 2022, 10:47 AM Roman I  wrote:
>
>>
>> For the Scala API, Spark 3.3.0 uses Scala 2.12. You will need to use a
>> compatible Scala version (2.12.x).
>>
>> https://spark.apache.org/docs/latest/
>>
> 
>
>
>


Re: Spark Scala API still not updated for 2.13, or is it a mistake?

2022-08-02 Thread Roman I
OK, though the doc is misleading. The official site provides both 2.12 and 2.13
pre-built binaries.
Thanks

> On 2 Aug 2022, at 18:52, Sean Owen  wrote:
> 
> Spark 3.3.0 supports 2.13, though you need to build it for 2.13. The default 
> binary distro uses 2.12. 
> 
> On Tue, Aug 2, 2022, 10:47 AM Roman I wrote:
> 
> 
> For the Scala API, Spark 3.3.0 uses Scala 2.12. You will need to use a 
> compatible Scala version (2.12.x).
> https://spark.apache.org/docs/latest/ 
> 



Re: Spark Scala API still not updated for 2.13, or is it a mistake?

2022-08-02 Thread Sean Owen
Spark 3.3.0 supports 2.13, though you need to build it for 2.13. The
default binary distro uses 2.12.

On Tue, Aug 2, 2022, 10:47 AM Roman I  wrote:

>
> For the Scala API, Spark 3.3.0 uses Scala 2.12. You will need to use a
> compatible Scala version (2.12.x).
>
> https://spark.apache.org/docs/latest/
>


Spark Scala API still not updated for 2.13, or is it a mistake?

2022-08-02 Thread Roman I


For the Scala API, Spark 3.3.0 uses Scala 2.12. You will need to use a 
compatible Scala version (2.12.x).
https://spark.apache.org/docs/latest/ 

Re: log transferring into hadoop/spark

2022-08-02 Thread Gourav Sengupta
Hi,
I do it with simple bash scripts to transfer to S3. It takes less than a
minute to write one, and another minute to include it in bootstrap scripts.

Never saw the need for so much hype for such simple tasks.

Regards,
Gourav Sengupta
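A hedged sketch of that script approach, in Python rather than bash
(assumptions: boto3 is installed, AWS credentials are configured, and the
directory and bucket names below are hypothetical):

import pathlib
import boto3

s3 = boto3.client("s3")
LOG_DIR = pathlib.Path("/var/log/nginx/archive")  # hypothetical rotated-log dir
BUCKET = "my-log-bucket"                          # hypothetical bucket name

for log_file in LOG_DIR.glob("*.gz"):
    # Upload each rotated log, then drop the local copy once it has landed.
    s3.upload_file(str(log_file), BUCKET, f"weblogs/{log_file.name}")
    log_file.unlink()

Run it from cron or an instance bootstrap script; Spark can then read the
s3 paths back directly.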

On Tue, Aug 2, 2022 at 2:16 PM ayan guha  wrote:

> ELK or Splunk agents typically.
>
> Or if you are in the cloud, there are cloud-native solutions which can
> forward logs to an object store, which can then be read like HDFS.
>
> On Tue, 2 Aug 2022 at 4:43 pm, pengyh  wrote:
>
>> Since Flume is no longer actively developed, what's the current
>> open-source tool for transferring web server logs into HDFS/Spark?
>>
>> Thank you.
>>
>>
> --
> Best Regards,
> Ayan Guha
>


Re: log transferring into hadoop/spark

2022-08-02 Thread ayan guha
ELK or Splunk agents typically.

Or if you are in the cloud, there are cloud-native solutions which can
forward logs to an object store, which can then be read like HDFS.
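For the read-back side, once an agent has landed logs in object storage,
Spark reads them like any other filesystem path. A minimal sketch (the bucket
and prefix are hypothetical, and s3a:// access assumes the hadoop-aws jars are
on the classpath):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("weblogs").getOrCreate()
# Gzipped text files are decompressed transparently on read.
logs = spark.read.text("s3a://my-log-bucket/weblogs/*.gz")
logs.show(5, truncate=False)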

On Tue, 2 Aug 2022 at 4:43 pm, pengyh  wrote:

> Since Flume is no longer actively developed, what's the current
> open-source tool for transferring web server logs into HDFS/Spark?
>
> Thank you.
>
>
--
Best Regards,
Ayan Guha


Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-02 Thread Sean Owen
That isn't the issue - the table does not exist anyway, but the storage
path does.
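A small sketch of that distinction, using the names from the code quoted
below (note: spark.catalog.tableExists requires Spark 3.3+):

import os

# The catalog has no such table registered yet...
print(spark.catalog.tableExists("EXERCISE.WORKED_HOURS"))  # likely False
# ...but the target directory already holds files, which is what the
# AnalysisException is complaining about.
print(os.listdir("./output/delta/"))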

On Tue, Aug 2, 2022 at 6:48 AM Stelios Philippou  wrote:

> Hi Kumba,
>
> The SQL structure is a bit different for CREATE OR REPLACE TABLE.
>
> You can only do the following:
> CREATE TABLE IF NOT EXISTS
>
>
>
>
> https://spark.apache.org/docs/3.3.0/sql-ref-syntax-ddl-create-table-datasource.html
>
> On Tue, 2 Aug 2022 at 14:38, Sean Owen  wrote:
>
>> I don't think "CREATE OR REPLACE TABLE" exists (in SQL?); this isn't a
>> VIEW.
>> Delete the path first; that's simplest.
>>
>> On Tue, Aug 2, 2022 at 12:55 AM Kumba Janga  wrote:
>>
>>> Thanks Sean! That was a simple fix. I changed it to "Create or Replace
>>> Table" but now I am getting the following error. I am still researching
>>> solutions but so far no luck.
>>>
>>> ParseException:
>>> mismatched input '<EOF>' expecting {'ADD', 'AFTER', 'ALL', 'ALTER', 
>>> 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 
>>> 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 
>>> 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 
>>> 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 
>>> 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 
>>> 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 
>>> 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', 
>>> DATABASES, 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 
>>> 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 
>>> 'DIV', 'DROP', 'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 
>>> 'EXISTS', 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 
>>> 'FETCH', 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 
>>> 'FOREIGN', 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 
>>> 'GLOBAL', 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'IF', 'IGNORE', 'IMPORT', 
>>> 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 
>>> 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 
>>> 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 
>>> 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 
>>> 'MATCHED', 'MERGE', 'MSCK', 'NAMESPACE', 'NAMESPACES', 'NATURAL', 'NO', 
>>> NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 'OPTIONS', 'OR', 
>>> 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 'OVERLAY', 
>>> 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 'PIVOT', 
>>> 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PROPERTIES', 
>>> 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', 'RECOVER', 
>>> 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 'RESET', 
>>> 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 
>>> 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SELECT', 'SEMI', 'SEPARATED', 'SERDE', 
>>> 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 
>>> 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 
>>> 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 'TABLES', 
>>> 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TO', 
>>> 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 
>>> 'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 
>>> 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 
>>> 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', IDENTIFIER, 
>>> BACKQUOTED_IDENTIFIER}(line 1, pos 23)
>>>
>>> == SQL ==
>>> CREATE OR REPLACE TABLE
>>>
>>>
>>> On Mon, Aug 1, 2022 at 8:32 PM Sean Owen  wrote:
>>>
 Pretty much what it says? You are creating a table over a path that
 already has data in it. You can't do that without mode=overwrite at least,
 if that's what you intend.
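A sketch of the mode="overwrite" route mentioned above, written against the
sample data quoted below (it assumes the same spark session; the single row
and column types are illustrative):

# Build a small dataframe and replace whatever already sits at the path.
df = spark.createDataFrame(
    [("2021-06-09", 1001, "Y", 7.0)],
    "worked_date string, worker_id int, delete_flag string, hours_worked double",
)
df.write.format("delta").mode("overwrite").save("./output/delta/")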

 On Mon, Aug 1, 2022 at 7:29 PM Kumba Janga  wrote:

>
>
> - Component: Spark Delta, Spark SQL
> - Level: Beginner
> - Scenario: Debug, How-to
>
> *Python in Jupyter:*
>
> [pyspark session setup and sample-data code elided; it is quoted in full in
> a later message in this digest]
>

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-02 Thread Stelios Philippou
Hi Kumba,

The SQL structure is a bit different for CREATE OR REPLACE TABLE.

You can only do the following:
CREATE TABLE IF NOT EXISTS



https://spark.apache.org/docs/3.3.0/sql-ref-syntax-ddl-create-table-datasource.html

On Tue, 2 Aug 2022 at 14:38, Sean Owen  wrote:

> I don't think "CREATE OR REPLACE TABLE" exists (in SQL?); this isn't a
> VIEW.
> Delete the path first; that's simplest.
>
> On Tue, Aug 2, 2022 at 12:55 AM Kumba Janga  wrote:
>
>> Thanks Sean! That was a simple fix. I changed it to "Create or Replace
>> Table" but now I am getting the following error. I am still researching
>> solutions but so far no luck.
>>
>> ParseException:
>> mismatched input '<EOF>' expecting {... large token list elided; it is
>> quoted in full earlier in this digest ...} (line 1, pos 23)
>>
>> == SQL ==
>> CREATE OR REPLACE TABLE
>>
>>
>> On Mon, Aug 1, 2022 at 8:32 PM Sean Owen  wrote:
>>
>>> Pretty much what it says? You are creating a table over a path that
>>> already has data in it. You can't do that without mode=overwrite at least,
>>> if that's what you intend.
>>>
>>> On Mon, Aug 1, 2022 at 7:29 PM Kumba Janga  wrote:
>>>


- Component: Spark Delta, Spark SQL
- Level: Beginner
- Scenario: Debug, How-to

 *Python in Jupyter:*

 [pyspark code elided; it is quoted in full in the next message in this digest]

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-02 Thread Sean Owen
I don't think "CREATE OR REPLACE TABLE" exists (in SQL?); this isn't a VIEW.
Delete the path first; that's simplest.
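A minimal sketch of the delete-the-path option, assuming the local
OUTPUT_DELTA_PATH from the code quoted below (for a path on HDFS or S3, use
the corresponding filesystem API instead):

import shutil

OUTPUT_DELTA_PATH = "./output/delta/"
# Clear any leftover files so CREATE TABLE ... LOCATION sees an empty directory.
shutil.rmtree(OUTPUT_DELTA_PATH, ignore_errors=True)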

On Tue, Aug 2, 2022 at 12:55 AM Kumba Janga  wrote:

> Thanks Sean! That was a simple fix. I changed it to "Create or Replace
> Table" but now I am getting the following error. I am still researching
> solutions but so far no luck.
>
> ParseException:
> mismatched input '<EOF>' expecting {... large token list elided; it is
> quoted in full earlier in this digest ...} (line 1, pos 23)
>
> == SQL ==
> CREATE OR REPLACE TABLE
>
>
> On Mon, Aug 1, 2022 at 8:32 PM Sean Owen  wrote:
>
>> Pretty much what it says? You are creating a table over a path that
>> already has data in it. You can't do that without mode=overwrite at least,
>> if that's what you intend.
>>
>> On Mon, Aug 1, 2022 at 7:29 PM Kumba Janga  wrote:
>>
>>>
>>>
>>> - Component: Spark Delta, Spark SQL
>>> - Level: Beginner
>>> - Scenario: Debug, How-to
>>>
>>> *Python in Jupyter:*
>>>
>>> import pyspark
>>> import pyspark.sql.functions
>>>
>>> from pyspark.sql import SparkSession
>>> spark = (
>>> SparkSession
>>> .builder
>>> .appName("programming")
>>> .master("local")
>>> .config("spark.jars.packages", "io.delta:delta-core_2.12:0.7.0")
>>> .config("spark.sql.extensions", 
>>> "io.delta.sql.DeltaSparkSessionExtension")
>>> .config("spark.sql.catalog.spark_catalog", 
>>> "org.apache.spark.sql.delta.catalog.DeltaCatalog")
>>> .config('spark.ui.port', '4050')
>>> .getOrCreate()
>>>
>>> )
>>> from delta import *
>>>
>>> string_20210609 = '''worked_date,worker_id,delete_flag,hours_worked
>>> 2021-06-09,1001,Y,7
>>> 2021-06-09,1002,Y,3.75
>>> 2021-06-09,1003,Y,7.5
>>> 2021-06-09,1004,Y,6.25'''
>>>
>>> rdd_20210609 = spark.sparkContext.parallelize(string_20210609.split('\n'))
>>>
>>> # FILES WILL SHOW UP ON THE LEFT UNDER THE FOLDER ICON IF YOU WANT TO 
>>> BROWSE THEM
>>> OUTPUT_DELTA_PATH = './output/delta/'
>>>
>>> spark.sql('CREATE DATABASE IF NOT EXISTS EXERCISE')
>>>
>>> spark.sql('''
>>> CREATE TABLE IF NOT EXISTS EXERCISE.WORKED_HOURS(
>>> worked_date date
>>> , worker_id int
>>> , delete_flag string
>>> , hours_worked double
>>> ) USING DELTA
>>>
>>>
>>> PARTITIONED BY (worked_date)
>>> LOCATION "{0}"
>>> '''.format(OUTPUT_DELTA_PATH)
>>> )
>>>
>>> *Error 

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-02 Thread ayan guha
Hi,

I strongly suggest printing the prepared SQLs and trying them in raw form. The
error you posted points to a syntax error.
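A sketch of that debugging step, using the CREATE TABLE template from the
code quoted further down. The quoted ParseException shows the parser hit end
of input right after "CREATE OR REPLACE TABLE" (line 1, pos 23), so printing
the statement first would have exposed the truncation immediately:

# Materialize the SQL string, inspect it, then submit it.
sql = '''
CREATE TABLE IF NOT EXISTS EXERCISE.WORKED_HOURS(
    worked_date date, worker_id int, delete_flag string, hours_worked double
) USING DELTA
PARTITIONED BY (worked_date)
LOCATION "{0}"
'''.format('./output/delta/')

print(sql)      # see the exact statement before Spark parses it
spark.sql(sql)  # assumes the spark session from the quoted code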

On Tue, 2 Aug 2022 at 3:56 pm, Kumba Janga  wrote:

> Thanks Sean! That was a simple fix. I changed it to "Create or Replace
> Table" but now I am getting the following error. I am still researching
> solutions but so far no luck.
>
> ParseException:
> mismatched input '<EOF>' expecting {... large token list elided; it is
> quoted in full earlier in this digest ...} (line 1, pos 23)
>
> == SQL ==
> CREATE OR REPLACE TABLE
>
>
> On Mon, Aug 1, 2022 at 8:32 PM Sean Owen  wrote:
>
>> Pretty much what it says? You are creating a table over a path that
>> already has data in it. You can't do that without mode=overwrite at least,
>> if that's what you intend.
>>
>> On Mon, Aug 1, 2022 at 7:29 PM Kumba Janga  wrote:
>>
>>>
>>>
>>> - Component: Spark Delta, Spark SQL
>>> - Level: Beginner
>>> - Scenario: Debug, How-to
>>>
>>> *Python in Jupyter:*
>>>
>>> [code elided; identical to the pyspark code quoted in full in the previous
>>> message in this digest]
>>>
>>> 

log transferring into hadoop/spark

2022-08-02 Thread pengyh

Since Flume is no longer actively developed, what's the current open-source
tool for transferring web server logs into HDFS/Spark?

Thank you.
