Re: Error with Spark + IGFS (HDFS cache) through Hive
Hello!

If you can write a test that demonstrates the problematic behavior, plus a fix on top of it, and file a ticket against the IGNITE project in Apache JIRA, I think we could merge it.

Regards,

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Re: Error with Spark + IGFS (HDFS cache) through Hive
Hi Evgenii, thanks for answering.

I'm fairly new to Ignite, but from what I can see in the docs, the Ignite RDD and the Ignite DataFrame only serve data that is already in Ignite in the form of a table or cache. Unfortunately, I currently have a couple hundred tables in HDFS referenced through the Hive metastore, which conveniently lets me query those tables in Spark with minimal configuration on the user side (spark.sql can SELECT FROM any table in the Hive metastore).

What I was hoping to accomplish was to use Ignite (IGFS) as an HDFS cache, with HDFS as its secondary file system, referencing those tables through the Hive metastore for use in Spark. I haven't tested the performance yet, but I suspect it will be worse than using native Ignite tables.

I finally compiled a different version of the ignite-hadoop module, overriding the LOG variable that causes the problem with a log for the v1.IgniteHadoopFileSystem class. That makes the LOG passed through the HadoopIgfsWrapper (and beyond) a log of v1.IgniteHadoopFileSystem instead of a log of the hadoop.FileSystem class, which is what caused the LinkageError. This seems to have solved the issue so far, and it is the same logging approach already present in v2.IgniteHadoopFileSystem.

My questions would be: do you think this fix is worth a small patch, or is this kind of usage completely unadvised? Is the philosophy behind using Ignite through Spark to always create native Ignite tables?

Thanks in advance.

On Tue, Sep 11, 2018 at 10:25, Evgenii Zhuravlev <e.zhuravlev...@gmail.com> wrote:

> Hi,
>
> Do you really need to use Hive here? You can just use the Spark
> integration with Ignite, which allows you to run SQL: DataFrame
> (https://apacheignite-fs.readme.io/docs/ignite-data-frame) or RDD
> (https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd).
> For sure, this solution will work much faster.
>
> Evgenii
>
> On Mon, Sep 10, 2018 at 23:08, Maximiliano Patricio Méndez <mmen...@despegar.com> wrote:
> [quoted text trimmed; full message in the original post below]
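The fix described above changes which logger the v1 file system hands onward: instead of the shared static LOG inherited from hadoop.FileSystem, it declares a logger owned by v1.IgniteHadoopFileSystem itself, the approach v2 already uses. A minimal sketch of that pattern, with `java.util.logging` standing in for commons-logging and the class names shortened for illustration (neither class here is real Ignite or Hadoop code):

```java
import java.util.logging.Logger;

// Stand-in for org.apache.hadoop.fs.FileSystem, which exposes a shared static LOG.
class BaseFileSystem {
    public static final Logger LOG = Logger.getLogger(BaseFileSystem.class.getName());
}

// Stand-in for v1.IgniteHadoopFileSystem: rather than reusing the parent's LOG
// (whose declared type may have been resolved by a different class loader),
// it declares its own logger keyed to its own class, mirroring v2.
class IgfsV1FileSystem extends BaseFileSystem {
    private static final Logger LOG = Logger.getLogger(IgfsV1FileSystem.class.getName());

    // The logger passed onward (to HadoopIgfsWrapper in the real code) is now
    // resolved against this class, not against the base file system class.
    static String logOwner() {
        return LOG.getName();
    }
}
```

With a per-class logger, initialization never touches the base class's LOG field, so Spark's isolated class loader and the application class loader no longer need to agree on the logging type.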
Re: Error with Spark + IGFS (HDFS cache) through Hive
Hi,

Do you really need to use Hive here? You can just use the Spark integration with Ignite, which allows you to run SQL: DataFrame (https://apacheignite-fs.readme.io/docs/ignite-data-frame) or RDD (https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd). For sure, this solution will work much faster.

Evgenii

On Mon, Sep 10, 2018 at 23:08, Maximiliano Patricio Méndez <mmen...@despegar.com> wrote:

> [quoted text trimmed; full message in the original post below]
Error with Spark + IGFS (HDFS cache) through Hive
Hi,

I'm getting a LinkageError in Spark when trying to read a Hive table whose external location is in IGFS:

java.lang.LinkageError: loader constraint violation: when resolving field "LOG" the class loader (instance of org/apache/spark/sql/hive/client/IsolatedClientLoader$$anon$1) of the referring class, org/apache/hadoop/fs/FileSystem, and the class loader (instance of sun/misc/Launcher$AppClassLoader) for the field's resolved type, org/apache/commons/logging/Log, have different Class objects for that type
at org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem.initialize(IgniteHadoopFileSystem.java:255)

From what I can see, the exception occurs when Spark tries to read a table from Hive and then through IGFS, passing the "LOG" variable of the FileSystem around to the HadoopIgfsWrapper (and beyond...).

The steps I followed to reach this error were:

- Create a file /tmp/test.parquet in HDFS
- Create an external table test.test in Hive with location = igfs://igfs@/tmp/test.parquet
- Start spark-shell with the command:
  ./bin/spark-shell --jars $IGNITE_HOME/ignite-core-2.6.0.jar,$IGNITE_HOME/ignite-hadoop/ignite-hadoop-2.6.0.jar,$IGNITE_HOME/ignite-shmem-1.0.0.jar,$IGNITE_HOME/ignite-spark-2.6.0.jar
- Read the table through spark.sql:
  spark.sql("SELECT * FROM test.test")

Is there maybe a way to avoid this issue? Has anyone used Ignite through Hive as an HDFS cache in a similar way?
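For context on the error itself: the JVM raises this loader-constraint violation because two class loaders (Spark's IsolatedClientLoader and the application class loader) each resolve the same type name to a different Class object. A self-contained sketch of just that mechanism, not Ignite code (the Dummy class and temp directory are purely illustrative; running it requires a JDK, since it uses the in-process compiler):

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.ToolProvider;

public class LoaderDemo {
    // Returns true when the same class file, loaded by two independent
    // class loaders, yields two distinct Class objects.
    static boolean distinctClasses() throws Exception {
        // Compile a trivial class into a temporary directory.
        Path dir = Files.createTempDirectory("loaderdemo");
        Path src = dir.resolve("Dummy.java");
        Files.write(src, "public class Dummy {}".getBytes(StandardCharsets.UTF_8));
        ToolProvider.getSystemJavaCompiler()
                .run(null, null, null, "-d", dir.toString(), src.toString());

        URL[] urls = { dir.toUri().toURL() };
        // parent = null: neither loader delegates to the application classpath,
        // so each defines Dummy for itself -- analogous to Spark's isolated
        // loader and the AppClassLoader each resolving commons-logging's Log.
        try (URLClassLoader l1 = new URLClassLoader(urls, null);
             URLClassLoader l2 = new URLClassLoader(urls, null)) {
            Class<?> c1 = l1.loadClass("Dummy");
            Class<?> c2 = l2.loadClass("Dummy");
            // Same name, same bytes -- but the JVM treats them as unrelated types.
            return c1 != c2 && c1.getName().equals(c2.getName());
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(distinctClasses()); // prints true
    }
}
```

Roughly speaking, the error message describes the same situation: FileSystem, as seen through the isolated loader, declares LOG with a Log type from one loader, while the referring code expects the Log resolved by the app class loader; the two Class objects differ, so linkage fails at initialize().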