Re: Error with Spark + IGFS (HDFS cache) through Hive
Hello!

If you can write a test that demonstrates the problematic behavior, plus a fix on top of it, and file a ticket against the IGNITE project in Apache JIRA, I think we could merge it.

Regards,

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Re: Error with Spark + IGFS (HDFS cache) through Hive
Hi Evgenii, thanks for answering.

I'm fairly new to Ignite, but from what I can see in the docs, the Ignite RDD and the Ignite DataFrame only serve data that is already in Ignite in the form of a table or cache. Unfortunately, I currently have a couple hundred tables in HDFS referenced through the Hive metastore, which conveniently lets me query those tables in Spark with minimal configuration on the user side (spark.sql can SELECT FROM any table in the Hive metastore).

What I was hoping to accomplish was to use Ignite (IGFS) as an HDFS cache, with HDFS as its secondary file system, referencing those tables through the Hive metastore for use in Spark. I haven't tested the performance yet, but I suspect it will be worse than using native Ignite tables.

I finally compiled a different version of the ignite-hadoop module, overriding the LOG variable that causes the problem with a log for the v1.IgniteHadoopFileSystem class. That makes the LOG passed through the HadoopIgfsWrapper (and beyond) a log of v1.IgniteHadoopFileSystem instead of a log of the hadoop.FileSystem class, which is what caused the LinkageError. This seems to have solved the issue so far, and it is the same logging approach already present in v2.IgniteHadoopFileSystem.

My questions would be: do you think this fix is worth a small patch, or is this kind of usage completely unadvised? Is the philosophy behind using Ignite through Spark to always create native Ignite tables?

Thanks in advance.

On Tue, Sep 11, 2018 at 10:25, Evgenii Zhuravlev <e.zhuravlev...@gmail.com> wrote:

> Hi,
>
> Do you really need to use Hive here? You can just use the Spark
> integration with Ignite, which allows you to run SQL: DataFrame
> (https://apacheignite-fs.readme.io/docs/ignite-data-frame) or RDD
> (https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd).
> For sure, this solution will work much faster.
>
> Evgenii
>
> On Mon, Sep 10, 2018 at 23:08, Maximiliano Patricio Méndez <mmen...@despegar.com> wrote:
> [quoted text trimmed; full message in the original post below]
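The fix described above changes which logger the v1 file system hands onward: instead of the shared static LOG inherited from hadoop.FileSystem, it declares a logger owned by v1.IgniteHadoopFileSystem itself, the approach v2 already uses. A minimal sketch of that pattern, with `java.util.logging` standing in for commons-logging and the class names shortened for illustration (neither class here is real Ignite or Hadoop code):

```java
import java.util.logging.Logger;

// Stand-in for org.apache.hadoop.fs.FileSystem, which exposes a shared static LOG.
class BaseFileSystem {
    public static final Logger LOG = Logger.getLogger(BaseFileSystem.class.getName());
}

// Stand-in for v1.IgniteHadoopFileSystem: rather than reusing the parent's LOG
// (whose declared type may have been resolved by a different class loader),
// it declares its own logger keyed to its own class, mirroring v2.
class IgfsV1FileSystem extends BaseFileSystem {
    private static final Logger LOG = Logger.getLogger(IgfsV1FileSystem.class.getName());

    // The logger passed onward (to HadoopIgfsWrapper in the real code) is now
    // resolved against this class, not against the base file system class.
    static String logOwner() {
        return LOG.getName();
    }
}
```

With a per-class logger, initialization never touches the base class's LOG field, so Spark's isolated class loader and the application class loader no longer need to agree on the logging type.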
Re: Error with Spark + IGFS (HDFS cache) through Hive
Hi,

Do you really need to use Hive here? You can just use the Spark integration with Ignite, which allows you to run SQL: DataFrame (https://apacheignite-fs.readme.io/docs/ignite-data-frame) or RDD (https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd). For sure, this solution will work much faster.

Evgenii

On Mon, Sep 10, 2018 at 23:08, Maximiliano Patricio Méndez <mmen...@despegar.com> wrote:

> [quoted text trimmed; full message in the original post below]
Error with Spark + IGFS (HDFS cache) through Hive
Hi,

I'm getting a LinkageError in Spark when trying to read a Hive table whose external location is in IGFS:

java.lang.LinkageError: loader constraint violation: when resolving field "LOG" the class loader (instance of org/apache/spark/sql/hive/client/IsolatedClientLoader$$anon$1) of the referring class, org/apache/hadoop/fs/FileSystem, and the class loader (instance of sun/misc/Launcher$AppClassLoader) for the field's resolved type, org/apache/commons/logging/Log, have different Class objects for that type
at org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem.initialize(IgniteHadoopFileSystem.java:255)

From what I can see, the exception occurs when Spark tries to read a table from Hive and then through IGFS, passing the "LOG" variable of the FileSystem around to the HadoopIgfsWrapper (and beyond...).

The steps I followed to reach this error were:

- Create a file /tmp/test.parquet in HDFS
- Create an external table test.test in Hive with location = igfs://igfs@/tmp/test.parquet
- Start spark-shell with the command:
  ./bin/spark-shell --jars $IGNITE_HOME/ignite-core-2.6.0.jar,$IGNITE_HOME/ignite-hadoop/ignite-hadoop-2.6.0.jar,$IGNITE_HOME/ignite-shmem-1.0.0.jar,$IGNITE_HOME/ignite-spark-2.6.0.jar
- Read the table through spark.sql:
  spark.sql("SELECT * FROM test.test")

Is there maybe a way to avoid this issue? Has anyone used Ignite through Hive as an HDFS cache in a similar way?
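For context on the error itself: the JVM raises this loader-constraint violation because two class loaders (Spark's IsolatedClientLoader and the application class loader) each resolve the same type name to a different Class object. A self-contained sketch of just that mechanism, not Ignite code (the Dummy class and temp directory are purely illustrative; running it requires a JDK, since it uses the in-process compiler):

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.ToolProvider;

public class LoaderDemo {
    // Returns true when the same class file, loaded by two independent
    // class loaders, yields two distinct Class objects.
    static boolean distinctClasses() throws Exception {
        // Compile a trivial class into a temporary directory.
        Path dir = Files.createTempDirectory("loaderdemo");
        Path src = dir.resolve("Dummy.java");
        Files.write(src, "public class Dummy {}".getBytes(StandardCharsets.UTF_8));
        ToolProvider.getSystemJavaCompiler()
                .run(null, null, null, "-d", dir.toString(), src.toString());

        URL[] urls = { dir.toUri().toURL() };
        // parent = null: neither loader delegates to the application classpath,
        // so each defines Dummy for itself -- analogous to Spark's isolated
        // loader and the AppClassLoader each resolving commons-logging's Log.
        try (URLClassLoader l1 = new URLClassLoader(urls, null);
             URLClassLoader l2 = new URLClassLoader(urls, null)) {
            Class<?> c1 = l1.loadClass("Dummy");
            Class<?> c2 = l2.loadClass("Dummy");
            // Same name, same bytes -- but the JVM treats them as unrelated types.
            return c1 != c2 && c1.getName().equals(c2.getName());
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(distinctClasses()); // prints true
    }
}
```

Roughly speaking, the error message describes the same situation: FileSystem, as seen through the isolated loader, declares LOG with a Log type from one loader, while the referring code expects the Log resolved by the app class loader; the two Class objects differ, so linkage fails at initialize().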