Hi guys,

I'm still trying to solve the issue with saving Hibernate entities from
Spark. After several attempts to redesign my own code, I ended up with a
HelloWorld example, which clearly demonstrates that the problem lies
neither in the complexity of my code nor in sessions being mixed across
threads.

The code given below creates one simple Hibernate entity and tries to save
it. If the number of Spark partitions is more than one, the line
*println("Saved tag:"+newTag)* is never executed. If I have only one
partition, everything works fine.

I would very much appreciate it if somebody could explain what the problem
is in this code.


    val sc = SparkContextLoader.getSC
    val scalaUsersIds = Seq[Long](2, 8, 15, 14, 6, 17, 21, 34, 75, 128, 304)
    val usersRDD: RDD[Long] = sc.parallelize(scalaUsersIds)
    usersRDD.foreachPartition {
      val session: Session = HibernateUtil.getSessionFactory().openSession()
      users => {
        users.foreach {
          userid =>
            {
              val newTag: Tag = new Tag
              newTag.setTitle("title" + userid)
              try {
                val isActive: Boolean = session.getTransaction().isActive()
                if (!isActive) {
                  session.beginTransaction()
                }
                println("Saving tag:"+newTag)
                session.save(newTag)
                println("Saved tag:"+newTag)
                session.getTransaction().commit()
              } catch {
                case ex: Exception => {
                  if (session.getTransaction() != null) {
                    session.getTransaction().rollback()
                    ex.printStackTrace()
                  }
                }
              }
            }
        }
      }
      session.close()
    }
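
A note on how Scala reads the block passed to foreachPartition above,
offered as a possible lead rather than a confirmed diagnosis. In a brace
block of the form { statements; x => body }, the statements before the
arrow run once, when the function value is built, and everything after the
arrow up to the closing brace is the function body. Read that way, the
openSession() call above would run a single time where the closure is
created, not once per partition, while session.close() would run inside
every partition. The sketch below shows only this language rule with plain
collections. The foreachPartition helper, the partition data and the
counter are hypothetical stand-ins, not Spark or Hibernate APIs.

```scala
object BlockEvaluationDemo {
  // Hypothetical stand-in for RDD.foreachPartition: calls f once per partition.
  def foreachPartition[T](partitions: Seq[Seq[T]])(f: Iterator[T] => Unit): Unit =
    partitions.foreach(p => f(p.iterator))

  // Three made-up partitions of user ids.
  val partitions: Seq[Seq[Long]] = Seq(Seq(2L, 8L), Seq(15L, 14L), Seq(6L, 17L))

  // Same shape as the posted code: the val sits before the lambda arrow.
  def sessionsOpenedOutsideLambda(): Int = {
    var opened = 0
    // Stand-in for openSession(); counts how often it is called.
    def open(): AnyRef = { opened += 1; new Object }
    foreachPartition(partitions) {
      val session = open() // evaluated once, while the function value is built
      users => users.foreach(_ => session.hashCode())
    }
    opened
  }

  // Alternative shape: the val is the first statement of the lambda body.
  def sessionsOpenedInsideLambda(): Int = {
    var opened = 0
    def open(): AnyRef = { opened += 1; new Object }
    foreachPartition(partitions) { users =>
      val session = open() // evaluated once per partition
      users.foreach(_ => session.hashCode())
    }
    opened
  }

  def main(args: Array[String]): Unit = {
    // prints "opened outside lambda: 1, inside lambda: 3"
    println(s"opened outside lambda: ${sessionsOpenedOutsideLambda()}, " +
      s"inside lambda: ${sessionsOpenedInsideLambda()}")
  }
}
```

If this reading applies to the posted code, moving the val session line to
the first statement inside users => { ... } would open one session per
partition, on the thread that uses it. Whether that alone resolves the
blocking behaviour is not established in this thread.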

Thanks,
Zoran


On Sun, Sep 6, 2015 at 1:42 PM, Zoran Jeremic <zoran.jere...@gmail.com>
wrote:

> I have a GenericDAO class which is initialized for each partition. This
> class uses SessionFactory.openSession() to open a new session in its
> constructor. As I understand it, this means that each partition has a
> different session, but they all use the same SessionFactory to open it.
>
>> why not create the session at the start of the saveInBatch method and
>> close it at the end
>
> This won't work for me, or at least I think it won't. At the beginning of
> the process I load some entities (e.g. User, UserPreference...) from
> Hibernate and then I use them throughout the process, even after I perform
> saveInBatch. They need to stay in the session so that I can pull the data
> I need and update it later, so I can't open another session inside the
> existing one.
>
> On Sun, Sep 6, 2015 at 1:40 AM, Matthew Johnson <matt.john...@algomi.com>
> wrote:
>
>> I agree with Igor - I would either make sure session is ThreadLocal or,
>> more simply, why not create the session at the start of the saveInBatch
>> method and close it at the end? Creating a SessionFactory is an expensive
>> operation but creating a Session is a relatively cheap one.
>> On 6 Sep 2015 07:27, "Igor Berman" <igor.ber...@gmail.com> wrote:
>>
>>> how do you create your session? do you reuse it across threads? how do
>>> you create/close session manager?
>>> look for the problem in session creation, probably something deadlocked,
>>> as far as I remember hib.session should be created per thread
>>>
>>> On 6 September 2015 at 07:11, Zoran Jeremic <zoran.jere...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm developing a long-running process that should find the RSS feeds
>>>> that all users in the system have registered to follow, parse these
>>>> feeds, extract new entries and store them back to the database as
>>>> Hibernate entities, so users can retrieve them. I want to use Apache
>>>> Spark to enable parallel processing, since this process might take
>>>> several hours depending on the number of users.
>>>>
>>>> The approach I thought should work was to use
>>>> *useridsRDD.foreachPartition*, so I can have a separate Hibernate
>>>> session for each partition. I created a database session manager that
>>>> is initialized for each partition and keeps the Hibernate session alive
>>>> until the process is over.
>>>>
>>>> Once all RSS feeds from one source are parsed and Feed entities are
>>>> created, I send the whole list to a database manager method that saves
>>>> it in a batch:
>>>>
>>>>> public <T extends BaseEntity> void saveInBatch(List<T> entities) {
>>>>>     try {
>>>>>         // Reuse the session's transaction if one is already active.
>>>>>         if (!session.getTransaction().isActive()) {
>>>>>             session.beginTransaction();
>>>>>         }
>>>>>         for (T entity : entities) {
>>>>>             session.save(entity);
>>>>>         }
>>>>>         session.getTransaction().commit();
>>>>>     } catch (Exception ex) {
>>>>>         // Always log the failure, then roll back if a transaction exists.
>>>>>         ex.printStackTrace();
>>>>>         if (session.getTransaction() != null) {
>>>>>             session.getTransaction().rollback();
>>>>>         }
>>>>>     }
>>>>> }
>>>>
>>>> However, this works only if I have one Spark partition. If there are
>>>> two or more partitions, the whole process blocks once I try to save the
>>>> first entity. To make things simpler, I tried to simplify the Feed
>>>> entity so that it doesn't refer to, and is not referred to by, any other
>>>> entity. It also doesn't have any collections.
>>>>
>>>> I hope that some of you have already tried something similar and could
>>>> give me an idea of how to solve this problem.
>>>>
>>>> Thanks,
>>>> Zoran
>>>>
>>>>
>>>
>
>
> --
>
> *******************************************************************************
> Zoran Jeremic, PhD
> Senior System Analyst & Programmer
>
> Athabasca University
> Tel: +1 604 92 89 944
> E-mail: zoran.jere...@gmail.com <zoran.jere...@va.mod.gov.rs>
> Homepage:  http://zoranjeremic.org
>
> **********************************************************************************
>
