Oliver Drewes created ZEPPELIN-483:
--------------------------------------
Summary: Cronjob: Infinity interpreting notes -> Infinity new
files & inodes
Key: ZEPPELIN-483
URL: https://issues.apache.org/jira/browse/ZEPPELIN-483
Project: Zeppelin
Issue Type: Bug
Components: Core
Affects Versions: 0.6.0
Environment: Build for Spark 1.4.1 / mapr5
Reporter: Oliver Drewes
Priority: Blocker
Let's start with the basics:
Zeppelin always writes to the tmp folder. Whatever you enter in the Spark
interpreter settings, Zeppelin keeps writing its compiled Spark sources to
/tmp/spark-{ID}
No environment variable changes this behaviour.
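To watch the growth described here, something like the following could be run between cronjob executions. It is only a sketch: the default glob pattern is taken from the /tmp/spark-{ID} path above, and the helper name count_files is made up for illustration.

```python
import glob
import os


def count_files(pattern: str = "/tmp/spark-*") -> int:
    """Count regular files below every directory matching the glob pattern."""
    total = 0
    for directory in glob.glob(pattern):
        # os.walk yields nothing for non-directories, so stray files are skipped.
        for _root, _dirs, files in os.walk(directory):
            total += len(files)
    return total


if __name__ == "__main__":
    print(count_files())
```

Comparing the number before and after one scheduled run gives the files created per run.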
This means every file created while interpreting the individual lines of code
consumes an inode. This wouldn't matter if you ran a note only once. But if you
run it regularly or in a cronjob, each line of your note is interpreted again and
again. So 30 lines of code produce about 200 files. If you run the cronjob once
a minute, it produces about 12000 files an hour. Interpreting the code line by
line without checking whether the compiled result already exists is a bad solution.
A 1 GB filesystem, for example, has about 65k inodes available. This means that
if you run your note for a few hours, it needs only about 100 MB of space but
produces 65k files, and you run out of inodes.
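The arithmetic behind these numbers can be sketched as below. The figures (200 files per run, one run per minute, 65k inodes) are the ones quoted in this report; the function names are made up for illustration, and inode_counts assumes a Unix-like system where os.statvfs is available.

```python
import os


def files_per_hour(files_per_run: int, runs_per_hour: int) -> int:
    """Files (and therefore inodes) consumed per hour by a scheduled note."""
    return files_per_run * runs_per_hour


def hours_until_exhausted(total_inodes: int, files_per_run: int,
                          runs_per_hour: int) -> float:
    """Hours until the inode budget is used up, assuming nothing is cleaned up."""
    return total_inodes / files_per_hour(files_per_run, runs_per_hour)


def inode_counts(path: str):
    """Return (total, free) inodes of the filesystem holding path (Unix only)."""
    st = os.statvfs(path)
    return st.f_files, st.f_ffree


if __name__ == "__main__":
    print(files_per_hour(200, 60))                            # 12000
    print(round(hours_until_exhausted(65_000, 200, 60), 1))   # 5.4
    print(inode_counts("/tmp"))
```

With the numbers above, the filesystem runs out of inodes after roughly five and a half hours, long before the disk space itself is used up.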
My idea for a solution would be to check whether the note has changed. If it has
changed, delete the old class files and run it again.
If it is the same, reuse the existing classes.
If a class with the same hash already exists, reuse that class.
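The proposal above could look roughly like the following. This is a minimal sketch and not Zeppelin code: ClassCache, note_hash and the compile_fn callback are hypothetical names, and SHA-256 over the note's lines is just one possible way to detect a change.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable, Dict, List


def note_hash(lines: List[str]) -> str:
    """Hash the note's lines; a separator keeps line boundaries significant."""
    h = hashlib.sha256()
    for line in lines:
        h.update(line.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


class ClassCache:
    """Reuse compiled output while a note is unchanged (hypothetical helper)."""

    def __init__(self, root: Path, compile_fn: Callable[[List[str], Path], None]):
        self.root = root
        self.compile_fn = compile_fn       # stand-in for the real compile step
        self.current: Dict[str, str] = {}  # note id -> hash of last compiled version

    def run(self, note_id: str, lines: List[str]) -> Path:
        digest = note_hash(lines)
        out_dir = self.root / note_id / digest
        previous = self.current.get(note_id)
        if previous == digest and out_dir.exists():
            return out_dir                 # note unchanged: reuse existing classes
        if previous is not None and previous != digest:
            # Note changed: delete the old class files before compiling again.
            shutil.rmtree(self.root / note_id / previous, ignore_errors=True)
        out_dir.mkdir(parents=True, exist_ok=True)
        self.compile_fn(lines, out_dir)
        self.current[note_id] = digest
        return out_dir
```

With a cache like this, a cronjob that runs an unchanged note once a minute would compile it once and create no new files on later runs.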
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)