I think that most packages already default log.dir to something more reasonable.

On Mon, Jan 26, 2015 at 1:06 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> Having a relative path and keeping data under /data in the kafka distro
> would make sense. This would require some reworking of the shell scripts,
> though, as I think right now you can actually run Kafka from any directory
> and the cwd of the process will be whatever directory you start from. If we
> have a relative path in the config then the working directory will HAVE to
> be the kafka directory. This works for the simple download case but may
> make some packaging scenarios harder for other use cases.
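>
> One way to avoid depending on the cwd would be for the start scripts to
> pass the install location to the broker and resolve a relative log.dirs
> entry against it. Just a rough sketch of that idea (the "kafka.home"
> property and this resolver are hypothetical, not anything the scripts or
> broker do today):
>
>     import java.io.File;
>
>     public class LogDirResolver {
>         // Resolve a configured log directory; a relative path is anchored
>         // at a hypothetical kafka.home property instead of the process cwd.
>         static File resolve(String configuredLogDir) {
>             File dir = new File(configuredLogDir);
>             if (dir.isAbsolute()) {
>                 return dir;
>             }
>             String base = System.getProperty("kafka.home", ".");
>             return new File(base, configuredLogDir);
>         }
>
>         public static void main(String[] args) {
>             // e.g. a relative default like data/kafka-logs in server.properties
>             System.out.println(resolve("data/kafka-logs").getAbsolutePath());
>         }
>     }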
>
> -Jay
>
> On Mon, Jan 26, 2015 at 5:54 AM, Jaikiran Pai <jai.forums2...@gmail.com>
> wrote:
>
>> Having looked at the logs the user posted, I don't think this specific
>> issue has to do with the /tmp path.
>>
>> However, now that the /tmp path is being discussed, I think it's a good
>> idea to default the Kafka logs to a more sensible folder. As Jay notes,
>> having a default makes it very easy to just download and start the servers
>> without having to fiddle with the configs when you are just starting out.
>> Having said that, when I started out with Kafka, I found /tmp to be an odd
>> place to default the path to. I expected the logs to default to a folder
>> within the Kafka install, somewhere like
>> KAFKA_INSTALL_FOLDER/data/kafka-logs/. Is that something we should do?
>>
>> -Jaikiran
>>
>> On Monday 26 January 2015 12:23 AM, Jay Kreps wrote:
>>
>>> Hmm, but I don't think tmp gets cleaned while the server is running...
>>>
>>> The reason for using /tmp was that we don't know which directory they
>>> will use and we don't want them to have to edit configuration for the
>>> simple "out of the box" getting started tutorial. I actually do think that
>>> is important. Maybe an intermediate step we could take is to call out this
>>> setting in the quickstart so people know where data is going and know they
>>> need to configure it later...
>>>
>>> -Jay
>>>
>>> On Sun, Jan 25, 2015 at 9:32 AM, Joe Stein <joe.st...@stealth.ly> wrote:
>>>
>>>  This feels like another type of symptom from people using /tmp/ for their
>>>> logs.  Perosnally, I would rather use /mnt/data or something and if that
>>>> doesn't exist on their machine we can exception, or no default and force
>>>> set it.
>>>>
>>>> /*******************************************
>>>> Joe Stein
>>>> Founder, Principal Consultant
>>>> Big Data Open Source Security LLC
>>>> http://www.stealth.ly
>>>> Twitter: @allthingshadoop
>>>> ********************************************/
>>>> On Jan 25, 2015 11:37 AM, "Jay Kreps" <jay.kr...@gmail.com> wrote:
>>>>
>>>>  I think you are right, good catch. It could be that this user deleted
>>>>> the
>>>>> files manually, but I wonder if there isn't some way that is a Kafka
>>>>> bug--e.g. if multiple types of retention policies kick in at the same
>>>>>
>>>> time
>>>>
>>>>> do we synchronize that properly?
>>>>>
>>>>> -Jay
>>>>>
>>>>> On Sat, Jan 24, 2015 at 9:26 PM, Jaikiran Pai <jai.forums2...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Jay,
>>>>>>
>>>>>> I spent some more time on this today and went back to the original
>>>>>> thread which brought up the issue with file leaks [1]. I think the
>>>>>> output of lsof in those logs has a very important hint:
>>>>>>
>>>>>> /home/work/data/soft/kafka-0.8/data/_oakbay_v2_search_
>>>>>> topic_ypgsearch_yellowpageV2-0/00000000000068818668.log (deleted) java
>>>>>> 8446 root 725u REG 253,2 536910838 26087364
>>>>>>
>>>>>> /home/work/data/soft/kafka-0.8/data/_oakbay_v2_search_
>>>>>> topic_ypgsearch_yellowpageV2-0/00000000000069457098.log (deleted) java
>>>>>> 8446 root 726u REG 253,2 536917902 26087368
>>>>>>
>>>>>> Notice the "(deleted)" text in that output. The last time I looked at
>>>>>> that output, I thought it was the user who had added the "deleted" text
>>>>>> to help us understand the problem. But today I read up on the output
>>>>>> format of lsof, and it turns out that lsof itself adds that hint
>>>>>> whenever a file has already been deleted (possibly by a different
>>>>>> process) while some other process is still holding on to open resources
>>>>>> of that (deleted) file [2].
>>>>>>
>>>>>> So in the context of the issue that we are discussing, and the way
>>>>>> Kafka deals with async deletes (i.e. first attempting a rename of the
>>>>>> log/index files), I think this all makes sense now. What I think is
>>>>>> happening is that some (other?) process (not sure what/why) has already
>>>>>> deleted the log file that Kafka is using for the LogSegment. The
>>>>>> LogSegment, however, still has an open FileChannel on that deleted file
>>>>>> (and that's why the open file descriptor is still held and shows up in
>>>>>> that output). Now Kafka, at some point in time, triggers an async
>>>>>> delete of the LogSegment, which involves a rename of that (already
>>>>>> deleted) log file. The rename fails (because the original file path
>>>>>> isn't there anymore). As a result, we end up throwing that "failed to
>>>>>> rename" KafkaStorageException and thus leave behind the open
>>>>>> FileChannel, which stays open forever (till the Kafka process exits).
>>>>>>
>>>>>> So I think we should:
>>>>>>
>>>>>> 1) Find out what deletes the underlying log file(s), and why. I'll add
>>>>>> a reply to that original mail discussion asking the user if he can
>>>>>> provide more details.
>>>>>>
>>>>>> 2) Handle this case and close the FileChannel. The patch that's been
>>>>>> uploaded to review board https://reviews.apache.org/r/29755/ does that.
>>>>>> The "immediate delete" on failure to rename involves (safely) closing
>>>>>> the open FileChannel and (safely) deleting the (possibly non-existent)
>>>>>> file, roughly as sketched below.
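>>>>>>
>>>>>> A rough sketch of what that fallback could look like (made-up names
>>>>>> here, not the actual code in the patch):
>>>>>>
>>>>>>     import java.io.File;
>>>>>>     import java.io.IOException;
>>>>>>     import java.nio.channels.FileChannel;
>>>>>>     import java.nio.file.Files;
>>>>>>
>>>>>>     public class SegmentDeleteFallbackSketch {
>>>>>>         static void deleteSegment(File segmentFile, FileChannel channel) throws IOException {
>>>>>>             File renamed = new File(segmentFile.getPath() + ".deleted");
>>>>>>             if (!segmentFile.renameTo(renamed)) {
>>>>>>                 // rename failed: the original path may already be gone, which is
>>>>>>                 // exactly the situation in the lsof output above
>>>>>>                 System.err.println("Failed to rename " + segmentFile + ", deleting immediately");
>>>>>>                 channel.close();                             // release the descriptor held on the deleted file
>>>>>>                 Files.deleteIfExists(segmentFile.toPath());  // safe no-op if the file no longer exists
>>>>>>             }
>>>>>>             // otherwise the normal async path applies: a background task deletes
>>>>>>             // the renamed file a little later
>>>>>>         }
>>>>>>     }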
>>>>>>
>>>>>> By the way, this entire thing can be easily reproduced by running the
>>>>>> following program, which first creates a file and opens a FileChannel
>>>>>> to that file, then waits for the user to delete that file externally (I
>>>>>> used the rm command), and then tries to rename that deleted file, which
>>>>>> fails. In between each of these steps, you can run the lsof command
>>>>>> externally to see the open file resources (I used 'lsof | grep
>>>>>> test.log'):
>>>>>>
>>>>>> import java.io.File;
>>>>>> import java.io.RandomAccessFile;
>>>>>> import java.nio.channels.FileChannel;
>>>>>>
>>>>>> public class DeletedFileRenameTest {
>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>         // Open a file and a file channel for read/write
>>>>>>         // (change this path relevantly if you plan to run it)
>>>>>>         final File originalLogFile = new File("/home/jaikiran/deleteme/test.log");
>>>>>>         final FileChannel fileChannel = new RandomAccessFile(originalLogFile, "rw").getChannel();
>>>>>>         System.out.println("Opened file channel to " + originalLogFile);
>>>>>>
>>>>>>         // wait for the underlying file to be deleted externally
>>>>>>         System.out.println("Waiting for the " + originalLogFile + " to be deleted externally. Press any key after the file is deleted");
>>>>>>         System.in.read();
>>>>>>
>>>>>>         // let the user check the lsof output
>>>>>>         System.out.println(originalLogFile + " seems to have been deleted externally, check lsof command output to see open file resources.");
>>>>>>         System.out.println("Press any key to try renaming this already deleted file, from the program");
>>>>>>         System.in.read();
>>>>>>
>>>>>>         // try the rename
>>>>>>         final File fileToRenameTo = new File(originalLogFile.getPath() + ".deleted");
>>>>>>         System.out.println("Trying to rename " + originalLogFile + " to " + fileToRenameTo);
>>>>>>         final boolean renamedSucceeded = originalLogFile.renameTo(fileToRenameTo);
>>>>>>         if (renamedSucceeded) {
>>>>>>             System.out.println("Rename SUCCEEDED. Renamed file exists? " + fileToRenameTo.exists());
>>>>>>         } else {
>>>>>>             System.out.println("FAILED to rename file " + originalLogFile + " to " + fileToRenameTo);
>>>>>>         }
>>>>>>
>>>>>>         // let the user check the lsof output, after our rename failed
>>>>>>         System.out.println("Check the lsof output and press any key to close the open file channel to a deleted file");
>>>>>>         System.in.read();
>>>>>>
>>>>>>         // close the file channel
>>>>>>         fileChannel.close();
>>>>>>
>>>>>>         // let the user check the lsof output one final time; this time there
>>>>>>         // will be no open file resources from this program
>>>>>>         System.out.println("File channel closed. Check the lsof output and press any key to terminate the program");
>>>>>>         System.in.read();
>>>>>>
>>>>>>         // all done, exit
>>>>>>         System.out.println("Program will terminate");
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> [1] http://mail-archives.apache.org/mod_mbox/kafka-users/201501.mbox/%3CCAA4R6b-7gSbPp5_ebGpwYyNibDAwE_%2BwoE%2BKbiMuU27-2j%2BLkg%40mail.gmail.com%3E
>>>>>> [2] http://unixhelp.ed.ac.uk/CGI/man-cgi?lsof+8
>>>>>>
>>>>>>
>>>>>> -Jaikiran
>>>>>>
>>>>>> On Saturday 24 January 2015 11:12 PM, Jay Kreps wrote:
>>>>>>
>>>>>>  Hey guys,
>>>>>>>
>>>>>>> Jaikiran posted a patch on KAFKA-1853 to improve the handling of
>>>>>>> failures during delete.
>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1853
>>>>>>>
>>>>>>> The core problem here is that we are doing File.rename() as part of
>>>>>>> the delete sequence, which returns false if the rename failed. Our
>>>>>>> file delete sequence is something like the following (roughly sketched
>>>>>>> in code below):
>>>>>>>
>>>>>>> 1. Remove the file from the index so no new reads can begin on it
>>>>>>> 2. Rename the file to xyz.deleted so that if we crash it will get
>>>>>>> cleaned up
>>>>>>> 3. Schedule a task to delete the file in 30 seconds or so, when any
>>>>>>> in-progress reads have likely completed. The goal here is to avoid
>>>>>>> errors on in-progress reads but also avoid locking on all reads.
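>>>>>>>
>>>>>>> In rough code form the sequence is something like this (a sketch only,
>>>>>>> with made-up names, not the actual Log/LogSegment code):
>>>>>>>
>>>>>>>     import java.io.File;
>>>>>>>     import java.util.Map;
>>>>>>>     import java.util.concurrent.Executors;
>>>>>>>     import java.util.concurrent.ScheduledExecutorService;
>>>>>>>     import java.util.concurrent.TimeUnit;
>>>>>>>
>>>>>>>     public class DeleteSequenceSketch {
>>>>>>>         private final ScheduledExecutorService scheduler =
>>>>>>>                 Executors.newSingleThreadScheduledExecutor();
>>>>>>>
>>>>>>>         void deleteSegment(Map<Long, File> segments, long baseOffset) {
>>>>>>>             // 1. remove the segment from the map so no new reads can begin on it
>>>>>>>             File segmentFile = segments.remove(baseOffset);
>>>>>>>             // 2. rename to *.deleted so a crash before step 3 still gets cleaned up
>>>>>>>             File renamed = new File(segmentFile.getPath() + ".deleted");
>>>>>>>             if (!segmentFile.renameTo(renamed)) {
>>>>>>>                 // renameTo() returns false on failure; this is the case in question,
>>>>>>>                 // and previously that false return value was simply ignored here
>>>>>>>             }
>>>>>>>             // 3. schedule the real delete after in-progress reads have likely finished
>>>>>>>             scheduler.schedule(() -> { renamed.delete(); }, 30, TimeUnit.SECONDS);
>>>>>>>         }
>>>>>>>     }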
>>>>>>>
>>>>>>> The question is what to do when the rename fails. Previously, if this
>>>>>>> happened, we actually didn't pay attention and would fail to delete
>>>>>>> the file entirely. This patch changes it so that if the rename fails
>>>>>>> we log an error and force an immediate delete.
>>>>>>>
>>>>>>> I think this is the right thing to do, but I guess the real question
>>>>>>> is why the rename would fail. Some possibilities:
>>>>>>> http://stackoverflow.com/questions/2372374/why-would-a-file-rename-fail-in-java
>>>>>>>
>>>>>>> An alternative would be to treat this as a filesystem error and shut
>>>>>>> down as we do elsewhere.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> -Jay
>>>>>>>
>>>>>>>
>>>>>>>