Having a relative path and keeping data under /data in the kafka distro
would make sense. This would require some reworking of the shell scripts,
though, as I think right now you can actually run Kafka from any directory
and the cwd of the process will be whatever directory you start from. If we
have a relative path in the config then the working directory will HAVE to
be the kafka directory. This works for the simple download case but may
make packaging harder for other use cases.
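
To make the cwd dependence concrete, here's a tiny sketch (not Kafka code;
the property value and paths are made up) of how a relative log dir would
resolve against whatever directory the process was started from:

    import java.io.File;

    // Sketch only: a relative log dir resolves against the JVM's working
    // directory, so starting the broker from anywhere other than the Kafka
    // install directory would put the data somewhere unexpected.
    public class RelativeLogDirSketch {
        public static void main(String[] args) {
            String configuredLogDir = "data/kafka-logs"; // hypothetical relative config value
            File resolved = new File(configuredLogDir).getAbsoluteFile();
            System.out.println("cwd:              " + System.getProperty("user.dir"));
            System.out.println("resolved log dir: " + resolved);
        }
    }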

-Jay

On Mon, Jan 26, 2015 at 5:54 AM, Jaikiran Pai <jai.forums2...@gmail.com>
wrote:

> Having looked at the logs the user posted, I don't think this specific
> issue has to do with /tmp path.
>
> However, now that the /tmp path is being discussed, I think it's a good
> idea that we default the Kafka logs to a certain folder. As Jay notes, it
> makes it very easy to just download and start the servers without having to
> fiddle with the configs when you are just starting out. Having said that,
> when I started out with Kafka, I found /tmp to be an odd place to default
> the path to. I expected it to default to a folder within the Kafka
> install, somewhere like KAFKA_INSTALL_FOLDER/data/kafka-logs/. Is
> that something we should do?
>
> -Jaikiran
>
> On Monday 26 January 2015 12:23 AM, Jay Kreps wrote:
>
>> Hmm, but I don't think tmp gets cleaned while the server is running...
>>
>> The reason for using tmp was because we don't know which directory they
>> will use and we don't want them to have to edit configuration for the
>> simple "out of the box" getting started tutorial. I actually do think that
>> is important. Maybe an intermediate step we could do is just call out this
>> setting in the quickstart so people know where data is going and know they
>> need to configure it later...
>>
>> -Jay
>>
>> On Sun, Jan 25, 2015 at 9:32 AM, Joe Stein <joe.st...@stealth.ly> wrote:
>>
>>> This feels like another type of symptom from people using /tmp/ for their
>>> logs.  Personally, I would rather use /mnt/data or something, and if that
>>> doesn't exist on their machine we can throw an exception, or have no
>>> default and force people to set it.
>>>
>>> /*******************************************
>>> Joe Stein
>>> Founder, Principal Consultant
>>> Big Data Open Source Security LLC
>>> http://www.stealth.ly
>>> Twitter: @allthingshadoop
>>> ********************************************/
>>> On Jan 25, 2015 11:37 AM, "Jay Kreps" <jay.kr...@gmail.com> wrote:
>>>
>>>> I think you are right, good catch. It could be that this user deleted the
>>>> files manually, but I wonder if there isn't some way that is a Kafka
>>>> bug--e.g. if multiple types of retention policies kick in at the same
>>>> time, do we synchronize that properly?
>>>>
>>>> -Jay
>>>>
>>>> On Sat, Jan 24, 2015 at 9:26 PM, Jaikiran Pai <jai.forums2...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Jay,
>>>>>
>>>>> I spent some more time on this today and went back to the original
>>>>> thread which brought up the issue with file leaks [1]. I think the
>>>>> output of lsof in those logs has a very important hint:
>>>>>
>>>>> /home/work/data/soft/kafka-0.8/data/_oakbay_v2_search_
>>>>> topic_ypgsearch_yellowpageV2-0/00000000000068818668.log (deleted) java
>>>>> 8446 root 725u REG 253,2 536910838 26087364
>>>>>
>>>>> /home/work/data/soft/kafka-0.8/data/_oakbay_v2_search_
>>>>> topic_ypgsearch_yellowpageV2-0/00000000000069457098.log (deleted) java
>>>>> 8446 root 726u REG 253,2 536917902 26087368
>>>>>
>>>>> Notice the "(deleted)" text in that output. The last time I looked at
>>>>> that output, I thought it was the user who had added the "deleted" text
>>>>> to help us understand the problem. But today I read up on the output
>>>>> format of lsof, and it turns out that lsof itself adds that hint
>>>>> whenever a file has already been deleted (possibly by a different
>>>>> process) but some other process is still holding on to open resources
>>>>> of that (deleted) file [2].
>>>>>
>>>>> So in the context of the issue that we are discussing and the way Kafka
>>>>> deals with async deletes (i.e. first by attempting a rename of the
>>>>> log/index files), I think this all makes sense now. What I think is
>>>>> happening is this: some (other?) process (not sure what/why) has already
>>>>> deleted the log file that Kafka is using for the LogSegment. The
>>>>> LogSegment, however, still has an open FileChannel on that deleted file
>>>>> (and that's why the open file descriptor is held on to and shows up in
>>>>> that output). Now Kafka, at some point in time, triggers an async delete
>>>>> of the LogSegment, which involves a rename of that (already deleted) log
>>>>> file. The rename fails (because the original file path isn't there
>>>>> anymore). As a result, we end up throwing that "failed to rename"
>>>>> KafkaStorageException and leave behind the open FileChannel, which stays
>>>>> open forever (till the Kafka process exits).
>>>>>
>>>>> So I think we should:
>>>>>
>>>>> 1) Find out what deletes the underlying log file(s), and why. I'll add a
>>>>> reply to that original mail discussion asking the user if he can provide
>>>>> more details.
>>>>> 2) Handle this case and close the FileChannel. The patch that's been
>>>>> uploaded to review board https://reviews.apache.org/r/29755/ does that.
>>>>> The "immediate delete" on failure to rename involves (safely) closing the
>>>>> open FileChannel and (safely) deleting the (possibly non-existent) file;
>>>>> a rough sketch follows below.
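>>>>>
>>>>> For illustration only (not the actual patch code; the class and method
>>>>> names here are made up), the fallback could look roughly like this:
>>>>>
>>>>>      import java.io.File;
>>>>>      import java.io.IOException;
>>>>>      import java.nio.channels.FileChannel;
>>>>>
>>>>>      // Sketch of the "immediate delete" fallback: if renaming the segment
>>>>>      // file fails, close its FileChannel so the descriptor isn't leaked,
>>>>>      // then attempt a best-effort delete of the original path.
>>>>>      public class ImmediateDeleteFallbackSketch {
>>>>>          static void renameOrDeleteNow(File logFile, FileChannel channel) {
>>>>>              final File renamed = new File(logFile.getPath() + ".deleted");
>>>>>              if (!logFile.renameTo(renamed)) {
>>>>>                  try {
>>>>>                      channel.close(); // release the otherwise-leaked fd
>>>>>                  } catch (IOException e) {
>>>>>                      // log and continue; we still want to attempt the delete
>>>>>                  }
>>>>>                  logFile.delete(); // returns false if the file is already gone, which is fine
>>>>>              }
>>>>>          }
>>>>>      }
>>>>>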
>>>>> By the way, this entire thing can be easily reproduced by running the
>>>>> following program, which first creates a file and opens a FileChannel to
>>>>> that file, then waits for the user to delete that file externally (I used
>>>>> the rm command), and then tries to rename the deleted file, which fails.
>>>>> In between each of these steps, you can run the lsof command externally
>>>>> to see the open file resources (I used 'lsof | grep test.log'):
>>>>>
>>>>>      public static void main(String[] args) throws Exception {
>>>>>          // Open a file and a file channel for read/write
>>>>>          // (change this path relevantly if you plan to run it)
>>>>>          final File originalLogFile = new File("/home/jaikiran/deleteme/test.log");
>>>>>          final FileChannel fileChannel =
>>>>>                  new RandomAccessFile(originalLogFile, "rw").getChannel();
>>>>>          System.out.println("Opened file channel to " + originalLogFile);
>>>>>          // wait for the underlying file to be deleted externally
>>>>>          System.out.println("Waiting for the " + originalLogFile
>>>>>                  + " to be deleted externally. Press any key after the file is deleted");
>>>>>          System.in.read();
>>>>>          // wait for the user to check the lsof output
>>>>>          System.out.println(originalLogFile + " seems to have been deleted externally,"
>>>>>                  + " check lsof command output to see open file resources.");
>>>>>          System.out.println("Press any key to try renaming this already deleted file,"
>>>>>                  + " from the program");
>>>>>          System.in.read();
>>>>>          // try rename
>>>>>          final File fileToRenameTo = new File(originalLogFile.getPath() + ".deleted");
>>>>>          System.out.println("Trying to rename " + originalLogFile + " to " + fileToRenameTo);
>>>>>          final boolean renameSucceeded = originalLogFile.renameTo(fileToRenameTo);
>>>>>          if (renameSucceeded) {
>>>>>              System.out.println("Rename SUCCEEDED. Renamed file exists? "
>>>>>                      + fileToRenameTo.exists());
>>>>>          } else {
>>>>>              System.out.println("FAILED to rename file " + originalLogFile
>>>>>                      + " to " + fileToRenameTo);
>>>>>          }
>>>>>          // wait for the user to check the lsof output, after our rename failed
>>>>>          System.out.println("Check the lsof output and press any key to close"
>>>>>                  + " the open file channel to a deleted file");
>>>>>          System.in.read();
>>>>>          // close the file channel
>>>>>          fileChannel.close();
>>>>>          // let the user check the lsof output one final time; this time there will be
>>>>>          // no open file resources from this program
>>>>>          System.out.println("File channel closed. Check the lsof output and press"
>>>>>                  + " any key to terminate the program");
>>>>>          System.in.read();
>>>>>          // all done, exit
>>>>>          System.out.println("Program will terminate");
>>>>>      }
>>>>>
>>>>>
>>>>>
>>>>> [1] http://mail-archives.apache.org/mod_mbox/kafka-users/201501.mbox/%3CCAA4R6b-7gSbPp5_ebGpwYyNibDAwE_%2BwoE%2BKbiMuU27-2j%2BLkg%40mail.gmail.com%3E
>>>>> [2] http://unixhelp.ed.ac.uk/CGI/man-cgi?lsof+8
>>>>>
>>>>>
>>>>> -Jaikiran
>>>>>
>>>>> On Saturday 24 January 2015 11:12 PM, Jay Kreps wrote:
>>>>>
>>>>>> Hey guys,
>>>>>>
>>>>>> Jaikiran posted a patch on KAFKA-1853 to improve the handling of
>>>>>> failures during delete:
>>>>>> https://issues.apache.org/jira/browse/KAFKA-1853
>>>>>>
>>>>>> The core problem here is that we are doing File.renameTo() as part of
>>>>>> the delete sequence, which returns false if the rename failed. Our file
>>>>>> delete sequence is something like the following (sketched in code below):
>>>>>> 1. Remove the file from the index so no new reads can begin on it
>>>>>> 2. Rename the file to xyz.deleted so that if we crash it will get
>>>>>> cleaned up
>>>>>> 3. Schedule a task to delete the file in 30 seconds or so when any
>>>>>> in-progress reads have likely completed. The goal here is to avoid
>>>>>> errors on in-progress reads but also avoid locking on all reads.
>>>>>>
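>>>>>> In code form, that sequence is roughly the following (just a sketch, not
>>>>>> the actual Kafka implementation; the class, field, and method names are
>>>>>> made up):
>>>>>>
>>>>>>      import java.io.File;
>>>>>>      import java.util.Map;
>>>>>>      import java.util.concurrent.ConcurrentHashMap;
>>>>>>      import java.util.concurrent.Executors;
>>>>>>      import java.util.concurrent.ScheduledExecutorService;
>>>>>>      import java.util.concurrent.TimeUnit;
>>>>>>
>>>>>>      // Sketch of the delete sequence described above.
>>>>>>      public class DeleteSequenceSketch {
>>>>>>          private final Map<String, File> segmentIndex = new ConcurrentHashMap<>();
>>>>>>          private final ScheduledExecutorService scheduler =
>>>>>>                  Executors.newSingleThreadScheduledExecutor();
>>>>>>
>>>>>>          void deleteSegment(String name) {
>>>>>>              // 1. Remove the segment from the index so no new reads can begin on it.
>>>>>>              final File segmentFile = segmentIndex.remove(name);
>>>>>>              // 2. Rename to *.deleted so that a crash before step 3 still gets cleaned up.
>>>>>>              final File marked = new File(segmentFile.getPath() + ".deleted");
>>>>>>              final boolean renamed = segmentFile.renameTo(marked); // the return value discussed below
>>>>>>              // 3. Delete for real after a delay, once in-progress reads have likely completed.
>>>>>>              scheduler.schedule(() -> {
>>>>>>                  marked.delete(); // best-effort; by now no new reads can start on it
>>>>>>              }, 30, TimeUnit.SECONDS);
>>>>>>          }
>>>>>>      }
>>>>>>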
>>>>>> The question is what to do when the rename fails. Previously, if this
>>>>>> happened we didn't pay attention and would fail to delete the file
>>>>>> entirely. This patch changes it so that if the rename fails we log an
>>>>>> error and force an immediate delete.
>>>>>>
>>>>>> I think this is the right thing to do, but I guess the real question is
>>>>>> why would rename fail? Some possibilities:
>>>>>> http://stackoverflow.com/questions/2372374/why-would-a-file-rename-fail-in-java
>>>>>>
>>>>>> An alternative would be to treat this as a filesystem error and shut
>>>>>> down as we do elsewhere.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> -Jay
>>>>>>
>>>>>>
>>>>>>
>
